Minimize Data Transfers with the Intel® Xeon Phi™ Coprocessor

The application should process data in place and minimize copying of memory objects.

For example, transfers over the PCI Express (PCIe) bus have the highest latency and the lowest bandwidth of the data paths involved. As with any other PCIe device, you should reduce this traffic to a minimum.

Also, when possible, use the CL_MEM_WRITE_ONLY and CL_MEM_READ_ONLY flags with clCreateBuffer(). This also helps reduce data transfers across the PCIe bus.

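When mapping a buffer to the host with clEnqueueMapBuffer, use the map flags that match the intended access: CL_MAP_READ when the host only reads the data, CL_MAP_WRITE when the host modifies it, and CL_MAP_WRITE_INVALIDATE_REGION when the host overwrites the region and the previous contents can be discarded.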

NOTE: After a period of low activity, the CPU device might enter deep C-states if aggressive power-saving features are enabled. This can happen while waiting for long DMA transfers to or from the coprocessor, and it may significantly degrade data transfer bandwidth.

Refer to the "Shared Context for Multiple Intel® Xeon Phi™ Coprocessors" section for important tips on avoiding implicit data copying by the runtime.

If your tasks are independent, consider using an out-of-order queue.

See Also

Shared Context for Multiple Intel® Xeon Phi™ Coprocessors