Experimenting with Coprocessor Performance Using Hardware Events

You can conduct experiments using a variety of hardware events and the associated efficiency metrics. For example, examining the data read and write misses of your kernels might help you identify opportunities to improve prefetching or to achieve better data reuse through blocking techniques (tiling).
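As an illustration of the blocking (tiling) idea, the following is a minimal sketch in plain C rather than an OpenCL kernel. It transposes a square row-major matrix one tile at a time, so that the lines touched in both the source and the destination are reused while they are still cached. The function name transpose_tiled and the TILE value are hypothetical choices for this example and are not taken from the SDK; the appropriate tile size depends on the cache sizes of the target device and should be chosen by measuring the miss events.

```c
#include <stddef.h>

/* Hypothetical tile edge (an assumption for this sketch); tune it so that
   a source tile and a destination tile fit in the cache level of interest. */
#define TILE 16

/* Transpose an n x n row-major matrix, one TILE x TILE block at a time.
   src and dst must not overlap. */
void transpose_tiled(const float *src, float *dst, size_t n)
{
    for (size_t ib = 0; ib < n; ib += TILE) {
        for (size_t jb = 0; jb < n; jb += TILE) {
            /* Clamp the tile bounds so that n need not be a multiple of TILE. */
            size_t imax = (ib + TILE < n) ? ib + TILE : n;
            size_t jmax = (jb + TILE < n) ? jb + TILE : n;

            for (size_t i = ib; i < imax; ++i) {
                for (size_t j = jb; j < jmax; ++j) {
                    dst[j * n + i] = src[i * n + j];
                }
            }
        }
    }
}
```

One way to use such a sketch is to collect the data read and write miss events for the untiled and the tiled variants with the same input and compare them; whether tiling helps a given kernel has to be confirmed by that measurement.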

Event-driven analysis of an OpenCL application on Intel Xeon Phi coprocessors is conceptually similar to the analysis of a regular native (or offload) application for the coprocessor. For more information, see the "Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors, Part 2" web article.

See Also

Threading: Achieving Parallelism Between Work-Groups
Utilizing Software Prefetching
Efficient Data Layout
Use Lower Math Precision
Use Branching Accurately
Developer Guide for Intel® SDK for OpenCL™ Applications
Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors, Part 2
Intel® Xeon Phi™ Processor Targets