Analyzing OpenCL™ Applications

Once the analysis completes, you should see a view similar to the following. If you click “Bottom-up” and then choose the grouping selected below, you are ready to start tuning your application.

Also consider the following:

Inspect the same trace for the top hotspots across all modules, assuming you have already filtered by the mic_server process. To do so, switch to the Top-down Tree view:

Here you get the top list of hotspots from all modules. In this example, most hotspots belong to dynamic code (notice that specific kernel names are listed in the call stack). There is also some contribution from the Intel TBB library, and finally some heavy math (__ocl_svml_b2_sqrt) that is attributed to code from the SVML module.

In general, seeing many Intel TBB entries in the hotspots breakdown might indicate inefficient work-group scheduling, for example too few work-groups, or work-groups that are too lightweight. Refer to the “Threading: Achieving Parallelism Between Work-Groups” section for more information.
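To make the "too few work-groups" case concrete: the number of work-groups a kernel launch produces is the global work size divided by the local (work-group) size, per dimension. The C sketch below computes that count and compares it against the number of hardware threads. The helper names, the `min_groups_per_thread` threshold, and the per-thread heuristic itself are illustrative assumptions, not part of the Intel SDK. You supply the hardware-thread count yourself, for example from a `clGetDeviceInfo` query of `CL_DEVICE_MAX_COMPUTE_UNITS`, which is assumed here to match the device's hardware threads.

```c
#include <stddef.h>

/* Work-groups along one NDRange dimension. Rounds up, which models padding
 * the global size to a multiple of the local size (OpenCL 1.x requires the
 * global size to be divisible by the local size). */
size_t work_groups_1d(size_t global_size, size_t local_size)
{
    return (global_size + local_size - 1) / local_size;
}

/* Total number of work-groups for an NDRange of 1 to 3 dimensions. */
size_t total_work_groups(const size_t *global, const size_t *local,
                         unsigned dims)
{
    size_t total = 1;
    for (unsigned d = 0; d < dims; ++d)
        total *= work_groups_1d(global[d], local[d]);
    return total;
}

/* Hypothetical heuristic (an assumption, not an SDK rule): returns 1 if the
 * launch provides at least min_groups_per_thread work-groups per hardware
 * thread, otherwise 0. */
int enough_work_groups(size_t total_groups, size_t hw_threads,
                       size_t min_groups_per_thread)
{
    return total_groups >= hw_threads * min_groups_per_thread;
}
```

For example, a 1D launch with a global size of 1,048,576 and a local size of 16 yields 65,536 work-groups, while a local size of 8,192 yields only 128. On a device with 240 hardware threads (an assumed figure), that leaves many threads idle. The check does not detect the second symptom, work-groups that are too lightweight, which still has to be confirmed by profiling.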

If you click a specific kernel, you can inspect the resulting assembly code. This is useful for locating expensive instructions, for example:

See Also

Threading: Achieving Parallelism Between Work-Groups
Utilizing Software Prefetching
Efficient Data Layout
Use Lower Math Precision
Use Branching Accurately
Developer Guide for Intel® SDK for OpenCL™ Applications
Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors, Part 2
Intel® Xeon Phi™ Processor Targets