Intel® VTune™ Amplifier
Use the GPU In-kernel Profiling to analyze GPU kernel execution per code line and identify performance issues caused by memory latency or inefficient kernel algorithms.
This analysis type is available on processors based on Intel® microarchitecture code name Broadwell and later.
The GPU In-kernel Profiling instruments your code and, depending on your configuration settings, helps identify performance-critical basic blocks or issues caused by memory accesses in the GPU kernels.
Since the GPU In-kernel Profiling incurs higher performance overhead than the GPU Compute/Media Hotspots analysis, you may consider first running the GPU Compute/Media Hotspots analysis to identify the hottest GPU computing task (GPU kernel) and then exploring this kernel with the GPU In-kernel Profiling.
GPU In-kernel Profiling introduces the following key metrics:
Estimated GPU Cycles: The average number of GPU cycles per kernel instance.
GPU Instructions Executed per Instance: The average number of GPU instructions executed per kernel instance.
GPU Instructions Executed per Thread: The average number of GPU instructions executed by one thread per kernel instance.
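The three metrics are averages over the profiled kernel instances. The sketch below only illustrates that idea; it is not VTune code. It assumes hypothetical per-instance counters (cycles, total instructions, and thread count for each instance) and assumes that "per thread" means instructions divided by threads within each instance, averaged across instances. VTune's exact formulas are not given on this page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class KernelInstance:
    """Hypothetical counters for one kernel instance (not a VTune data structure)."""
    gpu_cycles: int     # GPU cycles spent in this instance
    instructions: int   # GPU instructions executed by all threads of this instance
    threads: int        # number of threads that ran this instance


def in_kernel_metrics(instances):
    """Average the counters over the profiled instances (assumed formulas)."""
    if not instances:
        raise ValueError("no profiled kernel instances")
    return {
        # Estimated GPU Cycles: mean cycles per instance
        "estimated_gpu_cycles": mean(i.gpu_cycles for i in instances),
        # GPU Instructions Executed per Instance: mean instruction count per instance
        "instructions_per_instance": mean(i.instructions for i in instances),
        # GPU Instructions Executed per Thread: per-instance ratio, then averaged
        "instructions_per_thread": mean(i.instructions / i.threads for i in instances),
    }


if __name__ == "__main__":
    samples = [KernelInstance(12000, 64000, 32), KernelInstance(14000, 72000, 36)]
    print(in_kernel_metrics(samples))
```

With these made-up numbers, each instance runs 2000 instructions per thread, so the per-thread average stays at 2000 even though the instances differ in size. This is why the per-thread metric can be easier to compare across launches than the per-instance one.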
To run the GPU In-kernel Profiling analysis:
Prerequisites: Create a project and specify an analysis target and system.
Click the Configure Analysis button on the Intel® VTune™ Amplifier toolbar.
The New Amplifier Result tab opens.
From the HOW pane, click the Browse button and select Platform Analysis > GPU In-kernel Profiling.
By default, this analysis type has the Trace OpenCL and Intel Media SDK programs option enabled.
From the Profiling mode drop-down menu, select the type of issues you want to analyze:
Basic blocks latency: helps you identify issues caused by algorithm inefficiencies.
Memory latency: helps you identify latency issues caused by memory accesses. Consider this option if the GPU Compute/Media Hotspots analysis showed that the GPU kernel is throughput-bound or memory-bound.
Optionally, to narrow down the analysis to specific kernels (and minimize the overhead), specify the kernels of interest in the table. If required, modify the Instance step for each kernel. The instance step is the sampling interval, expressed as a number of kernel instances.
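The following sketch models how a kernel table with per-kernel instance steps limits which launches get instrumented. It is only an illustration. It assumes that kernels missing from the table are skipped and that an instance step of N selects every N-th instance of that kernel, starting with the first. The kernel names and the `plan_profiled_launches` helper are hypothetical and are not part of VTune.

```python
from typing import Dict, List


def plan_profiled_launches(launches: List[str], steps: Dict[str, int]) -> List[int]:
    """Return indices into `launches` that would be profiled.

    launches: kernel names in launch order (hypothetical trace).
    steps:    kernel name -> instance step, mimicking the kernel table.
    Assumption: unlisted kernels are not profiled, and a listed kernel is
    profiled on its 1st, (N+1)-th, (2N+1)-th ... instance for step N.
    """
    for name, step in steps.items():
        if step < 1:
            raise ValueError(f"instance step for {name!r} must be >= 1")
    seen: Dict[str, int] = {}   # instances of each kernel encountered so far
    chosen: List[int] = []
    for idx, name in enumerate(launches):
        if name not in steps:
            continue            # kernel not in the table: no instrumentation
        count = seen.get(name, 0)
        if count % steps[name] == 0:
            chosen.append(idx)  # this instance falls on the sampling interval
        seen[name] = count + 1
    return chosen


if __name__ == "__main__":
    trace = ["init", "matmul", "matmul", "reduce", "matmul", "matmul"]
    print(plan_profiled_launches(trace, {"matmul": 2}))  # → [1, 4]
```

In this model, doubling the instance step roughly halves the number of instrumented launches. This matches the page's point that restricting the kernels and sampling fewer instances reduces overhead.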
Click Start to run the analysis.
By default, the GPU In-kernel Profiling result opens in the GPU Compute/Media Hotspots viewpoint. Start with the Summary window to identify the hottest GPU computing task, then click it to navigate to the Graphics window and explore the metrics collected for this hotspot.
Double-click the hot kernel in the Graphics window to open its source code.
The GPU In-kernel Profiling provides a full-scale analysis of the kernel source per code line. The hottest kernel code line is highlighted by default.
To view performance statistics on the GPU instructions executed per kernel instance, switch to the Assembly view.