Intel® VTune™ Amplifier

gpu-profiling Command Line Analysis

Use GPU In-kernel Profiling to analyze GPU kernel execution line by line and to identify performance issues caused by memory latency or inefficient kernel algorithms.

Note

GPU In-kernel Profiling has been temporarily removed from Intel VTune Amplifier 2019 Update 3 while some defects are addressed.

How It Works

GPU In-kernel Profiling instruments your code. Depending on your configuration settings, it helps identify either performance-critical basic blocks or issues caused by memory accesses in the GPU kernels.

GPU In-kernel Profiling introduces the following key metrics:

Note

  • GPU In-kernel Profiling is available on processors based on Intel® microarchitecture code name Broadwell and later.

  • Because GPU In-kernel Profiling incurs higher performance overhead than the GPU Compute/Media Hotspots analysis, consider running the GPU Compute/Media Hotspots analysis first to identify the hottest GPU computing task (GPU kernel), and then examine that kernel with GPU In-kernel Profiling.
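The two-stage approach from the note above can be sketched as a small wrapper script. This is a minimal sketch under stated assumptions, not an official workflow: the result directory names (r_hotspots, r_inkernel), the DRY_RUN switch, the HOT_KERNEL placeholder name, and the target path are hypothetical, and the per-kernel fields after the kernel name are copied verbatim from the Example section of this page. The sketch also assumes the gpu-hotspots analysis type and the -result-dir option are available in your amplxe-cl version; confirm with amplxe-cl -help before relying on them. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Two-stage sketch: a cheaper hotspots pass first, then in-kernel profiling
# of a single kernel. Names below are illustrative assumptions.

TARGET="${TARGET:-/home/test/myApplication}"   # hypothetical target path
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands only
HOT_KERNEL="${HOT_KERNEL:-myKernel}"            # placeholder kernel name

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stage 1: find the hottest GPU kernel with the lower-overhead analysis.
run amplxe-cl -collect gpu-hotspots -result-dir r_hotspots -- "$TARGET"

# Stage 2: profile only that kernel. The ":1:10:4294967185" fields are
# copied from the Example section; adjust them for your own run.
run amplxe-cl -collect gpu-profiling \
    -knob kernels-to-profile="${HOT_KERNEL}:1:10:4294967185" \
    -result-dir r_inkernel -- "$TARGET"
```

Set DRY_RUN=0 and HOT_KERNEL to the kernel name reported by the first stage to run the collection for real.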

Syntax

$ amplxe-cl -collect gpu-profiling [-knob <knobName=knobValue>] -- <target> [target_options]

Knobs: gpu-profiling-mode, kernels-to-profile.

Note

For the most current information on available knobs (configuration options) for the GPU In-kernel Profiling, enter:

$ amplxe-cl -help collect gpu-profiling

Example

This example runs GPU In-kernel Profiling on a Linux target with gpu-profiling-mode set to memlatency, analyzing only kernel1 and kernel2 with a sampling interval of 10 kernels.

$ amplxe-cl -collect gpu-profiling -knob gpu-profiling-mode=memlatency -knob kernels-to-profile=kernel1:1:10:4294967185,kernel2:1:10:4294967185 -- /home/test/myApplication
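When several kernels share the same settings, the comma-separated kernels-to-profile value can be assembled by a small helper instead of typed by hand. The following is a minimal sketch: the function names kernels_knob and profile_cmd and the KERNEL_FILTER variable are hypothetical, the per-kernel fields are treated as an opaque string copied from the example above, and the target is given as an absolute path. The script only prints the command line; it does not invoke amplxe-cl.

```shell
#!/bin/sh
# Per-kernel fields copied verbatim from the example command (assumption:
# every listed kernel uses the same fields).
KERNEL_FILTER="${KERNEL_FILTER:-1:10:4294967185}"

# Join "name:fields" entries with commas for the kernels-to-profile knob.
kernels_knob() {
    value=""
    for name in "$@"; do
        value="${value:+$value,}${name}:${KERNEL_FILTER}"
    done
    printf '%s\n' "$value"
}

# Print the full command line: first argument is the target, the rest are
# kernel names.
profile_cmd() {
    target="$1"
    shift
    printf 'amplxe-cl -collect gpu-profiling -knob gpu-profiling-mode=memlatency -knob kernels-to-profile=%s -- %s\n' \
        "$(kernels_knob "$@")" "$target"
}

profile_cmd /home/test/myApplication kernel1 kernel2
```

Printing the command first makes it easy to review the knob value before starting a collection that carries noticeable overhead.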

What's Next

When the data collection is complete, do one of the following to view the result:
