Intel® VTune™ Amplifier
When you copy a predefined analysis type that is based on the user-mode sampling and tracing collection, Intel® VTune™ Amplifier offers the options applicable to this collection and, by default, enables the same options as the original predefined analysis type.
The following user-mode sampling and tracing collection options are available for a custom configuration:
Use This | To Do This |
---|---|
Collect CPU sampling data menu | Choose whether to collect information about CPU samples and related call stacks. |
CPU sampling interval, ms field | Specify the interval between collected CPU samples, in milliseconds. |
Collect highly accurate CPU time check box (for Windows targets only) | Obtain more accurate CPU time data. This option causes more runtime overhead and increases result size. Administrator privileges are required. |
Collect synchronization API data menu | Choose whether to collect information about synchronization wait calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size. |
Collect signalling API data menu | Choose whether to collect information about synchronization objects and call stacks for signalling calls. This analysis option helps identify synchronization transitions in the timeline and signalling call stacks for associated waits. The collector instruments signalling APIs, which causes higher overhead and increases result size. |
Collect I/O API data menu | Choose whether to collect information about I/O calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size. |
Analyze user tasks, events, and counters check box | Analyze tasks, events, and counters specified in your code via the ITT API. This option causes higher overhead and increases result size. |
Analyze user synchronization check box | Enable user synchronization API profiling to analyze thread synchronization. This option causes higher overhead and increases result size. |
Stack unwinding mode menu | Choose whether collection requires online (during collection) or offline (after collection) stack unwinding. Offline mode reduces analysis overhead and is typically recommended. |
Stitch stacks check box | For applications using Intel Threading Building Blocks (Intel TBB) or OpenMP* with Intel runtime libraries, restructure the call flow to attach stacks to the point that introduces a parallel workload. |
Linux Ftrace events / Android framework events field | Use the kernel events library to select Linux Ftrace* and Android* framework events to monitor with the collector. The collected data shows up as tasks in the Timeline pane. You can also apply the task grouping level to view performance statistics in the grid. |
Analyze GPU Usage check box (available for Linux* targets with Intel HD Graphics and Intel Iris® Graphics only) | Analyze GPU usage and identify whether your application is GPU or CPU bound. |
Analyze Processor Graphics hardware events drop-down menu | Analyze performance data from Intel HD Graphics and Intel Iris Graphics (further: Intel Graphics) based on the predefined groups of GPU metrics. |
GPU sampling interval, ms field | Specify the interval between GPU samples, in milliseconds. |
Trace OpenCL and Intel Media SDK Processor Graphics (Intel Graphics Driver only) check box | Capture the execution time of OpenCL™ kernels and Intel Media SDK programs on a GPU, identify performance-critical GPU tasks, and analyze the performance per GPU hardware metrics. Note: Intel Media SDK program analysis is supported for Linux targets only. |
Disable alternative stacks for signal handlers check box (available for Linux targets) | Disable using alternative stacks for signal handlers. Consider this option for profiling standard Python 3 code on Linux. |
Analyze loops check box | Extend loop analysis to collect advanced loop information, such as instruction set usage, and display analysis results by loops and functions. |
Managed runtime type to analyze menu | Choose a type of the managed runtime to analyze from the available options. |
Analyze OpenMP regions check box | Instrument the OpenMP* regions in your application to group performance data by regions/work-sharing constructs and detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size. |
Profiling mode drop-down menu | Select a profiling mode to identify basic block latency due to algorithm inefficiencies, or memory latency due to memory access issues. This option is typically used for GPU in-kernel profiling. Use the table to specify the kernels of interest and narrow down the GPU in-kernel analysis to specific kernels, minimizing the collection overhead. If required, modify the instance step for each kernel, which is a sampling interval (in the number of kernels). |
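
To make the instance step for the Profiling mode option more concrete: with a step of N, roughly one out of every N invocations of a kernel is profiled. The Python sketch below only illustrates that arithmetic. It assumes that sampling starts with the first instance, which this page does not state, and `profiled_instances` is a hypothetical helper, not part of any VTune Amplifier API.

```python
from typing import List


def profiled_instances(total_instances: int, instance_step: int) -> List[int]:
    """Return the 1-based indices of kernel instances that would be sampled.

    Assumption (not confirmed by the documentation): the first instance is
    sampled, followed by every instance_step-th instance after it.
    """
    if instance_step < 1:
        raise ValueError("instance_step must be a positive integer")
    return list(range(1, total_instances + 1, instance_step))


if __name__ == "__main__":
    # A kernel invoked 10 times with an instance step of 3.
    print(profiled_instances(10, 3))
```

A larger step samples fewer instances of a frequently invoked kernel, which is how the option reduces collection overhead.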
You may generate the command line for this configuration using the Command Line... button at the bottom.
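
The exact command line depends on your VTune Amplifier version, so treat the text generated by the button as the source of truth. As a rough illustration of how a custom configuration can map to repeated `-knob name=value` arguments, here is a minimal Python sketch. The collector name `runss` and every knob name in the dictionary are assumptions for illustration and should be checked against the generated command line. `build_command` and `./my_app` are hypothetical.

```python
import shlex
from typing import Dict, List


def build_command(collector: str, knobs: Dict[str, str], target: List[str]) -> List[str]:
    """Assemble an amplxe-cl argument list from a collector, knobs, and a target."""
    argv = ["amplxe-cl", "-collect-with", collector]
    for name, value in knobs.items():
        # Each option of the custom configuration becomes one -knob argument.
        argv += ["-knob", f"{name}={value}"]
    # "--" separates the tool's options from the application to profile.
    argv += ["--"] + target
    return argv


if __name__ == "__main__":
    # Knob names below are illustrative assumptions, not verified values.
    knobs = {
        "sampling-interval": "10",
        "cpu-samples-mode": "stack",
        "waits-mode": "stack",
    }
    argv = build_command("runss", knobs, ["./my_app"])
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the command as a list of arguments, and quoting only when printing, avoids shell-escaping mistakes if the list is later passed to `subprocess.run`.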