Intel® VTune™ Amplifier 201
Use the Basic Hotspots analysis to understand application flow and identify the sections of code that take the most execution time (hotspots). This is a starting point for your algorithm analysis.
A large number of samples collected at a specific process, thread, or module can imply high processor utilization and potential performance bottlenecks. Some hotspots can be removed, while other hotspots are fundamental to the application functionality and cannot be removed.
Intel® VTune™ Amplifier displays a list of functions in your application ordered by the amount of time spent in each function. It also captures the call stacks for each of these functions so you can see how the hot functions are called.
VTune Amplifier uses low-overhead (about 5%) user-mode sampling and tracing collection, which gets you the information you need without significantly slowing down application execution. The data collector profiles your application using the OS timer: it interrupts the process, collects samples of all active instruction addresses, and captures a call sequence (stack) for each sample. VTune Amplifier stores each sampled instruction pointer (IP) along with its call sequence in data collection files, and then analyzes and displays this data in a result tab. Statistically collected IP samples with call sequences enable VTune Amplifier to display a top-down tree (call tree). Use this data to understand the control flow for statistically important code sections.
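To make the sampling idea concrete, the following is a minimal sketch (not VTune Amplifier's actual implementation) of how timer-based samples can be turned into a list of functions ordered by estimated time. The sample data and the function name `estimate_self_time_ms` are hypothetical; the sketch assumes each sample stands for one sampling interval of CPU time.

```python
from collections import Counter

SAMPLING_INTERVAL_MS = 10  # assumed interval between samples, in milliseconds

# Hypothetical samples: each one is a call stack, outermost caller first,
# ending with the function that contained the sampled instruction pointer.
samples = [
    ("main", "render", "blend"),
    ("main", "render", "blend"),
    ("main", "render", "blend"),
    ("main", "update"),
    ("main", "render"),
]

def estimate_self_time_ms(samples, interval_ms):
    """Estimate time per function as (samples landing in it) x (interval)."""
    hits = Counter(stack[-1] for stack in samples)
    # most_common() orders functions from most to fewest samples
    return {func: count * interval_ms for func, count in hits.most_common()}

hotspots = estimate_self_time_ms(samples, SAMPLING_INTERVAL_MS)
for func, ms in hotspots.items():
    print(f"{func:8s} {ms:4d} ms")
```

Because the estimate is statistical, functions that run for less than one sampling interval may be missed or over-counted in a short run; longer runs give more stable numbers.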
The collector does not gather system-wide performance data but focuses on your application only. To analyze system performance, run the Advanced Hotspots analysis.
To use the Basic Hotspots analysis, explore:
Configuration options (knobs)
To configure options for the Basic Hotspots analysis:
Prerequisites: Create a project and specify an analysis target.
Click the New Analysis toolbar button (in the standalone GUI or the Visual Studio IDE).
The New Amplifier Result tab opens with the Analysis Type tab active.
Select the Algorithm Analysis > Basic Hotspots analysis type from the analysis tree on the left pane.
The Basic Hotspots pane opens on the right.
Configure the following options:
Option | Description |
---|---|
CPU sampling interval, ms spin box | Specify an interval (in milliseconds) between CPU samples. Possible values - 1-1000. The default value is 10. |
Analyze user tasks, events, and counters check box | Analyze the tasks, events, and counters specified in your code via the ITT API. This option causes a higher overhead and increases the result size. The default value is false. |
Analyze OpenMP regions check box | Instrument and analyze OpenMP regions to detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction and atomic operations. The default value is false. |
Details button | Expand/collapse a section listing the default non-editable settings used for this analysis type. If you want to modify or enable additional settings for the analysis, you need to create a custom configuration by copying an existing predefined configuration. VTune Amplifier creates an editable copy of this analysis type configuration and locates it under the Custom Analysis section on the left pane. |
Click Start to run the analysis.
You may generate the command line for this configuration using the Command Line... button at the bottom.
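As a rough illustration of what such a command line might look like, the sketch below assembles an `amplxe-cl` invocation from the three options described above. The helper `build_hotspots_command` is hypothetical, and the knob names (`sampling-interval`, `enable-user-tasks`, `analyze-openmp`) are assumptions about this product version; treat the output of the Command Line... button as authoritative.

```python
def build_hotspots_command(app, app_args=(), sampling_interval_ms=10,
                           user_tasks=False, openmp=False, result_dir=None):
    """Build an argument list for a Basic Hotspots collection (assumed syntax)."""
    # The GUI accepts 1-1000 ms; mirror that check here.
    if not 1 <= sampling_interval_ms <= 1000:
        raise ValueError("CPU sampling interval must be between 1 and 1000 ms")

    cmd = ["amplxe-cl", "-collect", "hotspots",
           "-knob", f"sampling-interval={sampling_interval_ms}"]
    if user_tasks:   # assumed knob for ITT API tasks, events, and counters
        cmd += ["-knob", "enable-user-tasks=true"]
    if openmp:       # assumed knob for OpenMP region analysis
        cmd += ["-knob", "analyze-openmp=true"]
    if result_dir:
        cmd += ["-result-dir", result_dir]
    # Everything after "--" is the target application and its arguments.
    cmd += ["--", app, *app_args]
    return cmd

command = build_hotspots_command("./my_app", ["--frames", "100"],
                                 sampling_interval_ms=5, openmp=True)
print(" ".join(command))
```

Returning a list rather than a single string makes it straightforward to pass the command to `subprocess.run` without shell quoting issues.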
You can explore Basic Hotspots analysis results from different perspectives using the following viewpoints:
Viewpoint | Description |
---|---|
Hotspots | Helps identify hotspots - code regions in the application that consume a lot of CPU time. |
Hotspots by CPU Utilization | Helps identify hotspots - code regions in the application that consume a lot of CPU time. CPU time is broken down into CPU utilization states: idle, poor, fair, and good. |
Each viewpoint consists of the following windows/panes:
Summary window displays statistics on the overall application execution, identifying CPU time and processor utilization.
Bottom-up window displays hotspot functions in the bottom-up tree, CPU time and CPU utilization per function.
Top-down Tree window displays hotspot functions in the call tree, performance metrics for a function only (Self value) and for a function and its children together (Total value).
Caller/Callee window displays parent and child functions of the selected focus function.
Platform window provides details on CPU and GPU utilization, frame rate, memory bandwidth, and user tasks (if corresponding metrics are collected).
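The difference between the Self and Total values can be sketched with a small example. This is a simplified illustration, not the tool's algorithm: it aggregates per function rather than per call path, assumes each sample represents one sampling interval, and uses hypothetical sample data and a hypothetical `self_and_total` helper.

```python
from collections import defaultdict

def self_and_total(samples, interval_ms=10):
    """Return {function: (self_ms, total_ms)} from call-stack samples."""
    self_ms = defaultdict(int)
    total_ms = defaultdict(int)
    for stack in samples:
        # Self: time spent in the function's own code (the last stack frame).
        self_ms[stack[-1]] += interval_ms
        # Total: time in the function plus everything it calls. Use a set so
        # a recursive function is counted once per sample.
        for func in set(stack):
            total_ms[func] += interval_ms
    return {func: (self_ms[func], total_ms[func]) for func in total_ms}

# Hypothetical samples, outermost caller first.
samples = [
    ("main", "render", "blend"),
    ("main", "render", "blend"),
    ("main", "render", "blend"),
    ("main", "update"),
    ("main", "render"),
]

metrics = self_and_total(samples)
for func, (self_time, total_time) in sorted(metrics.items()):
    print(f"{func:8s} self={self_time:3d} ms  total={total_time:3d} ms")
```

In this data, `main` has no Self time but the largest Total time, while `blend` is a leaf whose Self and Total values are equal. A high Total with a low Self points to time spent in callees rather than in the function body.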
Identify the most time-consuming function in the grid and double-click it for source analysis.
Analyze the source of the critical function starting with the highlighted hottest code line and moving further with the Hotspot Navigation options.
Modify your code to remove bottlenecks and improve the performance of your application.
Re-run the analysis and verify your optimization with the comparison mode.
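The idea behind verifying an optimization by comparison can be sketched as a per-function difference between two runs. This is only an illustration of the concept with made-up numbers and a hypothetical `compare_results` helper; it does not read VTune Amplifier result files.

```python
def compare_results(before_ms, after_ms):
    """Return (function, delta_ms) pairs, largest improvement first."""
    funcs = set(before_ms) | set(after_ms)
    # Negative delta: the function got faster. Functions missing from one
    # run are treated as taking 0 ms in that run.
    deltas = {f: after_ms.get(f, 0) - before_ms.get(f, 0) for f in funcs}
    return sorted(deltas.items(), key=lambda item: item[1])

# Hypothetical CPU time per function (ms) before and after tuning "blend".
before = {"blend": 300, "render": 100, "update": 50}
after = {"blend": 120, "render": 100, "update": 55}

report = compare_results(before, after)
for func, delta in report:
    print(f"{func:8s} {delta:+5d} ms")
```

Small deltas such as the +5 ms for `update` are often within run-to-run sampling noise, so compare runs collected with the same workload and sampling interval.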
Information provided by the Basic Hotspots analysis is important for tuning serial applications, and it is also useful for tuning the serial sections of parallel applications. The Basic Hotspots analysis data helps you understand what your application is doing and identify the code that is critical to tune. For parallel applications running on multi-core systems, you may need additional analyses: Concurrency, Locks and Waits, or HPC Performance Characterization.