Intel® VTune™ Amplifier 201

Advanced Hotspots Analysis

Advanced Hotspots analysis is a fast and easy way to identify performance-critical code sections (hotspots) in your application and correlate this data with system performance.

The periodic instruction pointer sampling performed by Intel® VTune™ Amplifier identifies code locations where an application spends more time than in others. A function may consume a lot of time either because its code is slow or because it is called frequently. In either case, improving the speed of such functions should have a larger impact on overall application performance than optimizing other code.
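As a rough illustration of how sampling leads to a hotspot list, the following minimal sketch counts samples per function and converts them to estimated CPU time. It is not VTune Amplifier code: the function name `rank_hotspots`, the sample stream, and the function names in it are hypothetical, and it assumes each sample stands for roughly one sampling interval of CPU time.

```python
from collections import Counter

def rank_hotspots(samples, interval_ms=1.0):
    """Return (function, estimated CPU time in ms) pairs, hottest first.

    samples: iterable of function names, one per instruction pointer sample.
    interval_ms: time between samples (assumed to be 1 ms here).
    """
    counts = Counter(samples)
    # Assumption: each sample represents about one sampling interval of CPU time.
    return [(name, count * interval_ms) for name, count in counts.most_common()]

# Hypothetical sample stream: parse() is slow per call, hash() is cheap but
# called often. Both show up as hotspots because both accumulate samples.
samples = ["parse"] * 5 + ["hash"] * 3 + ["main"] * 2

for name, cpu_ms in rank_hotspots(samples):
    print(f"{name:8s} {cpu_ms:6.1f} ms")
```

The sketch makes no distinction between a slow function and a frequently called one; both simply collect more samples, which is why the ranking alone does not tell you which of the two cases applies.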

VTune Amplifier creates a list of functions in your application ordered by the amount of time spent in each function. By default, Advanced Hotspots analysis does not capture function call stacks while collecting hotspots, but it can sample all processes on the system. This type of analysis uses event-based sampling collection and analyzes all the processes running on your system at the time of collection, providing CPU time data on whole-system performance.

You can still analyze stacks for your application modules by selecting a collection level that includes stack analysis in the Advanced Hotspots pane. For example, selecting the Hotspots, call counts and stacks collection level extends the Advanced Hotspots analysis with performance, parallelism, and power consumption data attributed to execution paths.

Note

On 32-bit Linux* systems, the VTune Amplifier uses a driverless Perf*-based collection with stacks to run the Advanced Hotspots analysis.

To use the Advanced Hotspots analysis, explore the following sections: Configuration Options, Viewpoints, and What's Next.

Configuration Options

To configure options for the Advanced Hotspots analysis:

Prerequisites: Create a project and specify an analysis target.

  1. Click the New Analysis button on the Intel® VTune™ Amplifier toolbar (standalone GUI or Visual Studio IDE).

    The New Amplifier Result tab opens with the Analysis Type window active.

  2. From the left pane, select Algorithm Analysis > Advanced Hotspots.

    The analysis configuration pane opens on the right.

  3. Configure the following options:

    CPU sampling interval, ms field

    Specify an interval (in milliseconds) between CPU samples.

    Possible values: 0.01 to 1000.

    The default value is 1.

    Collection Level options

    Select the level of detail provided by event-based sampling collection. More detailed collection levels cause higher overhead.

    • Hotspots, call counts and stacks
    • Hotspots
    • Hotspots and stacks
    • Hotspots, call counts, loop trip counts and stacks

    The default value is Hotspots.

    Event mode drop-down menu

    Limit event-based sampling collection to OS or USER mode.

    • All
    • OS
    • USER

    The default value is All.

    Analyze user tasks, events, and counters check box

    Analyze the tasks, events, and counters specified in your code via the ITT API. This option causes higher overhead and increases the result size.

    The default value is false.

    Analyze OpenMP regions check box

    Instrument and analyze OpenMP regions to detect inefficiencies such as imbalance, lock contention, or overhead from scheduling, reduction, and atomic operations.

    The default value is false.

    Details button

    Expand or collapse a section listing the default non-editable settings used for this analysis type. To modify or enable additional settings for the analysis, create a custom configuration by copying an existing predefined configuration. VTune Amplifier creates an editable copy of this analysis type configuration and places it under the Custom Analysis section in the left pane.

    Note

    • On 32-bit Linux systems, the VTune Amplifier does not support the Hotspots, call counts and stacks collection level or the Hotspots, call counts, loop trip counts and stacks collection level.

    • You may generate the command line for this configuration using the Command Line... button at the bottom.

  4. Click Start to run the analysis.
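The options in step 3 can also be thought of as a set of command-line settings. The following sketch assembles such a command line from the same choices and checks the sampling interval against the documented 0.01 to 1000 range. The executable name `amplxe-cl`, the analysis name `advanced-hotspots`, and every knob name and value below are assumptions for illustration only; the command produced by the Command Line... button is the authoritative form for your installation. The target `./my_app` is hypothetical.

```python
# Assumed mapping from the GUI collection levels to command-line values.
COLLECTION_LEVELS = {
    "Hotspots": "hotspots-sampling",
    "Hotspots and stacks": "stack-sampling",
    "Hotspots, call counts and stacks": "stack-and-callcount",
    "Hotspots, call counts, loop trip counts and stacks": "stack-call-and-tripcount",
}
# Assumed mapping from the Event mode menu entries to command-line values.
EVENT_MODES = {"All": "all", "OS": "os", "USER": "user"}

def build_command(target, interval_ms=1.0, level="Hotspots", event_mode="All",
                  user_tasks=False, openmp=False):
    """Return an argument list for a hypothetical command-line collection."""
    # Range and defaults follow the option descriptions in step 3.
    if not 0.01 <= interval_ms <= 1000:
        raise ValueError("CPU sampling interval must be within 0.01-1000 ms")
    knobs = {
        "sampling-interval": f"{interval_ms:g}",
        "collection-detail": COLLECTION_LEVELS[level],
        "event-mode": EVENT_MODES[event_mode],
        "enable-user-tasks": str(user_tasks).lower(),
        "analyze-openmp": str(openmp).lower(),
    }
    cmd = ["amplxe-cl", "-collect", "advanced-hotspots"]  # assumed names
    for name, value in knobs.items():
        cmd += ["-knob", f"{name}={value}"]
    # Everything after "--" is the application to profile and its arguments.
    return cmd + ["--"] + list(target)

print(" ".join(build_command(["./my_app"], level="Hotspots and stacks")))
```

Building the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting issues.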

Viewpoints

By default, the data collection result opens in the Hotspots viewpoint. Depending on your analysis configuration settings, you can switch to other viewpoints configured to display specific performance issues:

• Hardware Events: Displays statistics of monitored hardware events: estimated count and/or the number of samples collected. Use this view to identify code regions (modules, functions, code lines, and so on) with the highest activity for an event of interest.

• Hardware Issues: Helps identify where the application is not making the best use of available hardware resources. This viewpoint displays metrics derived from hardware performance counters. Hover over the highlighted metrics values in the grid to read why the extreme value might represent a performance problem.

• Hotspots: Helps identify hotspots - code regions in the application that consume a lot of CPU time.

• HPC Performance Characterization: Helps understand how effectively your application uses CPU, memory, and floating-point operation resources. Use this view to identify scalability issues for Intel OpenMP and MPI runtimes as well as next steps to increase memory and FPU efficiency.
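The Hardware Events viewpoint reports both a number of samples and an estimated count for an event. A minimal sketch of how the two can relate, assuming the usual event-based sampling model in which one sample is recorded each time the event counter reaches a fixed sample-after value: the function name, the sample-after value, and the per-function sample counts are all hypothetical and only illustrate the arithmetic.

```python
def estimated_event_count(samples, sample_after_value):
    """Estimate how many times an event occurred from its sample count."""
    # Assumption: one sample is recorded per sample_after_value occurrences.
    return samples * sample_after_value

# Hypothetical sample-after value and per-function sample counts for one event.
sav = 2_000_003
samples_by_function = {"parse": 120, "hash": 45, "main": 5}

# Order code regions by activity for the event of interest, highest first.
ranked = sorted(samples_by_function.items(), key=lambda kv: kv[1], reverse=True)
for name, n in ranked:
    print(f"{name:8s} samples={n:4d} estimated count={estimated_event_count(n, sav)}")
```

Because the count is derived from samples, regions with only a handful of samples carry a large relative uncertainty and are best treated as approximate.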

These viewpoints may include the following windows:

What's Next

You can go from the hotspots to the source code. View the source code containing the hotspots and modify your code to remove bottlenecks and improve the performance of your application.

Information provided by Advanced Hotspots analysis is important for tuning serial applications and is also useful for tuning the serial sections of parallel applications. For algorithm tuning, you may also choose to run the Basic Hotspots analysis to analyze the call flow of the application, or the Concurrency analysis to estimate the effectiveness of the parallel algorithms you use.

See Also