Intel® VTune™ Amplifier

Concurrency View

Use the Concurrency analysis viewpoints in Intel® VTune™ Amplifier to analyze how long your application threads run in parallel and how effectively your application uses the available CPU cores.

The following viewpoints are available:

To interpret the performance data provided in these viewpoints, follow these steps:

  1. Define a performance baseline.

  2. Identify functions with poor concurrency and poor CPU utilization.

  3. Analyze the timeline.

  4. Identify algorithm issues.

  5. Analyze source.

  6. Explore other analysis types.

Define a Performance Baseline

Start by analyzing the application-level data in the Summary window for this analysis result. Use the Elapsed time as your primary indicator and as the baseline for comparing results before and after optimization.

Explore the CPU Utilization and Thread Concurrency histograms, which show how much of the Elapsed time was spent at each utilization level: for each number of simultaneously running threads (Thread Concurrency) and for each number of simultaneously used CPUs (CPU Utilization). Ideally, the longest bars should fall within the OK or Ideal utilization range defined by Intel® VTune™ Amplifier.
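To make the histogram idea concrete, the following minimal sketch sums Elapsed time per number of simultaneously running threads and then groups those totals into utilization levels. It assumes a hypothetical input format, a list of (duration, running threads) slices taken from the timeline, and the Poor/OK boundary at half the target is an illustrative assumption, not the threshold VTune Amplifier actually applies.

```python
from collections import defaultdict


def concurrency_histogram(slices):
    """Sum elapsed time per number of simultaneously running threads.

    slices: iterable of (duration_seconds, running_threads) pairs, a
    hypothetical flattened form of the timeline.
    """
    hist = defaultdict(float)
    for duration, running in slices:
        hist[running] += duration
    return dict(hist)


def utilization_level(running, target):
    """Classify one concurrency level against the target (number of logical CPUs).

    ASSUMPTION: the Poor/OK boundary at half the target is illustrative only;
    it is not the threshold VTune Amplifier uses.
    """
    if running == 0:
        return "Idle"
    if running > target:
        return "Over"
    if running == target:
        return "Ideal"
    if running >= target / 2:
        return "OK"
    return "Poor"


def time_by_level(hist, target):
    """Collapse the histogram into elapsed time per utilization level."""
    levels = defaultdict(float)
    for running, elapsed in hist.items():
        levels[utilization_level(running, target)] += elapsed
    return dict(levels)
```

For example, on a hypothetical 4-CPU target, slices of 0.5 s with one thread, 1.0 s with two threads, and 2.0 s with four threads yield 0.5 s Poor, 1.0 s OK, and 2.0 s Ideal under these assumed thresholds.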

Identify Functions with Poor Concurrency and Poor CPU Utilization

To identify functions that do not use the available processor time effectively, explore the Bottom-up window.

To identify functions with poor CPU utilization, explore the Hotspots by CPU Utilization viewpoint. By default, functions are sorted by the time they spent at the Poor utilization level, so the most critical functions are listed first. To view the time distribution across utilization levels, click the expand button in the Effective Time by Utilization column header.

To identify functions that ran serially and did not use the available cores effectively (functions with poor concurrency), switch to the Hotspots by Thread Concurrency viewpoint. Here, functions are sorted by the CPU time spent at the Poor concurrency level; otherwise, you use this viewpoint the same way as the Hotspots by CPU Utilization viewpoint.

Focus your optimization efforts on the functions with the longest Poor CPU time (red bars, if the bar format is selected). Next, look for the functions with the longest Over-utilized time (blue bars).

The overall goal of optimization is to achieve Ideal (green) or OK (orange) utilization and to reduce the time spent at the Poor and Over utilization and concurrency levels.
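The prioritization described above can be sketched as two ranked lists: functions ordered by Poor time, then functions ordered by Over time. This is a minimal illustration that assumes you already have a per-function breakdown of CPU time by utilization level; the `FunctionTime` record and the function names in the example are hypothetical, not part of any VTune Amplifier API.

```python
from dataclasses import dataclass


@dataclass
class FunctionTime:
    """Hypothetical per-function breakdown of CPU time (seconds) by utilization level."""
    name: str
    poor: float
    ok: float
    ideal: float
    over: float


def optimization_candidates(functions):
    """Return (names ranked by Poor time, names ranked by Over time), longest first.

    Functions with no time at a level are left out of that level's list.
    """
    by_poor = sorted((f for f in functions if f.poor > 0), key=lambda f: f.poor, reverse=True)
    by_over = sorted((f for f in functions if f.over > 0), key=lambda f: f.over, reverse=True)
    return [f.name for f in by_poor], [f.name for f in by_over]
```

Work through the first list before the second, mirroring the order in which the viewpoint asks you to address red bars and then blue bars.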

VTune Amplifier also measures the Overhead time and Spin time. If either of these metrics exceeds the threshold that Intel architects set for your processor type, VTune Amplifier highlights the value in pink in the Bottom-up and Top-down Tree windows. Hover over a highlighted cell to get performance tuning advice.

Figure: Concurrency analysis, Bottom-up window

Analyze the Timeline

The Timeline pane at the bottom of the Bottom-up and Top-down Tree windows shows how your application threads behave and how the CPU Utilization and Thread Concurrency metrics change over time. Analyze the data, select a problem area, and zoom in on the selection using the context menu options.

VTune Amplifier calculates the overall CPU Utilization metric as the sum of the CPU time of all threads in the Threads area. The maximum CPU Utilization value is [number of processor cores] x 100%, and the maximum Thread Concurrency equals the number of logical CPUs. In the example below, Thread Concurrency on a 4-core system is 2 and CPU Utilization is about 100% out of a possible 400%, which means that the CPUs were not used effectively during this time range.

Figure: Hotspots by CPU Utilization viewpoint for Concurrency analysis
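The CPU Utilization arithmetic can be checked with a short sketch. It assumes that the summed per-thread CPU time is expressed as a percentage of the selected time range's length, so that 100% corresponds to one fully busy core and the ceiling is the number of cores x 100%. The per-thread CPU times in the example are hypothetical.

```python
def cpu_utilization_percent(thread_cpu_times, interval):
    """Overall CPU Utilization for a time range.

    ASSUMPTION: the sum of every thread's CPU time is divided by the length of
    the range, so 100% equals one fully busy core.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * sum(thread_cpu_times) / interval


def share_of_capacity(utilization_percent, cores):
    """Fraction of the maximum possible utilization (cores x 100%)."""
    return utilization_percent / (cores * 100.0)
```

For instance, threads that accumulate 0.5 s, 0.25 s, and 0.25 s of CPU time in a 1-second range give 100% utilization, which on a 4-core system is only a quarter of the 400% maximum, matching the situation in the example.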

To understand what your application was doing during a particular time frame, select this range on the timeline, right-click, and choose Zoom In and Filter In by Selection. VTune Amplifier displays the functions executed during this time range. Identify functions with high CPU time (hotspots), and double-click a hotspot to find the code lines that caused the issue.
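Conceptually, filtering by a selection restricts the data to a time window and then ranks functions by the CPU time they accumulated inside it. The sketch below mimics that on a hypothetical list of (timestamp, function, CPU time) samples; the sample format and function names are assumptions for illustration, not how VTune Amplifier stores its data.

```python
from collections import Counter


def hotspots_in_range(samples, start, end, top=3):
    """Rank functions by CPU time accumulated inside [start, end).

    samples: iterable of (timestamp_s, function_name, cpu_time_s) tuples,
    a hypothetical sample format.
    """
    totals = Counter()
    for timestamp, func, cpu_time in samples:
        if start <= timestamp < end:
            totals[func] += cpu_time
    return totals.most_common(top)
```

Samples outside the window are ignored, so a function that is hot elsewhere in the run does not hide the hotspot of the selected range.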

Correlate CPU usage and thread concurrency data to identify potential performance issues:

  - Condition: average CPU usage is close to the target concurrency, and average concurrency is much lower than the target concurrency.
    Potential issue: parallel application with a lot of contention on spin locks.

  - Condition: average CPU usage and average concurrency are both close to 1.
    Potential issue: serial application.

  - Condition: average CPU usage and average concurrency are almost the same, with values between 1 and the target concurrency.
    Potential issue: parallel application with contention on usual synchronization-based locks.

  - Condition: both average CPU usage and average concurrency are close to the target concurrency.
    Potential issue: none; this is a good parallel application.
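The correlations listed above can be encoded as a small decision function. This is a sketch only: "close to" is interpreted here as within a relative tolerance of 10%, and "much lower" as below half the target; both cutoffs are assumptions chosen for illustration, not values defined by VTune Amplifier.

```python
def diagnose(avg_cpu, avg_concurrency, target, rel_tol=0.1):
    """Map average CPU usage and average concurrency to a likely issue.

    avg_cpu and avg_concurrency are in units of CPUs/threads; target is the
    target concurrency. ASSUMPTIONS: rel_tol defines "close to", and
    "much lower" means below target / 2.
    """
    def close(a, b):
        return abs(a - b) <= rel_tol * b

    if close(avg_cpu, target) and close(avg_concurrency, target):
        return "good parallel application"
    if close(avg_cpu, 1) and close(avg_concurrency, 1):
        return "serial application"
    if close(avg_cpu, target) and avg_concurrency < target / 2:
        return "contention on spin locks"
    if close(avg_cpu, avg_concurrency) and 1 < avg_cpu < target:
        return "contention on synchronization-based locks"
    return "no clear pattern"
```

The order of the checks matters: the "good" and "serial" cases are tested first so that the contention rules only apply to the remaining, partially parallel results.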

Identify Algorithm Issues

You can identify issues with the call sequences in your application and improve performance by revising the way functions are called. The following methods to locate potential issues are available:

Analyze Source

When you have identified a critical function, double-click it to open the Source/Assembly window and analyze the source code. In the Timeline pane, you can double-click a transition line to open the call site for that transition. You can also open the code editor directly from VTune Amplifier and edit your code, for example, to add parallelism, rebalance the workload, or reduce contention.
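One common way to reduce contention is to stop taking a shared lock for every update and instead let each thread accumulate a private partial result that it merges once. The sketch below contrasts the two structures; the function names are hypothetical. Note that CPython's global interpreter lock prevents these threads from running Python code in parallel, so the sketch shows the restructuring (lock acquisitions drop from one per element to one per thread) rather than a speedup; the same pattern applies to native threads.

```python
import threading


def _run_threads(worker, chunks):
    """Start one thread per chunk and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def sum_with_shared_lock(chunks):
    """High-contention version: every element update acquires the shared lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for x in chunk:
            with lock:  # one acquisition per element
                total += x

    _run_threads(worker, chunks)
    return total


def sum_with_local_partials(chunks):
    """Reduced contention: each thread sums privately, then merges once."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)  # no shared state is touched here
        with lock:  # one acquisition per thread
            total += partial

    _run_threads(worker, chunks)
    return total
```

Both functions return the same result; only the number of times threads compete for the lock differs.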

Explore Other Analysis Types
