Intel® VTune™ Amplifier

Concurrency Analysis of Threads Parallelism

Use the Concurrency analysis to identify hotspot functions where thread parallelism and processor utilization are poor.

Concurrency Analysis

Concurrency analysis shows how many threads were running at each moment during application execution. A thread counts as running if it is executing or ready to run, that is, not waiting at a defined waiting or blocking API. VTune Amplifier also shows the CPU time spent while each hotspot was executing and rates how effectively that time was used, either by CPU utilization or by thread concurrency. In the Hotspots by CPU Utilization viewpoint, a red bar marks time when the processors were poorly utilized, which suggests where to focus your tuning.
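The idea of counting running threads at each moment can be illustrated with a small sketch. It is not how VTune Amplifier collects data; it only shows the arithmetic, assuming a hypothetical input format in which each (start, end) pair is a period during which one thread was running or ready to run.

```python
from collections import defaultdict


def concurrency_histogram(intervals):
    """Return {concurrency_level: total_time} for a set of thread run intervals.

    intervals: list of (start, end) pairs (hypothetical format, for
    illustration only). Each pair is one period in which a thread was
    running or ready to run.
    """
    # Turn every interval into a +1 event at its start and a -1 event at its end.
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort by time first; at equal times, -1 sorts before +1.
    events.sort()

    histogram = defaultdict(float)
    level = 0
    previous_time = None
    for time, delta in events:
        # Credit the elapsed span to the concurrency level that was active.
        if previous_time is not None and time > previous_time:
            histogram[level] += time - previous_time
        level += delta
        previous_time = time
    return dict(histogram)


# One thread runs for the whole 10-unit span; two others overlap it briefly.
sample = [(0, 10), (2, 6), (4, 8)]
hist = concurrency_histogram(sample)
```

In this sample, levels 1 and 2 each account for 4 time units and level 3 for 2 units. Spans in which no thread is running are recorded under level 0, which corresponds to idle time.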

To use the Concurrency analysis, explore the following sections: Configuration Options, Viewpoints, and What's Next.

Configuration Options

To configure options for the Concurrency analysis:

Prerequisites: Create a project and specify an analysis target.

  1. Click the New Analysis button on the Intel® VTune™ Amplifier toolbar (in the standalone GUI or the Visual Studio IDE).

    The New Amplifier Result tab opens with the Analysis Type window active.

  2. Select the Algorithm Analysis > Concurrency analysis type from the analysis tree on the left pane.

    The Concurrency pane opens on the right.

  3. Configure the following options:

    CPU sampling interval, ms spin box

    Specify an interval (in milliseconds) between CPU samples.

    Possible values: 1-1000.

    The default value is 10 ms.

    Analyze user tasks, events, and counters check box

    Analyze the tasks, events, and counters specified in your code via the ITT API. Enabling this option increases collection overhead and result size.

    The option is disabled by default.

    Analyze Intel runtimes and user synchronization check box

    Analyze thread synchronization by profiling the user synchronization API, whether it is called by Intel runtimes such as OpenMP and Intel TBB or directly by your code. Enabling this option increases collection overhead and result size.

    The option is disabled by default.

    Analyze OpenMP regions check box

    Instrument and analyze OpenMP regions to detect inefficiencies such as imbalance, lock contention, or overhead from scheduling, reduction, and atomic operations.

    The option is disabled by default.

    Details button

    Expand or collapse a section that lists the default, non-editable settings used for this analysis type. To modify these settings or enable additional ones, create a custom configuration by copying an existing predefined configuration. VTune Amplifier creates an editable copy of this analysis type configuration and places it under the Custom Analysis section on the left pane.

    Note

    You can generate the command line for this configuration by clicking the Command Line... button at the bottom.

  4. Click Start to run the analysis.
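The options above map onto a command line, and the Command Line... button remains the authoritative way to obtain it. The following is only a sketch of how such a command might be assembled in a script. The `amplxe-cl` executable name, the `concurrency` analysis type name, and the knob names `sampling-interval`, `enable-user-tasks`, `enable-user-sync`, and `analyze-openmp` are assumptions not stated on this page, and `./my_app` is a hypothetical target. Check each name against the generated command line for your product version before relying on it.

```python
def build_concurrency_command(target, sampling_interval_ms=10,
                              user_tasks=False, user_sync=False,
                              openmp_regions=False):
    """Build an argument list for a Concurrency collection (illustrative only).

    target: list of strings, the application and its arguments.
    All tool and knob names below are assumptions; verify them with the
    Command Line... button in the GUI.
    """
    # The GUI spin box accepts 1-1000 ms, so mirror that range here.
    if not 1 <= sampling_interval_ms <= 1000:
        raise ValueError("CPU sampling interval must be in the range 1-1000 ms")

    command = [
        "amplxe-cl", "-collect", "concurrency",          # assumed CLI and type name
        "-knob", f"sampling-interval={sampling_interval_ms}",  # assumed knob name
    ]

    # The three check boxes are disabled by default, so add them only on request.
    optional_knobs = [
        ("enable-user-tasks", user_tasks),      # assumed knob name
        ("enable-user-sync", user_sync),        # assumed knob name
        ("analyze-openmp", openmp_regions),     # assumed knob name
    ]
    for name, enabled in optional_knobs:
        if enabled:
            command += ["-knob", f"{name}=true"]

    # Everything after "--" is the application to profile.
    command += ["--"] + list(target)
    return command


default_command = build_concurrency_command(["./my_app"])
```

The function returns a list rather than a single string so that it can be passed to `subprocess.run` without shell quoting concerns.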

Viewpoints

You can explore the Concurrency analysis results from different perspectives using the following viewpoints:

Viewpoint

Description

Hotspots

Helps identify hotspots - code regions in the application that consume a lot of CPU time.

Hotspots by CPU Utilization

Helps identify hotspots - code regions in the application that consume a lot of CPU time. CPU time is broken down into CPU utilization states: idle, poor, fair, and good.

Hotspots by Thread Concurrency

Helps identify hotspots - code regions in the application that consume a lot of CPU time. CPU time is broken down into thread concurrency states: idle, poor, fair, good, and over.

Locks and Waits

Shows how your application is utilizing available CPU cores and helps identify the cause of ineffective utilization, for example: threads waiting too long on synchronization objects (locks), I/O, or timers while CPU cores are underutilized. CPU time is represented by bars colored according to the CPU utilization level during the wait.
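The thread concurrency states named in the table can be thought of as buckets for the number of simultaneously running threads. The sketch below shows one plausible way to assign those buckets. The boundaries between poor, fair, and good (`poor_fraction` and `fair_fraction`) are hypothetical values chosen for illustration, not the product's defaults, and the rule that "over" means more running threads than logical CPUs is likewise an assumption.

```python
def classify_concurrency(running_threads, logical_cpus,
                         poor_fraction=0.5, fair_fraction=0.85):
    """Map a running-thread count to a concurrency state name.

    poor_fraction and fair_fraction are hypothetical thresholds, expressed
    as fractions of the logical CPU count.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if running_threads < 0:
        raise ValueError("running_threads cannot be negative")

    if running_threads == 0:
        return "idle"    # no thread is running or ready to run
    if running_threads > logical_cpus:
        return "over"    # more runnable threads than CPUs (assumed meaning)

    ratio = running_threads / logical_cpus
    if ratio <= poor_fraction:
        return "poor"
    if ratio <= fair_fraction:
        return "fair"
    return "good"


# States for a few thread counts on a machine with 8 logical CPUs.
states = [classify_concurrency(n, 8) for n in (0, 2, 6, 8, 12)]
```

With these assumed thresholds, the five thread counts map to idle, poor, fair, good, and over, in that order.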

By default, VTune Amplifier displays the results of the Concurrency analysis in the Hotspots by Thread Concurrency viewpoint.

What's Next

  1. Identify the most time-consuming serial function in the grid and double-click it for source analysis.

  2. Analyze the source of the critical function starting with the highlighted hottest code line and moving further with the Hotspot Navigation options.

  3. Modify your code to remove bottlenecks and improve the performance of your application.

    If you have a hotspot where not all cores are used, consider adding parallelism, re-balancing the workload, or reducing contention.

  4. Re-run the analysis and verify your optimization with the comparison mode.
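To see why re-balancing can matter for a hotspot where some cores sit idle, consider this sketch. It compares the finishing time of the slowest worker under two ways of distributing tasks: contiguous, equally sized blocks versus a greedy assignment that hands the next-largest task to the least-loaded worker. The task costs are made-up numbers for illustration, and the greedy rule is one common heuristic, not a technique prescribed by VTune Amplifier.

```python
import heapq


def static_makespan(costs, workers):
    """Split tasks into contiguous, equally sized blocks, one per worker.

    Returns the total cost of the most heavily loaded block.
    """
    if not costs:
        return 0
    block = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def balanced_makespan(costs, workers):
    """Greedy heuristic: give the next-largest task to the least-loaded worker.

    Returns the total cost assigned to the most heavily loaded worker.
    """
    loads = [0] * workers
    heapq.heapify(loads)
    for cost in sorted(costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)


# Hypothetical task costs: the expensive tasks are clustered at the front.
costs = [8, 7, 6, 5, 1, 1, 1, 1]
static = static_makespan(costs, 2)      # blocks [8, 7, 6, 5] and [1, 1, 1, 1]
balanced = balanced_makespan(costs, 2)
```

Here the block split leaves one worker with 26 units of work and the other with 4, while the greedy assignment gives each worker 15 units. In a concurrency profile, the first case would show up as a long stretch with only one thread running.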

To understand the possible reasons for ineffective processor utilization, run the Locks and Waits analysis.
