Intel® VTune™ Amplifier

Threading Analysis

Use the Threading analysis to identify how efficiently an application uses available processor compute cores and explore inefficiencies in threading runtime usage or contention on synchronization objects that prevent effective processor utilization.

Note

Threading analysis combines and replaces the Concurrency and Locks and Waits analysis types available in previous versions of Intel® VTune™ Amplifier.

How It Works

One common problem in parallel applications is threads waiting too long on synchronization objects (locks) that are in the critical path of application execution. Performance suffers when waits occur while cores are under-utilized. Threading analysis also shows how much time the application spends in threading runtimes, either because of busy waits or because of overhead in arranging parallel work.
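The effect of a lock on the critical path can be reproduced with a small test program. The following is a minimal sketch, not part of the product: `run_workers` and its parameters are hypothetical names, and `sleep_for` stands in for real work. When the lock is held for the whole work item, the threads run one after another and the cores stay idle during the waits.

```cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical demo: each thread performs `work` either while holding a
// shared mutex (serialized) or outside of it (parallel).
// Returns the elapsed wall-clock time in milliseconds.
long long run_workers(int num_threads, std::chrono::milliseconds work,
                      bool hold_lock_during_work) {
    std::mutex shared_lock;  // the contended synchronization object
    int finished = 0;        // shared state protected by shared_lock

    auto worker = [&] {
        if (hold_lock_during_work) {
            // The lock covers the whole work item: other threads wait here.
            std::lock_guard<std::mutex> guard(shared_lock);
            std::this_thread::sleep_for(work);  // stands in for real work
            ++finished;
        } else {
            std::this_thread::sleep_for(work);  // work done without the lock
            // The lock covers only the shared-state update.
            std::lock_guard<std::mutex> guard(shared_lock);
            ++finished;
        }
    };

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

Calling `run_workers(4, std::chrono::milliseconds(30), true)` takes roughly four times as long as the same call with `false`, because each thread must wait for the previous one to release the lock. A program of this shape is a convenient first target for the Threading analysis.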

Threading analysis uses user-mode sampling and tracing collection. With this analysis you can estimate the impact each synchronization object has on the application and understand how long the application had to wait on each synchronization object, or in blocking APIs, such as sleep and blocking I/O.
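To make the notion of wait time on a synchronization object concrete, the sketch below accumulates the time threads spend blocked while acquiring one mutex. It is an illustration only and does not describe how the tool collects data; `TimedMutex` and `measure_wait_ms` are hypothetical names introduced for this example.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: a mutex that records how long callers block in lock().
class TimedMutex {
public:
    void lock() {
        auto start = std::chrono::steady_clock::now();
        inner_.lock();  // may block while another thread holds the mutex
        auto waited = std::chrono::steady_clock::now() - start;
        wait_ns_ += std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count();
    }
    void unlock() { inner_.unlock(); }
    // Total time all threads spent waiting to acquire this mutex.
    double total_wait_ms() const { return wait_ns_.load() / 1e6; }

private:
    std::mutex inner_;
    std::atomic<long long> wait_ns_{0};
};

// Runs num_threads workers that each hold the mutex for `hold` and returns
// the accumulated wait time in milliseconds.
double measure_wait_ms(int num_threads, std::chrono::milliseconds hold) {
    TimedMutex m;
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&m, hold] {
            std::lock_guard<TimedMutex> guard(m);  // TimedMutex is BasicLockable
            std::this_thread::sleep_for(hold);     // stands in for work or blocking I/O
        });
    }
    for (auto& t : threads) t.join();
    return m.total_wait_ms();
}
```

With a single thread the accumulated wait stays near zero. With several threads it grows with both the number of waiters and the time the lock is held, which is the kind of per-object impact the analysis lets you estimate.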

There are two groups of synchronization objects supported by the Intel® VTune™ Amplifier:

Configure and Run Analysis

To configure options for the Threading analysis:

Prerequisites: Create a project and specify an analysis target.

  1. Click the Configure Analysis button (standalone GUI or Visual Studio IDE) on the Intel® VTune™ Amplifier toolbar.

    The Configure Analysis window opens.

  2. From the HOW pane, click the Browse button and select Threading.

  3. Configure the collection options, including the sampling interval.

    Note

    You may generate the command line for this configuration using the Command Line button at the bottom.

  4. Click the Start button to run the analysis.

View Data

The Threading analysis results appear in the Threading Efficiency viewpoint, which consists of the following windows/panes:

What's Next

  1. Start with the Summary window to explore the CPU utilization of your application and identify reasons for underutilization connected with synchronization or parallel work arrangement overhead. Click the links associated with flagged issues to see more detailed information. For example, clicking a sync object name in the Top Waiting Objects table takes you to that object in the Bottom-up window.

  2. Analyze how threads interact on synchronization objects, using wait and signal stacks and transitions on the timeline. Explore CPU time spent in threading runtimes to classify inefficiencies in their use.

  3. Modify your code to remove CPU utilization bottlenecks and improve the parallelism of your application.

    Concentrate your tuning on objects with long Wait time where the system is poorly utilized (red bars) during the wait. Consider adding parallelism, rebalancing, or reducing contention. Ideal utilization (green bars) occurs when the number of running threads equals the number of available logical cores.

  4. Re-run the analysis and use the comparison mode to verify your optimization and identify further areas for improvement.
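As an illustration of the tuning advice in step 3, the sketch below shows one common way to reduce contention and match the thread count to the number of logical cores. It is an example under stated assumptions, not a prescribed fix: `parallel_sum` is a hypothetical function, and summing a vector stands in for the real workload.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using one thread per logical core (an assumption of this sketch).
long long parallel_sum(const std::vector<int>& data) {
    // hardware_concurrency() may return 0 when the value is unknown.
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::size_t chunk = (data.size() + workers - 1) / workers;

    long long total = 0;
    std::mutex total_lock;
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = std::min(data.size(), w * chunk);
        std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&, begin, end] {
            // Accumulate privately: no lock is taken inside the hot loop.
            long long local = std::accumulate(data.begin() + begin,
                                              data.begin() + end, 0LL);
            // One short critical section per thread instead of one per element.
            std::lock_guard<std::mutex> guard(total_lock);
            total += local;
        });
    }
    for (auto& t : threads) t.join();
    return total;
}
```

Each thread takes the lock once, so the time spent waiting on `total_lock` should be small compared with a version that locks for every element. Re-running the analysis on both versions is one way to check whether a change like this actually reduces wait time in your application.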
