Intel® VTune™ Amplifier

OpenMP* Analysis

Use the Intel® VTune™ Amplifier for performance analysis of OpenMP* applications compiled with the Intel® Compiler.

Prerequisites:

OpenMP uses a fork-join parallel model: an OpenMP program starts running on a single master thread that executes serial code. When the master thread encounters a parallel region, it forks into a team of threads that execute the region. At the end of the parallel region, the threads join at a barrier, and the master thread continues executing serial code. It is possible to write an OpenMP program more like an MPI program, in which the master thread immediately forks into a parallel region and constructs such as barrier and single coordinate the work. It is far more common, however, for an OpenMP program to consist of a sequence of parallel regions interspersed with serial code.
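The fork-join shape described above can be sketched in C as follows. This is a minimal illustration, not code taken from the product; the function name sum_of_squares is hypothetical. It assumes the compiler's OpenMP option is enabled. If it is not, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical example: returns 0^2 + 1^2 + ... + (n-1)^2. */
long sum_of_squares(long n)
{
    /* Serial code: only the master thread runs here. */
    assert(n >= 0);
    long total = 0;

    /* Fork: the master thread creates a team of threads, and the loop
     * iterations are divided among them. The reduction clause gives each
     * thread a private partial sum and combines the partial sums at the end. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += i * i;
    }
    /* Join: an implicit barrier ends the parallel region. */

    /* Serial code again: only the master thread continues. */
    return total;
}
```

Calling sum_of_squares(4) adds 0 + 1 + 4 + 9 and returns 14, regardless of how many threads execute the loop.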

Ideally, parallelized applications have working threads doing useful work from the beginning to the end of execution, utilizing 100% of available CPU core processing time. In practice, useful CPU utilization is usually lower because working threads spend time waiting, either actively spinning (for performance, when the wait is expected to be short) or waiting passively without consuming CPU. There are several major reasons why working threads wait instead of doing useful work:

VTune Amplifier together with Intel Composer XE 2013 Update 2 or higher helps you understand how an application utilizes available CPUs and identify causes of CPU underutilization.
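The effect of waiting threads on CPU utilization can be illustrated with a small calculation. The sketch below is a simplified model, not the metric that VTune Amplifier reports; the helper name useful_fraction and its inputs are hypothetical. It assumes one thread per core and that a parallel region lasts as long as its slowest thread, because the other threads wait for it at the join barrier.

```c
#include <assert.h>

/* Hypothetical helper (not a VTune Amplifier API): given the time each
 * thread of one parallel region spent doing useful work, return the
 * fraction of the available CPU time that was useful. */
double useful_fraction(const double *busy, int num_threads)
{
    assert(busy != 0 && num_threads > 0);

    double total = 0.0;    /* useful time summed over all threads */
    double longest = 0.0;  /* elapsed time of the region in this model */
    for (int t = 0; t < num_threads; ++t) {
        assert(busy[t] >= 0.0);
        total += busy[t];
        if (busy[t] > longest) {
            longest = busy[t];
        }
    }

    if (longest == 0.0) {
        return 0.0;  /* no useful work was done at all */
    }
    /* Available CPU time is num_threads * longest; the rest is waiting. */
    return total / (num_threads * longest);
}
```

For example, four threads that are busy for 8, 4, 2, and 2 seconds give 16 / (4 × 8) = 0.5: half of the available CPU time is spent waiting. Four threads that are each busy for 4 seconds give 1.0.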

Configure OpenMP Analysis

To enable OpenMP analysis for your target:

  1. Click the Configure Analysis button on the Intel® VTune™ Amplifier toolbar (in the standalone GUI or the Visual Studio IDE).

    The Configure Analysis window opens.

  2. From the HOW pane, click the Browse button and select an analysis type that supports OpenMP analysis: Threading, HPC Performance Characterization, Memory Access, or any Custom Analysis type.

  3. Select the Analyze OpenMP regions option, if it is not pre-selected (see the Details section to confirm).

  4. Click the Start button to run the analysis.

When an application runs under profiling, the OpenMP runtime library in Intel Composer provides special markers that the VTune Amplifier uses to decipher the statistics of OpenMP parallel regions and to distinguish the serial parts of the application code.

Limitations

VTune Amplifier supports the analysis of parallel OpenMP regions with the following limitations:

