Intel® VTune™ Amplifier

General Exploration Analysis for Hardware Issues

Use the General Exploration microarchitecture analysis to triage hardware issues in your application.

After you have used Basic Hotspots or Advanced Hotspots analysis to identify the hotspots in your code, you can run General Exploration analysis to understand how efficiently your code passes through the core pipeline. During General Exploration analysis, the VTune Amplifier collects the full set of events needed to analyze a typical client application. It calculates a set of predefined ratios that are used for the metrics and help you identify hardware-level performance problems.

General Exploration Viewpoint: Bottom-up Window

To use the General Exploration analysis, explore the General Exploration Analysis Strategy, Configuration Options, and Viewpoints sections below.

General Exploration Analysis Strategy

The General Exploration analysis strategy varies by microarchitecture. For modern microarchitectures starting with Intel microarchitecture code name Ivy Bridge, the General Exploration analysis is based on the Top-Down Microarchitecture Analysis Method using the Top-Down Characterization methodology, which is a hierarchical organization of event-based metrics that identifies the dominant performance bottlenecks in an application.

Superscalar processors can be conceptually divided into the front-end, where instructions are fetched and decoded into the operations (micro-ops) that constitute them, and the back-end, where the required computation is performed. Each cycle, the front-end generates up to four of these operations and places them into pipeline slots that then move through the back-end. Thus, for a given execution duration in clock cycles, it is easy to determine the maximum number of pipeline slots containing useful work that can be retired in that duration.

The actual number of retired pipeline slots containing useful work, though, rarely equals this maximum. Some pipeline slots cannot be filled with useful work, either because the front-end could not fetch or decode instructions in time (Front-End Bound execution) or because the back-end was not ready to accept more operations of a certain kind (Back-End Bound execution). Moreover, operations that are issued into pipeline slots may never retire because they were issued on a mispredicted path (Bad Speculation). Front-End Bound execution may be caused by a large code working set, poor code layout, or microcode assists. Back-End Bound execution may be caused by long-latency operations or other contention for execution resources. Bad Speculation is most frequently caused by branch misprediction.
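The slot accounting described above can be sketched in Python. This is a minimal illustration, assuming a four-wide pipeline and the level-1 formulas commonly published for the Top-Down method. The `SlotCounts` fields are hypothetical stand-ins for the underlying hardware event counts, the sample numbers are invented, and the VTune Amplifier's actual metric formulas vary by microarchitecture.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: a four-wide pipeline, as described above.
PIPELINE_WIDTH = 4


@dataclass
class SlotCounts:
    """Hypothetical per-interval counts; real tools read these from hardware events."""
    cycles: int           # unhalted core clock cycles
    not_delivered: int    # slots the front-end left empty while the back-end could accept work
    issued: int           # operations issued into the back-end
    retired: int          # slots whose operations retired
    recovery_cycles: int  # cycles spent recovering from mispredictions


def top_level_breakdown(c: SlotCounts) -> dict[str, float]:
    """Return the fraction of all pipeline slots in each top-level category."""
    slots = PIPELINE_WIDTH * c.cycles  # maximum slots available in the interval
    front_end = c.not_delivered / slots
    # Issued-but-never-retired work plus slots lost while recovering.
    bad_spec = (c.issued - c.retired + PIPELINE_WIDTH * c.recovery_cycles) / slots
    retiring = c.retired / slots
    # Whatever remains was lost because the back-end could not accept operations.
    back_end = 1.0 - (front_end + bad_spec + retiring)
    return {
        "Front-End Bound": front_end,
        "Bad Speculation": bad_spec,
        "Retiring": retiring,
        "Back-End Bound": back_end,
    }


if __name__ == "__main__":
    # Invented counts for illustration only.
    counts = SlotCounts(cycles=1000, not_delivered=800, issued=2600,
                        retired=2400, recovery_cycles=50)
    for name, share in top_level_breakdown(counts).items():
        print(f"{name:16} {share:6.1%}")
```

In this sketch Back-End Bound is derived as the remainder, so the four shares always sum to 1.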

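Because each core can fill up to four pipeline slots per cycle, the maximum number of slots that could have been filled and issued during a time interval is four times the number of core cycles in that interval. The analysis estimates this maximum and divides all pipeline slots into four categories: Retiring (slots filled with operations that eventually retired), Bad Speculation (slots used by operations that never retired, or lost while recovering from a misprediction), Front-End Bound (slots left empty because the front-end did not deliver operations), and Back-End Bound (slots left empty because the back-end could not accept operations).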

To use General Exploration analysis, first determine which top-level category dominates for the hotspots of interest. Then drill down into the dominant category by expanding its column, where you can find the specific issues that may contribute to that category.
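The drill-down described above can be sketched as a walk over a metric tree that follows the largest share at each level. This is a simplified heuristic, not the VTune Amplifier's own highlighting logic; the `Metric` class and all numbers in the example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    """Hypothetical node in a Top-Down metric hierarchy."""
    name: str
    fraction: float  # share of all pipeline slots attributed to this metric
    children: list[Metric] = field(default_factory=list)


def drill_down(level: list[Metric]) -> list[str]:
    """Follow the largest metric at each level and return the path of names."""
    path: list[str] = []
    while level:
        dominant = max(level, key=lambda m: m.fraction)
        path.append(dominant.name)
        level = dominant.children  # "expand the column" of the dominant metric
    return path


# Invented breakdown for a single hotspot function.
hotspot = [
    Metric("Front-End Bound", 0.10),
    Metric("Bad Speculation", 0.05),
    Metric("Retiring", 0.30),
    Metric("Back-End Bound", 0.55, [
        Metric("Memory Bound", 0.40, [
            Metric("L1 Bound", 0.05),
            Metric("L3 Bound", 0.10),
            Metric("DRAM Bound", 0.25),
        ]),
        Metric("Core Bound", 0.15),
    ]),
]

if __name__ == "__main__":
    print(" > ".join(drill_down(hotspot)))  # prints "Back-End Bound > Memory Bound > DRAM Bound"
```

In the Top-Down method, a dominant Retiring share usually means the slots are doing useful work, so in practice you would typically investigate the largest of the other three categories instead.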

You can also run the General Exploration analysis on other microarchitectures that are NOT covered by the Top-Down Method in the VTune Amplifier:


Configuration Options

To configure options for the General Exploration analysis:

Prerequisites: Create a project and specify an analysis target.

  1. Click the New Analysis button on the Intel® VTune™ Amplifier toolbar (standalone GUI or Visual Studio IDE).

    The New Amplifier Result tab opens with the Analysis Type window active.

  2. From the analysis tree on the left pane, select Microarchitecture Analysis > General Exploration.

    The analysis configuration pane opens on the right.

    Note

    For detailed information on events collected for General Exploration on a particular microarchitecture, refer to the Intel Processor Event Reference.

  3. Configure the following options:

    CPU sampling interval, ms spin box

    Specify an interval (in milliseconds) between CPU samples.

    Possible values: 1-1000.

    The default value is 1 ms.

    Analyze memory bandwidth check box

    Collect the data required to compute memory bandwidth.

    The option is disabled by default.

    Evaluate max DRAM bandwidth check box

    Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.

    The option is enabled by default.

    Analyze OpenMP regions check box

    Instrument and analyze OpenMP regions to detect inefficiencies such as imbalance, lock contention, or the overhead of scheduling, reduction, and atomic operations.

    The option is disabled by default.

    Analyze user tasks, events, and counters check box

    Analyze the tasks, events, and counters specified in your code through the ITT API. This option increases the collection overhead and the result size.

    The option is disabled by default.

    Details button

    Expand or collapse a section listing the default, non-editable settings used for this analysis type. To modify these settings or enable additional ones, create a custom configuration by copying an existing predefined configuration. VTune Amplifier creates an editable copy of this analysis type configuration and places it under the Custom Analysis section on the left pane.

    Note

    You may generate a command line for this configuration using the Command Line... button at the bottom.

  4. Click Start to run the analysis.
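As a compact summary of step 3, the options with their documented defaults and ranges can be modeled as a small Python class. This is purely illustrative: `GeneralExplorationOptions` and its field names are hypothetical and do not correspond to the VTune Amplifier's command-line options or to any API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneralExplorationOptions:
    """Hypothetical model of the General Exploration configuration pane."""
    sampling_interval_ms: int = 1             # CPU sampling interval, 1-1000 ms
    analyze_memory_bandwidth: bool = False    # collect data to compute memory bandwidth
    evaluate_max_dram_bandwidth: bool = True  # measure max local DRAM bandwidth first
    analyze_openmp_regions: bool = False      # instrument and analyze OpenMP regions
    analyze_user_tasks: bool = False          # ITT API tasks, events, and counters

    def __post_init__(self) -> None:
        # Reject values outside the documented 1-1000 ms range.
        if not 1 <= self.sampling_interval_ms <= 1000:
            raise ValueError(
                f"sampling interval must be 1-1000 ms, got {self.sampling_interval_ms}"
            )


if __name__ == "__main__":
    defaults = GeneralExplorationOptions()
    tuned = GeneralExplorationOptions(sampling_interval_ms=10, analyze_memory_bandwidth=True)
    print(defaults)
    print(tuned)
```

Validating the interval when the object is constructed keeps an out-of-range value from ever reaching a collection run.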

Viewpoints

Start your analysis with the default General Exploration viewpoint that includes the following views:
