Intel® Advisor Help
To add a Roofline chart to the Survey Report, run a Roofline analysis. The chart helps you visualize actual performance against hardware-imposed performance ceilings and determine the main limiting factor (memory bandwidth or compute capacity), which gives you a roadmap of potential optimization steps.
Use the Roofline chart to answer the following questions:
What is the maximum achievable performance with your current hardware resources?
Does your application work optimally on current hardware resources?
If not, what are the best candidates for optimization?
Is memory bandwidth or compute capacity limiting performance for each optimization candidate?
In the Vectorization Workflow tab, click the control under Run Roofline to execute your target application twice to:
Measure the hardware limitations of your machine and collect loop/function timings using the Survey analysis.
Collect FLOP and integer operations data, and memory traffic data, using the Trip Counts and FLOP analysis. This collection can take three to four times longer than the Survey analysis.
After both analyses are complete, Intel Advisor adds a Roofline chart to the Survey Report.
There are several controls to help you show or hide the Roofline chart:
1 | Click to toggle between Roofline chart view and Survey Report view.
2 | Click to toggle to and from side-by-side Roofline chart and Survey Report view.
3 | Drag to adjust the dimensions of the Roofline chart and Survey Report.
There are several controls to help you focus on the data most important to you, including the following.
1 |
2 | Adjust rooflines to see practical performance limits if an application uses fewer threads than available cores.
3 |
4 |
5 |
6 | Zoom in and out using numerical values.
7 | Hover your mouse over an item to display metrics for that item. If you hover your mouse over a dot, the Roofline chart displays two blue dots with metrics that show potential performance if you optimize the loop to reach the next roofline and the maximum achievable roofline. (If the next roofline and maximum achievable roofline are the same, the Roofline chart displays only one blue dot.) Click a dot to outline it in black and display corresponding code and metrics in other window tabs. After clicking a dot, right-click a blank area in the Roofline chart and choose:
8 | Display the number and percentage of loops in each loop weight representation category.
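The blue-dot projection described for control 7 can be approximated by hand. The following Python sketch is an illustration only, not how Intel Advisor computes its values: it assumes the projections keep the loop's arithmetic intensity fixed and move straight up to the next roofline and to the highest roofline reachable at that intensity. The function name `projected_dots`, the roofline names, and all numbers are hypothetical.

```python
def projected_dots(arithmetic_intensity, measured_gops, memory_roofs, compute_roofs):
    """Return the projected points as a list of (roofline name, GOPS) pairs.

    memory_roofs maps a name to bandwidth in GB/s (diagonal rooflines).
    compute_roofs maps a name to peak giga-operations per second (horizontal rooflines).
    Assumption: projections are vertical, at constant arithmetic intensity.
    """
    # Height of every roofline at this arithmetic intensity.
    heights = {name: arithmetic_intensity * bw for name, bw in memory_roofs.items()}
    heights.update(compute_roofs)

    # The highest achievable point is capped by both the best memory roof
    # and the best compute roof.
    top = min(
        max(arithmetic_intensity * bw for bw in memory_roofs.values()),
        max(compute_roofs.values()),
    )

    # Rooflines strictly above the measured dot and not above the cap.
    candidates = sorted(
        (height, name) for name, height in heights.items()
        if measured_gops < height <= top
    )
    if not candidates:
        return []  # the dot already sits on the highest achievable roofline

    next_roof = (candidates[0][1], candidates[0][0])
    max_roof = (candidates[-1][1], candidates[-1][0])
    # One point if the next and maximum rooflines coincide, otherwise two.
    return [next_roof] if next_roof == max_roof else [next_roof, max_roof]


# Hypothetical machine and loop, for illustration only.
memory_roofs = {"L1 Bandwidth": 400.0, "L2 Bandwidth": 200.0, "DRAM Bandwidth": 50.0}
compute_roofs = {"Scalar Add Peak": 20.0, "Vector Add Peak": 200.0}
dots = projected_dots(0.25, 2.0, memory_roofs, compute_roofs)
for name, gops in dots:
    print(f"{name}: {gops:.1f} GFLOPS")
```

With these made-up inputs the loop is projected first onto the DRAM Bandwidth roofline and then onto the L1 Bandwidth roofline, which caps it at this arithmetic intensity.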
The Roofline chart plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance:
Arithmetic intensity (x axis) - measured in number of floating-point operations (FLOPs) or integer operations (INTOPs) per byte transferred between the CPU/VPU and memory, based on the loop/function algorithm
Performance (y axis) - measured in billions of floating-point operations per second (GFLOPS) or billions of integer operations per second (GINTOPS)
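As a rough illustration of the two axes, the Python sketch below places a single loop on the chart from three raw measurements. The function name `roofline_coordinates` and the sample numbers are hypothetical; Intel Advisor collects these values for you, so this only shows the arithmetic behind a dot's position.

```python
def roofline_coordinates(operations, bytes_transferred, elapsed_seconds):
    """Return (arithmetic intensity, performance) for one loop or function.

    operations: FLOPs or INTOPs executed
    bytes_transferred: bytes moved between the CPU/VPU and memory
    elapsed_seconds: self time of the loop or function
    """
    if bytes_transferred <= 0 or elapsed_seconds <= 0:
        raise ValueError("bytes_transferred and elapsed_seconds must be positive")
    # x axis: operations per byte (FLOP/byte or INTOP/byte)
    arithmetic_intensity = operations / bytes_transferred
    # y axis: billions of operations per second (GFLOPS or GINTOPS)
    performance = operations / elapsed_seconds / 1e9
    return arithmetic_intensity, performance


# Hypothetical loop: 4e9 FLOPs, 16e9 bytes of traffic, 2 seconds of self time.
ai, gflops = roofline_coordinates(4e9, 16e9, 2.0)
print(f"AI = {ai:.2f} FLOP/byte, performance = {gflops:.2f} GFLOPS")
```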
In general:
The size and color of each Roofline chart dot represent relative execution time for each loop/function. Large red dots take the most time, so they are the best candidates for optimization. Small green dots take less time, so they may not be worth optimizing.
Roofline chart diagonal lines indicate memory bandwidth limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The L1 Bandwidth roofline represents the maximum amount of work that can get done at a given arithmetic intensity if the loop always hits L1 cache. A loop does not benefit from L1 cache speed if a dataset causes it to miss L1 cache too often, and instead is subject to the limitations of the lower-speed L2 cache it is hitting. So the dot representing the loop is positioned somewhere below the L2 Bandwidth roofline.
Roofline chart horizontal lines indicate compute capacity limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The Scalar Add Peak represents the peak number of add instructions that can be performed by the scalar loop under these circumstances. The Vector Add Peak represents the peak number of add instructions that can be performed by the vectorized loop under these circumstances. If a loop is not vectorized, the dot representing the loop is positioned somewhere below the Scalar Add Peak roofline.
A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine; however, not all loops can utilize maximum machine capabilities.
The greater the distance between a dot and the highest achievable roofline, the more opportunity exists for performance improvement.
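The relationship between diagonal and horizontal rooflines can be written down as a small calculation. The Python sketch below uses the standard roofline formula: at a given arithmetic intensity, the ceiling is the lower of the memory limit (arithmetic intensity times bandwidth) and the compute peak. The function name `attainable_ceiling` and the machine numbers are assumptions for illustration; they are not values measured by Intel Advisor.

```python
def attainable_ceiling(arithmetic_intensity, bandwidth_gb_per_s, compute_peak_gops):
    """Return (ceiling in GOPS, limiting factor) for one memory and one compute roofline.

    A diagonal roofline allows arithmetic_intensity * bandwidth;
    a horizontal roofline allows compute_peak. The lower value limits the loop.
    """
    memory_limit = arithmetic_intensity * bandwidth_gb_per_s  # (op/byte) * (GB/s) = GOPS
    if memory_limit < compute_peak_gops:
        return memory_limit, "memory bandwidth"
    return compute_peak_gops, "compute capacity"


# Hypothetical roofs: 100 GB/s bandwidth and a 200 GFLOPS compute peak.
ceiling, limiter = attainable_ceiling(0.25, 100.0, 200.0)

# Hypothetical measured performance of the loop, in GFLOPS.
measured = 2.0
headroom = ceiling / measured  # how far the dot sits below its ceiling

print(f"Ceiling: {ceiling:.1f} GFLOPS, limited by {limiter}")
print(f"Headroom: {headroom:.1f}x")
```

A large headroom value corresponds to a dot far below its roofline, which is the situation the last point above describes as having the most room for improvement.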
In the following Roofline chart representation, loops A and G (large red dots), and to a lesser extent B (yellow dot far below the roofs), are the best candidates for optimization. Loops C, D, and E (small green dots) and H (yellow dot) are poor candidates because they do not have much room to improve.
See the Use Automated Roofline Chart to Make Optimization Decisions tutorial to learn how to:
Address memory bandwidth bottlenecks.
Address compute capacity bottlenecks.
Identify the real bottlenecks.