Intel® VTune™ Amplifier
To analyze the performance of a DPDK application, consider the following analysis options provided by the Intel® VTune™ Amplifier:
Run the Microarchitecture Exploration analysis to understand whether your application is core or memory bound.
Analyze empty cycles with the Input and Output analysis to explore whether your application retrieves packets effectively.
Reconfigure and rebuild DPDK to enable empty cycles tracing with ITT API.
Run the Input and Output analysis to analyze PCIe/DRAM/UPI/QPI bandwidth usage for your DPDK application.
Empty cycles are DPDK polling loop iterations that return no packets. To benefit from DPDK IO API profiling, make sure DPDK is configured to enable empty cycles tracing. See the DPDK Programmer's Guide for configuration steps: http://dpdk.org/doc/guides/prog_guide/profile_app.html#empty-cycles-tracing.
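To make the notion of an empty cycle concrete, the following minimal sketch models a polling loop in Python. It does not use DPDK: the rx_burst function is a hypothetical stand-in for a receive call, and the idle probability and burst size are assumed values chosen only for illustration.

```python
import random


def count_empty_cycles(rx, iterations):
    """Run a polling loop and count iterations that return no packets.

    rx is any callable that returns the number of packets received
    in one poll (a hypothetical stand-in for a DPDK receive call).
    """
    empty = 0
    packets = 0
    for _ in range(iterations):
        received = rx()
        if received == 0:
            empty += 1          # this iteration was an empty cycle
        else:
            packets += received
    return empty, packets


def make_simulated_rx(seed=0, idle_probability=0.7, max_burst=32):
    """Build a simulated receive function (all parameters are assumptions)."""
    rng = random.Random(seed)

    def rx_burst():
        if rng.random() < idle_probability:
            return 0
        return rng.randint(1, max_burst)

    return rx_burst


if __name__ == "__main__":
    iterations = 10_000
    empty, packets = count_empty_cycles(make_simulated_rx(), iterations)
    print(f"empty cycles: {empty / iterations:.1%}, packets received: {packets}")
```

A high share of empty cycles means the loop spends most of its iterations polling without receiving anything, which is the condition the empty cycles tracing is meant to expose.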
Create and set up your DPDK analysis project as follows:
Click the New Project button on the VTune Amplifier toolbar to set up your basic project settings such as a project name and a project directory. Click Create Project.
The Configure Analysis window opens.
From the WHERE pane, specify a system to be used for analysis, for example: Local Host.
Only Linux* target systems are supported.
From the WHAT pane, specify a target application to analyze.
From the HOW pane on the right, click the Browse button to select the Input and Output analysis type.
Select the DPDK IO API to profile.
Click the Start button to run the analysis.
Consider employing the custom collector to collect additional data (for example, network statistics) and correlate this data with empty cycles distribution.
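As a rough illustration of the correlation step only, the sketch below computes a Pearson correlation coefficient between two series sampled over the same intervals. The sample values and variable names are hypothetical, and the sketch does not show how a custom collector gathers data or which format VTune Amplifier expects.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-interval samples (not measured data).
rx_packets_per_interval = [1200, 900, 300, 50, 0, 700]
empty_cycles_per_interval = [100, 250, 700, 950, 1000, 400]

r = pearson(rx_packets_per_interval, empty_cycles_per_interval)
print(f"correlation between received packets and empty cycles: {r:.2f}")
```

A strongly negative coefficient would suggest that empty cycles rise when traffic drops, which is the expected pattern for a polling loop that keeps up with the incoming traffic.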
For data plane applications, the Timeline view may show 100% CPU utilization because DPDK poll mode keeps the cores busy, which prevents measuring effective core utilization. Enable the I/O API markers on the timeline to show empty cycles as user tasks. The tooltips show the task duration, the DPDK port, and the Rx queue IDs.
Use the Platform window grouped by H/W Context/Threads to explore key bandwidth statistics per device:
Analyze the PCIe Bandwidth to estimate the NIC traffic. Typically, your performance goal is to raise the NIC traffic to the maximum that the NIC ports support.
Check that the DRAM Bandwidth value is low. A low value indicates that Intel® Data Direct I/O Technology (Intel DDIO) and the Last Level Cache work properly.
Analyze UPI Bandwidth on multi-socket systems. High values may signal a potential misconfiguration problem.
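When comparing the observed bandwidth against the NIC port maximum, it can help to work out what that maximum is. The sketch below is a back-of-the-envelope calculation based on standard Ethernet framing (each frame carries 20 extra bytes on the wire for the preamble, start frame delimiter, and inter-frame gap). The function names are hypothetical, and the comparison is only approximate because PCIe traffic also includes descriptors and other overhead in addition to packet data.

```python
# 7-byte preamble + 1-byte start frame delimiter + 12-byte inter-frame gap
ETHERNET_OVERHEAD_BYTES = 20


def max_packets_per_second(link_bits_per_second, frame_bytes):
    """Theoretical maximum frame rate of an Ethernet link for a given frame size."""
    wire_bits_per_frame = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bits_per_second / wire_bits_per_frame


def link_utilization(observed_bytes_per_second, link_bits_per_second):
    """Fraction of the link rate represented by an observed byte rate."""
    return observed_bytes_per_second * 8 / link_bits_per_second


if __name__ == "__main__":
    ten_gbe = 10e9  # 10 Gbit/s port, used here as an example value
    rate = max_packets_per_second(ten_gbe, 64)
    print(f"64-byte frames at 10 GbE: {rate / 1e6:.2f} Mpps")
    print(f"625 MB/s on 10 GbE: {link_utilization(625e6, ten_gbe):.0%} of line rate")
```

For a 10 Gbit/s port and minimum-size 64-byte frames, the calculation gives roughly 14.88 million packets per second, which is the commonly quoted line rate for that configuration.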