Intel® VTune™ Amplifier
For DPDK analysis, start with the General Exploration analysis to determine whether your application is core bound or memory bound. For memory-bound applications, explore the bottlenecks with the Input and Output analysis:
Reconfigure and rebuild DPDK to enable empty cycles tracing with ITT API.
Explore bandwidth metrics on the timeline.
To benefit from DPDK IO API profiling, make sure DPDK is built with empty cycles tracing support enabled. The Empty Cycles metric shows the percentage of DPDK polling loop cycles that are wasted because DPDK does not fetch any packets. See the DPDK Programmer's Guide for configuration steps: http://dpdk.org/doc/guides/prog_guide/profile_app.html#empty-cycles-tracing.
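The Empty Cycles definition above can be illustrated with a minimal sketch. It assumes a hypothetical trace of polling iterations, each recorded as a pair of cycles spent and packets fetched. The function name and the trace format are assumptions made for illustration only; they are not part of the DPDK or VTune Amplifier APIs, and the tool may compute the metric differently.

```python
def empty_cycles_percent(samples):
    """Return the share of polling cycles spent in iterations that fetched no packets.

    samples: iterable of (cycles_spent, packets_received) pairs, one pair per
    polling loop iteration (hypothetical trace format).
    """
    total_cycles = 0
    empty_cycles = 0
    for cycles, packets in samples:
        total_cycles += cycles
        if packets == 0:
            # The iteration polled the Rx queue but received nothing.
            empty_cycles += cycles
    if total_cycles == 0:
        return 0.0
    return 100.0 * empty_cycles / total_cycles


# Placeholder trace: three empty polls and one poll that fetched 32 packets.
trace = [(100, 0), (100, 32), (100, 0), (100, 0)]
print(empty_cycles_percent(trace))  # 75.0
```

A value close to 100% suggests that the polling core is mostly idle at the current traffic rate, while a value close to 0% suggests that almost every poll returns packets.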
Create and set up your DPDK analysis project as follows:
Click the New Project button on the VTune Amplifier toolbar to set up your basic project settings such as a project name and a project directory. Click Create Project.
The Choose Target and Analysis Type window opens with the Analysis Target tab active.
From the left pane, choose a target system and target type. For example: Local Host and Launch Application.
Only Linux* target systems are supported.
From the right pane, specify a target application to analyze.
Switch to the Analysis Type tab and select the Input and Output analysis type from the left pane.
Select the DPDK IO API to profile.
Click the Start button to run the analysis.
Consider employing a custom collector to track the number of missed packets at different traffic generation rates and correlate this data with the empty cycles distribution.
For data plane applications, the Timeline view may show 100% CPU utilization. This is a result of the DPDK poll mode, which keeps the cores busy and prevents measuring effective core utilization. Enable the I/O API markers to display empty cycles as user tasks on the timeline. The tooltips show the task duration, the DPDK port, and the Rx queue IDs.
If you employed a custom collector for network statistics, you can correlate empty cycles with the number of missed packets at different traffic generation rates.
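One simple way to quantify this correlation is a Pearson coefficient computed over per-rate measurements. The sketch below assumes you have exported two equally long series from your own collector, one with the empty cycles percentage and one with the missed packet count for each traffic rate. The function name and the input layout are assumptions for illustration; VTune Amplifier does not provide this function.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, not measurements: one entry per traffic generation rate.
empty_cycles_pct = [90.0, 60.0, 30.0, 5.0]
missed_packets = [0, 0, 120, 4500]
print(pearson(empty_cycles_pct, missed_packets))
```

A strongly negative coefficient would be consistent with packets being missed mainly when the polling loop has few empty cycles left, that is, when the core is saturated.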
Use the Platform window grouped by H/W Context/Threads to explore key bandwidth statistics per device:
Analyze the PCIe Bandwidth to estimate the NIC traffic. Typically, your performance goal is to bring the NIC traffic as close as possible to the maximum throughput of the NIC ports.
Check that the DRAM Bandwidth value is low, which indicates that Intel® Data Direct I/O Technology (Intel DDIO) and the Last Level Cache work properly.
Analyze the UPI Bandwidth on multi-socket systems. High values may signal a potential misconfiguration problem.
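To compare the observed PCIe bandwidth with the NIC port maximum, a rough utilization estimate can help. The sketch below assumes the bandwidth is given in bytes per second and the port line rate in Gbit/s. The function name and its parameters are hypothetical, and the result is only an approximation because PCIe traffic also carries descriptor and protocol overhead in addition to packet data.

```python
def nic_utilization_percent(pcie_bytes_per_sec, port_count, port_gbps):
    """Estimate NIC utilization as observed PCIe bandwidth over aggregate line rate.

    pcie_bytes_per_sec: observed PCIe bandwidth for the NIC, in bytes per second
    port_count: number of NIC ports in use
    port_gbps: nominal line rate of one port, in Gbit/s
    """
    # Convert the aggregate line rate from Gbit/s to bytes per second.
    line_rate_bytes = port_count * port_gbps * 1e9 / 8
    if line_rate_bytes <= 0:
        raise ValueError("port_count and port_gbps must be positive")
    return 100.0 * pcie_bytes_per_sec / line_rate_bytes


# Placeholder input: 1.25e9 bytes/s observed on a single 10 Gbit/s port.
print(nic_utilization_percent(1.25e9, 1, 10))  # 100.0
```

An estimate well below 100% at the target traffic rate suggests that the NIC ports are not yet saturated and that the bottleneck lies elsewhere.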