Intel® VTune™ Amplifier
Use the Input and Output analysis in Intel VTune Amplifier to profile the SPDK IO API, analyze PCIe traffic, and identify IO performance issues such as inefficient accesses to remote sockets or under-utilized throughput of an SPDK device.
VTune Amplifier helps you optimize the following SPDK usage models:
Application services:
SPDK vhost-scsi to provide optimized block storage to VMs
SPDK NVMe to optimize access to the locally attached storage
Disaggregated storage:
NVM Express* over Fabrics
iSCSI targets
For SPDK analysis, consider the following workflow:
Prerequisites: for successful analysis, make sure SPDK is built using the --with-vtune option.
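The build prerequisite can be scripted. The following is a minimal sketch, assuming the SPDK configure script takes the VTune Amplifier installation directory as the value of the option (--with-vtune=<dir>); confirm the exact form with ./configure --help for your SPDK version. The helper names and the paths are hypothetical.

```python
import subprocess
from pathlib import Path


def spdk_build_commands(vtune_dir):
    """Return the commands that configure and build SPDK with VTune support.

    Assumption: --with-vtune takes the VTune installation directory as its
    value; check ./configure --help in your SPDK checkout.
    """
    return [
        ["./configure", f"--with-vtune={vtune_dir}"],
        ["make"],
    ]


def build_spdk_with_vtune(spdk_dir, vtune_dir, dry_run=True):
    """Run the build commands in spdk_dir, or only print them when dry_run is True."""
    for cmd in spdk_build_commands(vtune_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the build at the first failing command.
            subprocess.run(cmd, cwd=Path(spdk_dir), check=True)


if __name__ == "__main__":
    # Hypothetical locations; replace with your own checkout and install paths.
    build_spdk_with_vtune("/path/to/spdk", "/path/to/vtune_amplifier", dry_run=True)
```

With dry_run=True the script only prints the commands, which is a convenient way to review them before starting a real build.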
Create and set up your SPDK analysis project with VTune Amplifier as follows:
Click the New Project button on the VTune Amplifier toolbar and specify basic project settings, such as the project name and the project directory. Click Create Project.
The Configure Analysis window opens.
From the WHERE pane, specify a system to be used for analysis, for example: Local Host.
Only Linux* target systems are supported.
From the WHAT pane, specify a target application to analyze.
From the HOW pane on the right, click the Browse button to select the Input and Output analysis type.
Select the SPDK IO API to profile.
Make sure to de-select the System Disk IO API option since SPDK and Disk IO analysis cannot be run simultaneously.
Click the Start button to run the analysis.
Start your analysis with the Summary window, which displays overall SPDK performance statistics per executed operation type. Expand an operation block to identify potential IO performance imbalance among SSDs.
Explore the SPDK Throughput histogram to understand how long your workload has been under-utilizing SPDK throughput per device.
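To make the idea behind the histogram concrete, the sketch below computes how long each device spent below a throughput threshold, given evenly spaced throughput samples. It is an illustration only, not the way VTune Amplifier computes the metric: the function names, the sample data, the 1-second sampling interval, and the 50% threshold are all assumptions.

```python
def low_utilization_seconds(samples_mb_s, peak_mb_s, interval_s=1.0, low_fraction=0.5):
    """Total time in seconds during which throughput stayed below low_fraction of peak.

    Each sample is assumed to represent one interval of interval_s seconds.
    """
    threshold = peak_mb_s * low_fraction
    return sum(interval_s for sample in samples_mb_s if sample < threshold)


def low_utilization_by_device(samples_by_device, peak_by_device,
                              interval_s=1.0, low_fraction=0.5):
    """Apply low_utilization_seconds to every device in samples_by_device."""
    return {
        device: low_utilization_seconds(samples, peak_by_device[device],
                                        interval_s, low_fraction)
        for device, samples in samples_by_device.items()
    }


if __name__ == "__main__":
    # Hypothetical throughput samples in MB/s, one per second.
    samples = {
        "nvme0": [2400, 2300, 900, 800, 2500],
        "nvme1": [1000, 1100, 900, 1200, 1000],
    }
    peaks = {"nvme0": 2500, "nvme1": 2500}
    print(low_utilization_by_device(samples, peaks))
    # prints {'nvme0': 2.0, 'nvme1': 5.0}
```

In this made-up data set, nvme1 stays under half of its peak throughput for the whole run, which is the kind of per-device under-utilization the histogram is meant to expose.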
Then switch to the Bottom-up window and filter the Timeline view by the Low SPDK Throughput Utilization metric to see the correlation among throughput under-utilization, SPDK IO API calls, and the PCIe traffic breakdown per physical device.
Locate a region of degraded throughput (Low SPDK Throughput markers with a long duration) on the timeline and zoom in to see performance changes for IO communications (for example, drops for SPDK operations). Right-click and select the Filter In by Selection menu option.
Once the Bottom-up view is filtered, apply the Function grouping to the grid to identify the functions executed during the selected time frame. Double-click the function with the highest CPU time value to open the source view and analyze the code.
Use the Platform window to analyze whether your SPDK workload is configured properly for a multi-socket system. To do this, switch to the Package/Core/H/W Context grouping on the legend pane to track IO performance per package.
An IO flow is ineffective when an SPDK device and the core consuming or producing its data belong to different packages. In that case, you see high UPI Bandwidth values, which signal heavy utilization of the interconnect.
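Outside VTune Amplifier, you can cross-check device and core placement on a Linux system through sysfs. The sketch below is a minimal illustration, assuming the kernel exposes the numa_node attribute for the PCI device and a nodeN entry under each CPU directory, and assuming one NUMA node per package (which does not hold when sub-NUMA clustering is enabled). The function names and the PCI address in the usage note are hypothetical.

```python
from pathlib import Path


def pci_numa_node(pci_address, sysfs_root="/sys"):
    """Return the NUMA node of a PCI device; -1 means the kernel reports none."""
    path = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_address / "numa_node"
    return int(path.read_text().strip())


def cpu_numa_node(cpu, sysfs_root="/sys"):
    """Return the NUMA node that owns a logical CPU, or None if it is not exposed."""
    cpu_dir = Path(sysfs_root) / "devices" / "system" / "cpu" / f"cpu{cpu}"
    for entry in cpu_dir.glob("node[0-9]*"):
        # The entry name has the form nodeN; strip the prefix to get N.
        return int(entry.name[len("node"):])
    return None


def is_remote_access(pci_address, cpu, sysfs_root="/sys"):
    """Return True if the device and the CPU sit on different NUMA nodes.

    Returns None when the placement of either one is unknown.
    """
    device_node = pci_numa_node(pci_address, sysfs_root)
    core_node = cpu_numa_node(cpu, sysfs_root)
    if device_node < 0 or core_node is None:
        return None
    return device_node != core_node
```

For example, is_remote_access("0000:5e:00.0", 3) returns True when the NVMe device at that address and logical CPU 3 are attached to different nodes, which is the configuration that produces the cross-package traffic described above.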