Intel® VTune™ Amplifier
Use a platform-wide Input and Output analysis to monitor utilization of the disk subsystem, CPU and processor buses.
This is a PREVIEW FEATURE on Windows* OS. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.
This analysis type uses the hardware event-based sampling collection and system-wide Ftrace* collection (for Linux* and Android* targets)/ETW collection (Windows* targets) to provide a consistent view of the storage sub-system combined with hardware events and an easy-to-use method to match user-level source code with I/O packets executed by the hardware.
The analysis relies on data produced by the kernel block driver subsystem. If your platform uses a non-standard block driver subsystem (for example, user-space storage drivers), I/O metrics are not available in this analysis type.
The Input and Output analysis helps identify:
Imbalance between I/O and compute operations (HPC applications)
Long latency of I/O requests (transactional workloads)
Hardware utilization (streaming)
Data plane utilization (applications supporting DPDK framework). You can analyze how your application utilizes NIC ports, bandwidth, PCIe, and UPI.
I/O performance issues that may be caused by ineffective accesses to remote sockets or under-utilized throughput of an SPDK device
VTune Amplifier uses the following system-wide metrics for the I/O analysis:
I/O Wait system-wide metric (Linux* targets only) shows the time when system cores are idle but there are threads in a context switch caused by I/O access.
I/O Queue Depth metric shows the number of I/O requests submitted to the storage device. Zero requests in the queue means that no requests are scheduled and the disk is not used at all.
I/O Data Transfer metric shows the number of bytes read from or written to the storage.
Page Faults metric shows the number of page faults that occurred on the system. This metric is useful when analyzing access to memory-mapped files.
CPU Activity metric shows the portion of time the system spent in the following states:
Idle state - the CPU core is idle.
Active state - the CPU core is executing a thread.
I/O Wait (Linux targets only) - the CPU core is idle but there is a thread, blocked by an access to the disk, that could be potentially executed on this core.
PCIe Bandwidth metric represents the amount of data transferred via the PCIe bus per second. This metric is collected only on server platforms based on Intel microarchitecture code name Sandy Bridge EP and later.
SPDK Throughput Utilization metric helps identify under-utilization of the SPDK device throughput. You can use the Timeline view to correlate areas of low SPDK throughput utilization with SPDK IO API calls and the PCIe traffic breakdown, and to understand whether I/O communications caused performance changes.
SPDK Effective Time metric shows the amount of time the SPDK effectively interacts with devices.
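To make the relationship between the Idle, Active, and I/O Wait states more concrete, the sketch below derives their shares from two snapshots of the aggregate `cpu` line in Linux `/proc/stat`. It only illustrates the concept and is not how VTune Amplifier computes the CPU Activity metric. The function names and the sample snapshot values are hypothetical.

```python
def parse_cpu_line(line):
    """Return (idle, iowait, total) jiffies from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle, iowait = values[3], values[4]
    # guest and guest_nice are already counted in user and nice,
    # so only the first eight fields contribute to the total.
    total = sum(values[:8])
    return idle, iowait, total


def cpu_activity(before, after):
    """Return the idle, io_wait, and active shares between two snapshots."""
    idle0, wait0, total0 = parse_cpu_line(before)
    idle1, wait1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken in chronological order")
    idle = (idle1 - idle0) / elapsed
    io_wait = (wait1 - wait0) / elapsed
    return {"idle": idle, "io_wait": io_wait, "active": 1.0 - idle - io_wait}


if __name__ == "__main__":
    # Hypothetical snapshots; on a Linux system, read the first line of
    # /proc/stat twice with a delay in between instead.
    first = "cpu 100 0 50 800 50 0 0 0 0 0"
    second = "cpu 200 0 100 1400 300 0 0 0 0 0"
    for state, share in cpu_activity(first, second).items():
        print(f"{state}: {share:.1%}")
```

A large `io_wait` share next to a small `active` share is the kind of pattern that the I/O Wait state is meant to expose.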
Prerequisites:
Run the Intel® VTune™ Amplifier with administrative privileges (Windows) or root privileges (Linux). For Input and Output analysis on Linux targets, the VTune Amplifier automatically sets perf_event_paranoid to 0.
Create a VTune Amplifier project and specify your analysis system and target (application, process, or system). Note that irrespective of the target type you select, the VTune Amplifier automatically enables the Analyze system-wide target option to collect system-wide metrics for the Input and Output analysis.
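If you want to see the current perf_event_paranoid level on a Linux target before or after a collection, a minimal sketch such as the following can read it. It assumes the standard Linux location `/proc/sys/kernel/perf_event_paranoid`; the helper names are hypothetical, and the script only inspects the value without changing it.

```python
from pathlib import Path

# Standard Linux location of the setting (assumed; absent on other systems).
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def parse_paranoid(text):
    """Convert the file content (for example '0\\n') to an integer level."""
    return int(text.strip())


def read_paranoid(path=PARANOID_PATH):
    """Return the current level, or None if the file cannot be read."""
    try:
        return parse_paranoid(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid()
    if level is None:
        print("perf_event_paranoid is not available on this system")
    else:
        print(f"perf_event_paranoid = {level}")
```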
To run the Input and Output analysis:
Click the Configure Analysis button on the VTune Amplifier toolbar (standalone GUI or Visual Studio IDE).
The New Amplifier Result tab opens.
From the HOW pane, click the Browse button and select Platform Analysis > Input and Output.
The corresponding analysis configuration opens.
Depending on your target application and analysis purpose, choose any of the following configuration options:
Select IO API type to profile | By default, the VTune Amplifier profiles System Disk IO API. For DPDK applications, select DPDK IO API. For SPDK applications, select SPDK IO API.
Analyze memory bandwidth check box | Collect the data required to compute memory bandwidth. The option is enabled by default.
Evaluate max DRAM bandwidth check box | Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds. The option is enabled by default.
Click Start to run the analysis.
To run the Input and Output analysis from the command line, enter:
$ amplxe-cl -collect io -- <target> [target_options]
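If you launch collections from a script, the command line above can be assembled programmatically. The following is a minimal sketch, not an official wrapper: the function names, the `./my_app` target, and the `io_result` directory are hypothetical, and the optional `-result-dir` argument is included on the assumption that you want to control where the result is stored.

```python
import shutil
import subprocess


def build_collect_command(target, target_options=(), result_dir=None):
    """Build the argument list for an Input and Output collection."""
    cmd = ["amplxe-cl", "-collect", "io"]
    if result_dir is not None:
        # Optional: store the result in a specific directory.
        cmd += ["-result-dir", result_dir]
    cmd.append("--")  # everything after this belongs to the target
    cmd.append(target)
    cmd.extend(target_options)
    return cmd


def run_collection(target, target_options=(), result_dir=None):
    """Run the collection and return the exit code of amplxe-cl."""
    if shutil.which("amplxe-cl") is None:
        raise FileNotFoundError("amplxe-cl was not found on PATH")
    cmd = build_collect_command(target, target_options, result_dir)
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical target and options; only the command is printed here.
    print(" ".join(build_collect_command("./my_app", ["--size", "10"], "io_result")))
```

Passing the arguments as a list avoids shell quoting problems when target options contain spaces.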
VTune Amplifier collects the data, generates an rxxxio result, and opens it in the default Input and Output viewpoint. The viewpoint displays statistics on I/O waits (Linux targets only), I/O operations, and I/O data transfers distributed over time and correlated with application execution data, along with other metrics that depend on the selected profiling type. For Disk IO analysis, start with the Disk Input and Output Histogram section of the Summary window. Identify slow I/O operations and switch to the grid view for further analysis.
If you identify an imbalance between I/O and compute operations, consider modifying your code to make I/O operations asynchronous.
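One way to make I/O asynchronous is to request the next block of data while the current block is still being processed. The sketch below shows this pattern with a single background thread; it is an illustration under stated assumptions, not a recommendation specific to VTune Amplifier. The function names, the chunk size, and the `checksum` stand-in for the compute phase are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary illustrative value


def read_chunk(path, offset, size):
    """Read up to `size` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def checksum(data):
    """Stand-in for the compute phase (hypothetical workload)."""
    return sum(data) % 65521


def process_file_overlapped(path, chunk_size=CHUNK_SIZE):
    """Compute per-chunk checksums while the next chunk is being read."""
    size = os.path.getsize(path)
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        offset = 0
        pending = pool.submit(read_chunk, path, 0, chunk_size) if size else None
        while pending is not None:
            data = pending.result()  # wait for the chunk requested earlier
            if not data:
                break  # the file ended sooner than its reported size
            offset += len(data)
            # Request the next chunk before computing on the current one,
            # so the read and the computation can overlap.
            if offset < size:
                pending = pool.submit(read_chunk, path, offset, chunk_size)
            else:
                pending = None
            results.append(checksum(data))
    return results
```

After a change like this, rerunning the analysis should show whether the I/O wait periods shrink on the timeline.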
For I/O requests with long latency, check whether your data can be pre-loaded or written incrementally, or consider upgrading your storage device (to an SSD, for example).
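The two software-side options can be sketched as follows, assuming fixed-size binary records that fit in memory. `load_records` pre-loads the file with one large read instead of many small ones, and `append_records` writes only the new records instead of rewriting the whole file. Both function names and the record layout are hypothetical.

```python
def load_records(path, record_size):
    """Pre-load: read the file once and slice the records in memory."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    with open(path, "rb") as f:
        blob = f.read()
    return [blob[i:i + record_size] for i in range(0, len(blob), record_size)]


def append_records(path, records):
    """Incremental write: append new records without rewriting old ones."""
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
```

Whether either option helps depends on the access pattern, so compare the Disk Input and Output Histogram before and after the change.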