Intel® VTune™ Amplifier
Intel® VTune™ Amplifier provides a set of pre-configured analysis types you may start with to address your particular performance optimization goals.
Hotspots:
Basic Hotspots is best for analyzing call paths to find where your code is spending the most time and discover opportunities for tuning your algorithms. Applies to C/C++, Fortran, Java*, or Python* apps and more. See Finding Hotspots tutorial: Linux | Windows.
Advanced Hotspots is best for analyzing an application or the entire system and getting kernel information with higher resolution from shorter sampling intervals. Applies to C/C++, Fortran, or Java* apps and more, including apps in containers such as Docker* or LXC.
Parallelism:
Concurrency is best for visualizing thread parallelism on available cores, finding areas with high or low concurrency, and identifying serial bottlenecks in your code. Applies to C/C++, Fortran, or Java* apps and more. See Analyzing Parallelism tutorial (Fortran): Linux | Windows.
Locks and Waits is best for locating causes of low concurrency, such as heavily used locks and large critical sections. Applies to C/C++, Fortran, Java*, or Python* apps and more. See Analyzing Locks and Waits tutorial: Linux | Windows.
HPC Performance Characterization is best for understanding how your compute-intensive OpenMP* or MPI app uses the CPU, memory, and floating-point unit (FPU) resources. Applies to C/C++ or Fortran apps and more. See Analyzing an OpenMP* and MPI Application tutorial: Linux.
Microarchitecture:
General Exploration is best for identifying the CPU pipeline stage (front-end, back-end, etc.) and hardware units responsible for your hardware bottlenecks. Applies to C/C++, Fortran, or Java* apps and more, including apps in containers such as Docker* or LXC.
Memory Access is best for memory-bound apps to determine which level of the memory hierarchy is impacting your performance by reviewing CPU cache and main memory usage, including possible NUMA issues. Applies to C/C++, Fortran, or Java* apps and more. See Identifying False Sharing tutorial: Linux.
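The mapping from optimization goal to analysis type described above can be written down as a small lookup table. The following is a minimal sketch, not part of the product: the goal phrases and the `build_command` helper are hypothetical, and the command-line identifiers (`hotspots`, `locksandwaits`, and so on) and the `amplxe-cl -collect` form are assumptions that should be checked against your installed version, for example with `amplxe-cl -help collect`. The script only builds the argument list and does not launch anything.

```python
# Goal-to-analysis lookup built from the descriptions above.
# Each value is (analysis type name, assumed command-line identifier).
# The identifiers are assumptions; verify them for your installed version.
ANALYSIS_BY_GOAL = {
    "find where time is spent": ("Basic Hotspots", "hotspots"),
    "system-wide or kernel detail": ("Advanced Hotspots", "advanced-hotspots"),
    "visualize thread parallelism": ("Concurrency", "concurrency"),
    "find contended locks": ("Locks and Waits", "locksandwaits"),
    "characterize OpenMP or MPI app": ("HPC Performance Characterization", "hpc-performance"),
    "find pipeline bottleneck": ("General Exploration", "general-exploration"),
    "diagnose memory-bound code": ("Memory Access", "memory-access"),
}


def build_command(goal, app_args):
    """Return an argv list for the assumed `amplxe-cl` launcher.

    `goal` must be one of the keys of ANALYSIS_BY_GOAL; `app_args` is the
    target application and its arguments. Nothing is executed here.
    """
    _display_name, cli_name = ANALYSIS_BY_GOAL[goal]
    return ["amplxe-cl", "-collect", cli_name, "--"] + list(app_args)


if __name__ == "__main__":
    # "./my_app" is a placeholder target used only for illustration.
    cmd = build_command("find contended locks", ["./my_app", "--threads", "8"])
    print(" ".join(cmd))
```

Keeping the display name next to the identifier makes it easy to report which analysis type was chosen without a second table.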
In addition, the VTune Amplifier offers Platform analysis types that are helpful in specific use cases, such as GPU, disk I/O, and IRQ analysis:
CPU/GPU Concurrency enables you to explore code execution on the various CPU and GPU cores on your platform, correlate CPU and GPU activity, and identify whether your application is GPU-bound or CPU-bound.
GPU Hotspots targets applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computations. If you ran the CPU/GPU Concurrency analysis and identified that your application is GPU-bound, use the GPU Hotspots analysis to go deeper: identify the most time-consuming GPU computing tasks and analyze performance using GPU hardware metrics.
GPU In-Kernel Profiling targets GPU-bound applications. It helps you analyze GPU kernel execution per code line and identify performance issues caused by memory latency or inefficient kernel algorithms. This analysis type incurs a higher performance overhead.
System Overview is a driverless event-based sampling analysis that monitors the general behavior of your target Linux* or Android* system and correlates power and performance metrics with interrupt request (IRQ) handling.
Input and Output analysis monitors utilization of the I/O subsystems, CPU, and processor buses.
CPU/FPGA Interaction (preview) analysis explores FPGA utilization for each FPGA accelerator and identifies the most time-consuming FPGA computing tasks.
Graphics Rendering (preview) analysis estimates the CPU/GPU utilization of your code running on the Xen virtualization platform.
A PREVIEW FEATURE may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com.
As an alternative, advanced users may consider creating a custom analysis using the data collectors provided by the VTune Amplifier, or combining a VTune Amplifier collector with any other custom collector.
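The GPU analysis types listed above suggest a progression: run CPU/GPU Concurrency first, move to GPU Hotspots if the application turns out to be GPU-bound, and use GPU In-Kernel Profiling for line-level detail within a kernel. A minimal sketch of that decision follows. The function `next_gpu_analysis` and its parameters are hypothetical, and the CPU-bound branch (falling back to Basic Hotspots) is an assumption, since the descriptions above do not prescribe a next step for that case.

```python
def next_gpu_analysis(gpu_bound=None, kernel_identified=False):
    """Suggest the next analysis type for a GPU-related investigation.

    gpu_bound: None if not yet known, True if the application is GPU-bound,
               False if it is CPU-bound.
    kernel_identified: True once a time-consuming GPU task has been found
               and line-level detail is wanted.
    """
    if gpu_bound is None:
        # Not known yet: correlate CPU and GPU activity first.
        return "CPU/GPU Concurrency"
    if not gpu_bound:
        # Assumption: for CPU-bound code, start with the CPU hotspot analysis.
        return "Basic Hotspots"
    if not kernel_identified:
        # GPU-bound: find the most time-consuming GPU computing tasks.
        return "GPU Hotspots"
    # A costly kernel is known: analyze it per code line (higher overhead).
    return "GPU In-Kernel Profiling"


if __name__ == "__main__":
    print(next_gpu_analysis())
    print(next_gpu_analysis(gpu_bound=True))
    print(next_gpu_analysis(gpu_bound=True, kernel_identified=True))
```

Because GPU In-Kernel Profiling incurs a higher overhead, the sketch reaches it only after the cheaper analyses have narrowed the problem down.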