Intel® VTune™ Amplifier
Threading analysis extended with a lower-overhead hardware event-based sampling mode. This mode helps analyze the impact of thread preemption and context switching. On Windows*, this analysis configuration requires the sampling driver. On Linux*, the analysis is available both with the sampling driver and with the Linux Perf* collector for kernels 4.4 and higher.
Quality and usability improvements:
Summary command-line report for the Hotspots analysis enriched with metrics and a Top 5 Hotspots table that is also available from the GUI Summary view.
A sample matrix project added to the Project Navigator to help you get started with the product, review a sample pre-collected Hotspots result, and test other analysis types and source view options. A pre-built version of the matrix sample application and the associated source files are installed with VTune Amplifier.
Support for Linux Perf* collection extended with VTune Amplifier metrics with a further option to import the Perf trace to the VTune Amplifier GUI and benefit from predefined viewpoints. This solution could be useful for performance analysis in data centers.
New Hotspots analysis, combining former Basic Hotspots and Advanced Hotspots analysis configurations, that provides quick understanding of the application performance hotspots and suggests further analysis steps (insights). By default, the Hotspots analysis uses the user-mode sampling collection mode, but you can enable the lower-overhead hardware event-based sampling mode, which requires the sampling driver to be installed.
New Threading analysis combining and replacing former Concurrency and Locks and Waits analysis types
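As a rough illustration of switching the Hotspots and Threading analyses to hardware event-based sampling from the command line, the Python sketch below assembles the argument lists. The `amplxe-cl` launcher name and the `sampling-mode=hw` and `sampling-and-waits=hw` knob values are assumptions here and should be checked against `amplxe-cl -help collect` for the installed version; `collect_command` and `./matrix` are hypothetical names used only for this example.

```python
from typing import Dict, List


def collect_command(analysis: str, knobs: Dict[str, str], app: List[str]) -> List[str]:
    """Build an argument list for a collection run (assumed CLI layout)."""
    cmd = ["amplxe-cl", "-collect", analysis]
    for name, value in knobs.items():
        # Each knob is passed as a separate "-knob name=value" pair.
        cmd += ["-knob", f"{name}={value}"]
    # Everything after "--" is the target application and its arguments.
    return cmd + ["--"] + app


# Assumed knob names; verify them for your product version before use.
hotspots_hw = collect_command("hotspots", {"sampling-mode": "hw"}, ["./matrix"])
threading_hw = collect_command("threading", {"sampling-and-waits": "hw"}, ["./matrix"])

print(" ".join(hotspots_hw))
print(" ".join(threading_hw))
```

On a system with the product and the sampling driver installed, either list could be passed to `subprocess.run` to start the collection.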
New Intel VTune Amplifier Platform Profiler tool that provides low-overhead, system-wide analysis and insights into overall system configuration, performance, and behavior. Use the tool to:
Identify bottlenecks by monitoring over- or under-utilized subsystems and buses (CPU, storage, memory, PCIe, and network interfaces) and platform-level imbalances
Understand a system topology using diagrams annotated with performance data
Capture average-case and transient behaviors for data-center applications
Microarchitecture analysis improvements:
Microarchitecture Exploration (formerly known as General Exploration) analysis configuration split to provide either a lightweight summary analysis or full detailed analysis with all levels of PMU metrics
Microarchitecture Exploration analysis view extended with the hardware metric representation that helps easily identify bottlenecks in the hardware usage and benefit from quick insights
HPC workload profiling improvements:
CPU Utilization metric refined to differentiate the utilization on logical vs. physical cores, which is particularly important for HPC applications running on Intel® Xeon® processor family processors
Intel® Omni-Path Architecture Interconnect Bandwidth and Packet rate metrics added to HPC Performance Characterization analysis to identify performance bottlenecks caused by interconnect limits
HPC Performance Characterization analysis enriched with a thread affinity report that helps analyze CPU utilization or memory access issues of multithreaded and hybrid MPI and OpenMP* applications
GPU Compute/Media Hotspots analysis (formerly known as GPU Hotspots) on Linux updated to use Intel Metric Discovery API library for GPU metric collection, which involves support for kernel 4.14 and higher
Input and Output analysis on Linux* extended to profile DPDK and SPDK IO API. Use this data to correlate CPU activity with the network data plane utilization, visualize PCIe bandwidth utilization per NIC, estimate UPI bandwidth on multi-socket systems, and identify bottlenecks.
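To show why the CPU Utilization metric distinguishes logical from physical cores, here is a deliberately simplified Python sketch. It is not how VTune Amplifier computes the metric (the product derives it from collected data over time); it only counts busy CPUs at one instant. The `utilization` function and the 4-core, 8-thread topology are hypothetical.

```python
from typing import Dict, Set, Tuple


def utilization(busy_logical: Set[int],
                logical_to_physical: Dict[int, int]) -> Tuple[float, float]:
    """Return (utilization by logical cores, utilization by physical cores)."""
    total_logical = len(logical_to_physical)
    total_physical = len(set(logical_to_physical.values()))
    # A physical core counts as busy if any of its logical CPUs is busy.
    busy_physical = {logical_to_physical[cpu] for cpu in busy_logical}
    return len(busy_logical) / total_logical, len(busy_physical) / total_physical


# Hypothetical topology: 4 physical cores with 2 hardware threads each;
# logical CPU n belongs to physical core n % 4.
topology = {cpu: cpu % 4 for cpu in range(8)}

# Four threads, one per physical core.
by_logical, by_physical = utilization({0, 1, 2, 3}, topology)
print(f"logical: {by_logical:.0%}, physical: {by_physical:.0%}")
```

With one thread per physical core, the logical-core figure reports half of the machine as idle even though every physical core is occupied, which is the case the refined metric is meant to expose.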
Containerization support improvements:
Support for user-mode sampling and tracing collection (Hotspots analysis) added for Docker* container targets
Profiling support for targets running in the Singularity* containers
Profiling native and Java applications in the Docker and LXC containers
Managed runtime analysis improvements:
Extended JIT profiling for server-side applications running on the LLVM* or HHVM* PHP servers to support the event-based sampling analysis in the attach mode
Extended Java* code analysis with support for OpenJDK* 9 and Oracle* Java SE Development Kit 9
Improved source code analysis for .NET* Core applications running on Linux and Windows systems
Analysis on embedded platforms and accelerators:
New CPU/FPGA Interaction analysis (PREVIEW) to assess the balance between the CPU and FPGA on systems with a discrete Intel® Arria® 10 FPGA running OpenCL™ applications
New GPU Rendering analysis (PREVIEW) for CPU/GPU utilization of your code running on the Xen* virtualization platform installed on a remote embedded target
Support for the sampling command-line analysis on remote QNX* embedded systems over an Ethernet connection
A PREVIEW FEATURE may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.
KVM guest OS profiling extended to profile both KVM kernel and user space from the host system, which is helpful for a full-scale performance analysis of host and guest systems
Application Performance Snapshot improvements:
Added uncore-based metrics for DRAM/MCDRAM memory analysis, which helps identify whether your application is bandwidth bound
Added the ability to pause/resume collection with MPI_Pcontrol and the ITT API. The -start-paused option was added to exclude application execution from collection from the start until the first collection resume.
Enabled selection of which data types are collected to reduce overhead. The choices include MPI tracing, OpenMP tracing, hardware counter based collection, or a combination of the three.
Exposed the CPU Utilization metric by physical cores on processors that support proper hardware events.
Significantly reduced MPI tracing overhead when there are a large number of ranks.
Enriched MPI statistics generated by the aps-report utility by showing information about the communicators used in the application and by adding the ability to group and filter collective operations by communicator.
Improved integration with Intel® Trace Analyzer and Collector by adding the ability to generate profiling configuration files with the aps-report option.
Intel® Omni-Path Architecture Interconnect Bandwidth and Packet rate metrics added to explore MPI communication bottlenecks
Added an HTML-based rank-to-rank communication diagram to better visualize MPI application communication patterns
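The pause/resume capability listed above can be wrapped around a region of interest. The Python sketch below is one possible shape for that, under two assumptions that are not confirmed by these notes: that level 1 resumes collection and level 0 pauses it (the usual MPI_Pcontrol convention), and that the application was started paused. The `collected_region` helper is hypothetical; in a real MPI program the callable could be `MPI.Pcontrol` from the mpi4py package, while here a plain list records the calls so the example runs without MPI.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def collected_region(pcontrol: Callable[[int], None]) -> Iterator[None]:
    """Resume collection on entry and pause it again on exit."""
    pcontrol(1)  # assumed: level 1 resumes collection
    try:
        yield
    finally:
        pcontrol(0)  # assumed: level 0 pauses collection


# Stand-in for MPI.Pcontrol: record each requested level.
calls = []
with collected_region(calls.append):
    calls.append("work")  # the code to be profiled would run here

print(calls)
```

Passing the control function as a parameter keeps the helper independent of any particular MPI binding.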
Quality and usability improvements:
Optimized product graphical interface with a simplified analysis configuration workflow providing you with pre-selected target and collection options available in the same view
Hardware event-based analysis supported for targets running in the Hyper-V* environment on Windows* 10 Fall Creators Update (RedStone3)
Default finalization mode set to Fast to minimize post-processing overhead if the number of collected samples exceeds the threshold
The Data of Interest metric type used for hotspot navigation in the Source view replaced with explicit metric selection: select a metric in the grid and apply the Use for Hotspot Navigation context menu command
CPU Frequency metric provided for the event-based analysis types (using the sampling driver) is improved to display more reliable data based on the P-State collection. The CPU Frequency metric is not provided for the user-mode sampling and tracing analyses and for analyses using the Perf* collector.
The list of supported output formats for command-line reports extended with XML and HTML options
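A minimal Python sketch of requesting a summary report in one of the output formats follows. The `amplxe-cl` launcher name, the `-report`, `-result-dir`, `-format`, and `-report-output` options, and the full set of format names are assumptions to verify against `amplxe-cl -help report`; these notes only confirm that XML and HTML were added. `report_command`, `r000hs`, and `summary.html` are hypothetical names.

```python
from typing import List

# Assumed format names; only XML and HTML are confirmed by the release notes.
ASSUMED_FORMATS = ("text", "csv", "xml", "html")


def report_command(result_dir: str, fmt: str, output: str) -> List[str]:
    """Build an argument list for a summary report (assumed CLI layout)."""
    if fmt not in ASSUMED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return ["amplxe-cl", "-report", "summary",
            "-result-dir", result_dir,
            "-format", fmt,
            "-report-output", output]


html_report = report_command("r000hs", "html", "summary.html")
print(" ".join(html_report))
```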
Support for new operating systems:
SUSE* Linux* Enterprise Server (SLES) 15
Red Hat* Enterprise Linux* 7.5
Ubuntu* 18.04
Fedora* 28
Microsoft Windows* 10 RS4