Intel® C++ Compiler 18.0 Developer Guide and Reference
This topic only applies when targeting Intel® Graphics Technology.
Many Intel processors that include Intel® Graphics Technology can execute a reasonable amount of parallelizable work on the processor graphics. In many cases, you can enable this offloading by adding a small amount of code. When compiling, the Intel® C++ Compiler facilitates offloading existing scalar or parallel C/C++ code written for the CPU to the processor graphics.
Architecture and OS support for offloading to processor graphics is shown in the following table:
Build Architecture and OS | Executing Architecture and OS, for processors with Intel® Graphics Technology |
---|---|
Intel® 64, Linux* | Intel® 64, Linux* |
Intel® 64, Windows* | Intel® 64, Windows*; IA-32, Windows* |
In addition to a supported processor, running an application that offloads computing to the processor graphics requires the Intel® HD Graphics Driver to be installed to provide the necessary runtime support.
The compiler supports separate compilation and linking of target code that runs on Intel® Graphics Technology. The open source binutils package provides the linker support for linking the target kernel. See the Release Notes for more information.
The compiler generates code sections that run natively on the CPU (the host) and code sections that are offloaded to the processor graphics (the target). The offload runtime is the set of runtime libraries that organize offload operations to the target.
The compiler and the offload runtime enable the following:
the offload process and data exchange between the host and the target
dynamic detection of target availability, and backup execution on the host if the target is not available
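The host fallback can be observed from portable code. The following is a minimal sketch, assuming an OpenMP 4.0 or later runtime; it uses only the standard routines omp_get_num_devices() and omp_is_initial_device(). The helper names offload_device_count and target_region_ran_on_host are hypothetical and are not part of the Intel® Graphics Technology offload runtime, and whether the processor graphics is reported as an OpenMP device depends on how the compiler and runtime are configured.

```cpp
// Hypothetical helpers that report where an OpenMP target region ran.
// Built without OpenMP, the guarded code is skipped and everything runs on the host.
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload devices the OpenMP runtime reports (0 without OpenMP).
int offload_device_count() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Returns true if the target region executed on the host (the fallback path).
bool target_region_ran_on_host() {
    int on_host = 1;  // without OpenMP there is no offload, so assume the host
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        // omp_is_initial_device() is nonzero when executing on the host.
        on_host = omp_is_initial_device();
    }
#endif
    return on_host != 0;
}
```

If no device is available, the target region still runs, but on the host, which is the backup behavior described above.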
Additionally, the compiler and the offload runtime facilitate the following:
debugging, on the host, of heterogeneous source code that is intended to be offloaded
migrating the same source code for efficient execution on Intel processors or coprocessors based on the Intel® Many Integrated Core Architecture.
When you compile a source file that contains offload extensions for Intel® Graphics Technology, the resulting object file contains the target object embedded within it. This object file is called a fat object. The name of the target object section is .gfxobj. When you link fat objects, the target executable is:
a section named .gfxbin embedded in the host executable (Windows*)
an object named __gfx_offload_target_image in the read-only data section (Linux*)
You can extract the target object or executable from a fat object or fat executable with the offload_extract tool.
The compiler supports the following programming models for offloading to the processor graphics:
OpenMP* Offload Model: The compiler supports most of the offload constructs defined by the OpenMP specification. The complete list is in the OpenMP-based offload section and in the OpenMP Pragmas Summary. See the OpenMP specification at www.openmp.org for descriptions of these constructs.
To summarize, the compiler supports most OpenMP offload pragmas, except those involving the distribute construct or any parallel construct other than parallel for. Asynchronous offload and task dependencies, expressed with the nowait and depend clauses, are supported.
To enable OpenMP-based offload for processor graphics, pass the following options to the compiler:
-qopenmp -qopenmp-offload=gfx (Linux*)
/Qopenmp /Qopenmp-offload=gfx (Windows*)
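As a minimal sketch, assuming the standard OpenMP 4.0 target and map syntax, the following offloads an element-wise vector addition. It places a parallel for inside the target region, which is within the supported subset described above, and does not use distribute. The function name vector_add is hypothetical. When built with the options above, the loop is a candidate for offload; when built without OpenMP support, the pragmas are ignored and the loop runs serially on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: c[i] = a[i] + b[i], offloaded with an OpenMP target region.
void vector_add(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();
    const int n = static_cast<int>(a.size());

    // Inputs are copied to the target (map to); the result is copied back (map from).
    #pragma omp target map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        pc[i] = pa[i] + pb[i];
    }
}
```

For example, calling vector_add with a = {1, 2, 3} and b = {10, 20, 30} leaves c = {11, 22, 33}.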
A Subset of Intel® Cilk™ Plus Language Extensions: The compiler supports the _Cilk_for, _Thread_group, and __thread_group_local keywords to specify parallelism in offloaded code and thread group local data, as well as the #pragma offload syntax to enable offload.
Intel® Cilk™ Plus is a deprecated feature in the Intel® C++ Compiler 18.0. An alternative for offloading to the processor graphics is planned for a future release. For more information see Migrate Your Application to use OpenMP* or Intel® TBB Instead of Intel® Cilk™ Plus.
API-based Offload: In this model, you don't use any special syntax for initiating an offload, such as #pragma offload or #pragma omp target. Instead, you express data setup and offload using special API functions implemented in the Intel® Graphics Technology offload runtime.
One important characteristic of a programming model is whether it supports synchronous offload, asynchronous offload, or both. With synchronous offload, the CPU thread that initiates an offload always waits for the offload to complete before proceeding; the offload operation includes sharing and unsharing data. In many cases this approach is inefficient: the initiating CPU thread could do useful work instead of waiting for the offload to complete, or the same data could be shared once and used in multiple offloads. In such cases, use asynchronous offload. Of the three programming models, the OpenMP-based and Intel® Cilk™ Plus-based models support both synchronous and asynchronous offload, whereas the API-based model supports only asynchronous offload.
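Both inefficiencies can be avoided with standard OpenMP constructs. The following is a minimal sketch, assuming the compiler accepts the OpenMP 4.5 nowait and depend clauses on target, which this topic lists as supported. A target data region shares the buffer once for two offloads, nowait lets the initiating thread continue with host work, depend orders the second offload after the first, and taskwait waits for both. The function name scale_then_shift_async and its arithmetic are hypothetical illustrations.

```cpp
#include <vector>

// Hypothetical example: v[i] = v[i] * scale + shift, done as two offloads that
// reuse one data mapping while the host thread does unrelated work.
// Returns the result of that host-side work (the sum 1 + 2 + ... + 1000).
long scale_then_shift_async(std::vector<float>& v, float scale, float shift) {
    float* p = v.data();
    const int n = static_cast<int>(v.size());
    long host_result = 0;

    // Share the buffer with the target once for both offloads.
    #pragma omp target data map(tofrom: p[0:n])
    {
        // First offload, deferred (nowait). p[0:n] is already present,
        // so this map clause does not copy it again.
        #pragma omp target map(tofrom: p[0:n]) depend(inout: p[0:n]) nowait
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) p[i] *= scale;

        // Second offload; the depend clause orders it after the first.
        #pragma omp target map(tofrom: p[0:n]) depend(inout: p[0:n]) nowait
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) p[i] += shift;

        // Meanwhile, the initiating host thread does independent work.
        for (long k = 1; k <= 1000; ++k) host_result += k;

        // Wait for both target tasks before the data region ends.
        #pragma omp taskwait
    }  // p[0:n] is copied back to the host here
    return host_result;
}
```

The taskwait sits inside the target data region on purpose: the shared buffer must stay mapped until both deferred offloads have finished.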
The compiler also provides language extensions to facilitate programming for Intel® Graphics Technology, including:
The offload and offload_attribute pragmas, where the parallel region is a _Cilk_for loop or array notation statement.
Predefined macros you can use when programming for Intel® Graphics Technology.
Attributes to place variables and functions on the target.
Built-in functions specifically supporting heterogeneous programming for Intel® Graphics Technology, as well as support for many existing CPU intrinsics.
APIs for synchronous and asynchronous offload that organize queued offloading of direct kernel functions and data sharing between the CPU and the processor graphics.
There are two modes for sharing memory between the CPU and the processor graphics:
The default programming model: Allows sharing physical memory between the CPU and the processor graphics, but not virtual memory.
Shared Virtual Memory (SVM): Allows sharing virtual address space between the CPU and the processor graphics.
The Shared Virtual Memory section of this document explains the programming differences required to use SVM mode.
The compiler provides the following options that you can use when building a binary for Intel® Graphics Technology:
Compiler Option | Description |
---|---|
qoffload, Qoffload | Specifies the mode for offloading. The negative form of this option ignores language constructs for offloading. |
qoffload-attribute-target, Qoffload-attribute-target | Flags every global routine and global data object in the source file with the offload attribute target(gfx). |
qoffload-option, Qoffload-option | Specifies options to be used for the specified target and tool. |
qoffload-arch, Qoffload-arch | Specifies the target architecture to use when offloading code. |
mgpu-asm-dump, Qgpu-asm-dump | Generates a native assembly listing for the processor graphics code to be offloaded. |
mgpu-arch, Qgpu-arch | Builds the offload code for graphics to run on a particular graphics processor as specified by the option value. |
qopenmp, Qopenmp | Enables the parallelizer to generate multi-threaded code based on OpenMP* directives. |
qopenmp-offload, Qopenmp-offload | Enables or disables OpenMP* offloading compilation for the target pragmas. |
The following are a few of the environment variables available for controlling Intel® Graphics Technology offload at run time:
Environment Variable | Description |
---|---|
GFX_CPU_BACKUP | Controls whether heterogeneous code is executed on the host when the target is not available. |
GFX_MAX_THREAD_COUNT | Controls the maximum number of target threads used to parallelize loop nests. |
GFX_OFFLOAD_TIMEOUT | Controls execution time of offload tasks. |
GFX_SHOW_TIME | Controls printing of timing information at the end of execution. |