Intel® Architecture Processors provide performance acceleration using Single Instruction Multiple Data (SIMD) instruction sets, which include:
By processing multiple data elements in a single instruction, these ISA extensions enable data parallelism.
When using SIMD instructions, vector registers can store a group of data elements of the same data type, such as float or char.
The number of data elements that fit in one register depends on the microarchitecture
and on the width of the data type. For example, if the CPU supports 512-bit vector
registers, each vector (ZMM) register can store sixteen float values,
sixteen 32-bit integer values, and so on.
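The arithmetic behind these counts is the register width divided by the element width. The following is a minimal sketch in C; the helper name lanes_per_register is hypothetical and is not part of any Intel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many elements of element_bytes bytes
 * fit in one vector register that is register_bits bits wide. */
size_t lanes_per_register(size_t register_bits, size_t element_bytes)
{
    /* The register width must be a whole number of bytes,
     * and the element size must be non-zero. */
    assert(element_bytes > 0 && register_bits % 8 == 0);
    return (register_bits / 8) / element_bytes;
}
```

For example, lanes_per_register(512, sizeof(float)) evaluates to 16 on platforms where float is 32 bits wide, which matches the ZMM example above, while lanes_per_register(512, sizeof(char)) evaluates to 64.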
When using the SPMD technique, the Intel® OpenCL implementation can map the work items to the hardware according to one of the following:
The Intel® SDK for OpenCL Applications contains an implicit vectorization module, which implements the second method. Depending on the kernel code, vectorization might have some limitations. If the vectorization module optimization is disabled, the Intel® SDK for OpenCL Applications uses the first method.
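The difference between a scalar and a vectorized mapping can be sketched in plain C. This illustration assumes that the first method runs work items one at a time as scalar code and that the second method packs several consecutive work items into the lanes of one vector register. The kernel body, the function names, and the LANES value are all hypothetical, and the sketch does not show how the SDK's vectorization module is actually implemented.

```c
#include <stddef.h>

/* Assumed lane count: a 512-bit register holding 32-bit floats. */
#define LANES 16

/* Hypothetical kernel body: the work done by one work item. */
static float work_item(const float *a, const float *b, size_t gid)
{
    return a[gid] * 2.0f + b[gid];
}

/* Assumed first method: scalar code, work items run one by one. */
void run_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        out[gid] = work_item(a, b, gid);
}

/* Assumed second method: LANES consecutive work items are grouped
 * so that each group can occupy the lanes of one vector register.
 * The inner loop has a fixed trip count, which a compiler may turn
 * into SIMD instructions; this code does not force that. */
void run_packed(const float *a, const float *b, float *out, size_t n)
{
    size_t gid = 0;
    for (; gid + LANES <= n; gid += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane)
            out[gid + lane] = work_item(a, b, gid + lane);
    }
    /* Work items that do not fill a whole group run as scalar code. */
    for (; gid < n; ++gid)
        out[gid] = work_item(a, b, gid);
}
```

Both functions produce the same results for any n; only the grouping of work items differs. The remainder loop in run_packed is a design choice for this sketch that handles global sizes that are not a multiple of LANES.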