Vectorization Basics for Intel® Architecture Processors

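Intel® Architecture Processors provide performance acceleration using Single Instruction Multiple Data (SIMD) instruction sets, which include:

Intel® Streaming SIMD Extensions (Intel® SSE)
Intel® Advanced Vector Extensions (Intel® AVX)
Intel® Advanced Vector Extensions 2 (Intel® AVX2)
Intel® Advanced Vector Extensions 512 (Intel® AVX-512)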

By processing multiple data elements in a single instruction, these ISA extensions enable data parallelism.

When using SIMD instructions, vector registers can store a group of data elements of the same data type, such as float or char. The number of data elements that fit in one register depends on the microarchitecture and on the width of the data type. For example, if the CPU supports 512-bit vector registers, each vector (ZMM) register can store sixteen float values, sixteen 32-bit integer values, and so on.
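
The relationship between register width and element count is a simple division, which the following minimal C sketch makes explicit. The helper name lanes_per_register is hypothetical and is not part of any Intel or OpenCL API; it only illustrates the arithmetic described above.

```c
#include <assert.h>

/* Hypothetical helper (not an SDK function): returns how many elements
   of element_bits each fit in one vector register of register_bits. */
static unsigned lanes_per_register(unsigned register_bits, unsigned element_bits)
{
    /* The element width must be non-zero and divide the register width. */
    assert(element_bits != 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}
```

With a 512-bit register, lanes_per_register(512, 32) gives 16 for float or 32-bit integer elements, matching the example above, and lanes_per_register(512, 8) gives 64 for char elements. For a 256-bit register, lanes_per_register(256, 32) gives 8.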

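When using the Single Program Multiple Data (SPMD) technique, the Intel® OpenCL™ implementation can map the work items to the hardware according to one of the following:

Scalar code, in which work items execute one by one.
SIMD elements, in which several work items are packed into one vector register and execute simultaneously.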

The Intel® SDK for OpenCL™ Applications contains an implicit vectorization module, which implements the second method. Depending on the kernel code, vectorization might be subject to some limitations. If the vectorization module is disabled, the Intel SDK for OpenCL Applications uses the first method.
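
The difference between executing work items one by one and packing several of them into the lanes of one vector register can be sketched in plain C. This is a conceptual illustration only, not the code that the vectorization module generates. The names kernel_body, run_scalar, run_packed, and SIMD_WIDTH are hypothetical, and the width of 8 assumes eight 32-bit lanes, as in a 256-bit register.

```c
#include <assert.h>
#include <stddef.h>

#define SIMD_WIDTH 8  /* assumption: eight 32-bit lanes per vector register */

/* Hypothetical kernel body executed by a single work item. */
static float kernel_body(float a, float b)
{
    return a * b + 1.0f;
}

/* Scalar mapping: each work item runs on its own, one after another. */
static void run_scalar(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    for (size_t gid = 0; gid < n; ++gid)
        c[gid] = kernel_body(a[gid], b[gid]);
}

/* Packed mapping: SIMD_WIDTH consecutive work items are processed as one
   group. The inner loop stands in for the lanes of one vector instruction. */
static void run_packed(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    size_t gid = 0;
    for (; gid + SIMD_WIDTH <= n; gid += SIMD_WIDTH) {
        for (size_t lane = 0; lane < SIMD_WIDTH; ++lane)
            c[gid + lane] = kernel_body(a[gid + lane], b[gid + lane]);
    }
    /* Work items that do not fill a whole group run in scalar form. */
    for (; gid < n; ++gid)
        c[gid] = kernel_body(a[gid], b[gid]);
}
```

Both functions produce identical results for any n. The packed form only changes how work items are grouped, which is why this kind of transformation can be applied implicitly when the kernel code allows it.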

See Also

Vectorization: SIMD Processing Within a Work Group
Benefitting From Implicit Vectorization