Using Vector Data Types

To make full use of the CPU vector units, consider using vector data types in your kernel code as a more involved performance alternative to the automatic (compiler-aided) vectorization described in the Benefitting from Implicit Vectorization section. This technique maps vector data types directly to the hardware vector registers, so the data types you use should match the width of the underlying SIMD instructions.
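Before committing to a particular vector width, you can query the device for its preferred and native vector widths and size your kernel's vector types to match. The following host-side sketch is illustrative only and is not part of the original example; it assumes a valid cl_device_id named device and omits error handling:

#include <CL/cl.h>
#include <stdio.h>

/* Query the float vector widths reported by the device. A native width of 8
   (8 x 32-bit float = 256 bits) suggests that float8 is a reasonable choice
   for explicitly vectorized kernels on that device. */
void print_float_vector_widths(cl_device_id device)
{
    cl_uint preferred = 0, native = 0;

    clGetDeviceInfo(device, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
                    sizeof(preferred), &preferred, NULL);
    clGetDeviceInfo(device, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,
                    sizeof(native), &native, NULL);

    printf("Preferred float vector width: %u\n", preferred);
    printf("Native float vector width:    %u\n", native);
}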

Consider the following recommendations:

NOTE: The int8 data type improves performance only starting with the 4th Generation Intel® Core™ processors.

When you use vector data types, you plan the vector-level parallelism yourself instead of relying on the implicit vectorization module. See the Benefitting from Implicit Vectorization section for more information.
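To illustrate the difference, the sketch below shows the same SAXPY-style operation written twice: a scalar version that leaves vectorization to the implicit vectorization module, and an explicitly vectorized version that processes eight floats per work-item. The kernel names and the operation are illustrative and are not taken from the original example:

// Scalar version: each work-item handles one float; the implicit
// vectorization module is expected to combine neighboring work-items.
__kernel void saxpy_scalar(float alpha,
                           __global const float *x,
                           __global float *y)
{
  int id = get_global_id(0);
  y[id] = alpha * x[id] + y[id];
}

// Explicitly vectorized version: each work-item handles eight floats,
// so the NDRange must be eight times smaller than in the scalar case.
__kernel __attribute__((vec_type_hint(float8)))
void saxpy_vector(float alpha,
                  __global const float8 *x,
                  __global float8 *y)
{
  int id = get_global_id(0);
  y[id] = alpha * x[id] + y[id];
}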

This approach is useful in the following scenarios:

The following example demonstrates a multiplication kernel that targets the 256-bit vector units of the 2nd Generation Intel® Core™ processors and later:

__kernel __attribute__((vec_type_hint(float8)))
void edp_mul(__global const float8 *a,
             __global const float8 *b,
             __global float8 *result)
{
  int id = get_global_id(0);
  result[id] = a[id] * b[id];
}

In this example, the data passed to the kernel consists of buffers of float8 elements, so the calculations are performed on eight floats at a time.
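On the host side, the global work size must account for the fact that each work-item consumes one float8. The following sketch assumes a valid cl_kernel named kernel built from the example above, a command queue named queue, and cl_mem buffers a_buf, b_buf, and result_buf that each hold N floats, with N a multiple of 8; these names are illustrative, and error handling is omitted:

/* Each work-item processes eight floats, so enqueue N / 8 work-items
   rather than N. */
size_t global_size = N / 8;

clSetKernelArg(kernel, 0, sizeof(cl_mem), &a_buf);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &b_buf);
clSetKernelArg(kernel, 2, sizeof(cl_mem), &result_buf);

clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                       &global_size, NULL, 0, NULL, NULL);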

The attribute added before the kernel signals the compiler (or the implementation) that this kernel has an optimized vectorized form, so the implicit vectorization module does not operate on it. Use vec_type_hint to tell the compiler that your kernel already processes data mostly through vector types. For more details on this attribute, see section 6.7.2 of the OpenCL™ 1.2 specification at https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf.

See Also

Writing Kernels to Directly Target the Intel Architecture Vector Processors
OpenCL 1.2 Specification at https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf