Several environment variables and one attribute extension control the vectorizer.
CL_CONFIG_USE_VECTORIZER can be set to False or True to disable or enable the vectorizer, respectively. Like any other environment variable, it affects the behavior of the vectorizer for the entire system (or for the shell instances in which it is set) until the variable is explicitly unset (or the shells terminate).
CL_CONFIG_CPU_VECTORIZER_MODE affects code generation for the CPU OpenCL device only. It effectively sets the vectorization “width” (when CL_CONFIG_USE_VECTORIZER = True):
CL_CONFIG_CPU_VECTORIZER_MODE = 0 (default). The compiler makes heuristic decisions whether to vectorize each kernel, and if so, which vector width to use.
CL_CONFIG_CPU_VECTORIZER_MODE = 1. No vectorization by the compiler. Explicit vector data types in kernels are left intact. This mode is the same as CL_CONFIG_USE_VECTORIZER = False.
CL_CONFIG_CPU_VECTORIZER_MODE = 4. Disables the heuristic and vectorizes to the width of 4.
CL_CONFIG_CPU_VECTORIZER_MODE = 8. Disables the heuristic and vectorizes to the width of 8.
CL_CONFIG_CPU_VECTORIZER_MODE = 16. Disables the heuristic and vectorizes to the width of 16.
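As an illustration of the accepted values, the following C sketch validates a mode and sets the variable for the current process. The helper names `is_valid_vectorizer_mode` and `set_vectorizer_mode` are hypothetical. Whether the OpenCL runtime honors a value set from inside the process, rather than from the shell, is an assumption; if it does, the call would presumably need to happen before the first OpenCL API call.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if mode is one of the documented values (0, 1, 4, 8, 16). */
static int is_valid_vectorizer_mode(int mode)
{
    return mode == 0 || mode == 1 || mode == 4 || mode == 8 || mode == 16;
}

/* Hypothetical helper: sets CL_CONFIG_CPU_VECTORIZER_MODE for this process.
 * Returns 0 on success, -1 if the mode is not a documented value or if
 * setenv() fails. */
static int set_vectorizer_mode(int mode)
{
    char value[8];

    if (!is_valid_vectorizer_mode(mode))
        return -1;
    snprintf(value, sizeof value, "%d", mode);
    return setenv("CL_CONFIG_CPU_VECTORIZER_MODE", value, 1);
}
```

Calling `set_vectorizer_mode(8)` would request a fixed width of 8, while a value such as 2 is rejected because it is not in the list above.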
The goal of the cl_intel_vec_len_hint extension is to let programmers specify the vector length to which a kernel should be vectorized. The compiler may use this information to generate better-optimized code.
For a device that supports the extension, the function clGetDeviceInfo with the parameter CL_DEVICE_EXTENSIONS returns a space-separated list of extension names that contains cl_intel_vec_len_hint.
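A minimal sketch of the check on the host side, assuming the extension string has already been retrieved with clGetDeviceInfo and is NUL-terminated. The helper name `has_extension` is hypothetical. It matches whole tokens only, since a plain substring search could also match a longer extension name that merely begins with the same text.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole token in the space-separated
 * extension list `ext_list`, and 0 otherwise. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = ext_list;

    if (name_len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        /* A real match starts at the beginning of the list or after a space,
         * and ends at a space or at the end of the list. */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += name_len;
    }
    return 0;
}
```

With this helper, `has_extension(extensions, "cl_intel_vec_len_hint")` can guard any code path that relies on the attribute.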
Use the optional OpenCL C attribute qualifier __attribute__((intel_vec_len_hint(<int>))), where <int> is the vector length that the kernel should be vectorized to.
<int> can take one of the following values:
0 - the compiler makes heuristic decisions whether to vectorize each kernel, and if so, which vector length to use.
1 - no vectorization by the compiler. Explicit vector data types in kernels are left intact.
4 - disables the heuristic and vectorizes to the length of 4.
8 - disables the heuristic and vectorizes to the length of 8.
16 - disables the heuristic and vectorizes to the length of 16.
Note
If the work-group size is not evenly divisible by the specified vector length hint, the loop remainder might not be executed in vectorized iterations.
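The arithmetic behind this note can be sketched in C as follows. The type `vec_split` and the function `split_work_group` are hypothetical names, and the code only illustrates the division; it does not describe how the compiler actually schedules the leftover work-items.

```c
#include <assert.h>
#include <stddef.h>

/* Result of dividing a work-group into vector-length chunks. */
struct vec_split {
    size_t full_vectors; /* iterations that each cover vec_len work-items */
    size_t remainder;    /* leftover work-items that do not fill a vector */
};

/* Splits wg_size work-items into full vectors of vec_len plus a remainder. */
static struct vec_split split_work_group(size_t wg_size, size_t vec_len)
{
    struct vec_split s;

    assert(vec_len > 0);
    s.full_vectors = wg_size / vec_len;
    s.remainder = wg_size % vec_len;
    return s;
}
```

For example, a work-group size of 100 with a hint of 8 gives 12 full vectors and a remainder of 4 work-items, whereas a work-group size of 64 divides evenly and leaves no remainder.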
Note
If you specify the intel_vec_len_hint and vec_type_hint attributes simultaneously, the compiler ignores the vec_type_hint attribute.
The examples below illustrate valid and invalid uses of the extension:
__attribute__((intel_vec_len_hint(8))) __kernel void kr1(…) { … }
Combined use of intel_vec_len_hint and vec_type_hint. In the example, the compiler ignores vec_type_hint and vectorizes to the length of 4.
__attribute__((intel_vec_len_hint(4))) __attribute__((vec_type_hint (float8))) __kernel void kr2(…) { … }