Many image-processing kernels operate on uchar input. To avoid overflows, those eight-bit input values are typically converted to and processed as 16- or 32-bit integer values. Use signed data types (short and int) in both cases to enable the compiler to use a larger set of SIMD instructions.
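
As a minimal sketch of the widening advice above, the following hypothetical kernel computes the absolute difference of two uchar images through a signed 16-bit intermediate. The kernel name abs_diff, its parameters, and the host-side shim at the top are assumptions made for illustration; the shim only exists so that the same source can be compiled and checked as plain C, and an OpenCL compiler skips it.

```c
/* Host-side shim (illustrative assumption): lets the kernel below compile
   as plain C for testing. An OpenCL compiler defines __OPENCL_VERSION__
   and ignores this part. */
#ifndef __OPENCL_VERSION__
typedef unsigned char uchar;
#define __kernel
#define __global
static int g_id; /* stand-in for the current work-item id */
static int get_global_id(int dim) { (void)dim; return g_id; }
#endif

/* Hypothetical kernel: per-pixel absolute difference of two 8-bit images. */
__kernel void abs_diff(__global const uchar* a,
                       __global const uchar* b,
                       __global uchar* out)
{
    int i = get_global_id(0);            /* int index instead of size_t */
    short d = (short)a[i] - (short)b[i]; /* signed 16-bit, range -255..255 */
    out[i] = (uchar)(d < 0 ? -d : d);    /* result fits back into 8 bits */
}
```

Subtracting the uchar values directly in an unsigned 8-bit or 16-bit type would wrap around whenever b[i] is larger than a[i]; the signed short intermediate keeps the negative result representable.
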
Using size_t for indices, which is also an unsigned type, makes the vectorization of indexing arithmetic less efficient. To improve performance, use the int data type for work-item parameters and loop counters when your index fits the 32-bit integer range. Consider the following example:
__kernel void foo(__constant int* data, const uint workPerItem)
{
    int tid = get_global_id(0);
    int gridSize = get_global_size(0);
    for (int i = myStart; i < myStart + workPerItem; ++i)
    …
Also, when the compiler generates gather or scatter instructions for non-consecutive memory accesses, it must be able to safely cast the indices to int32, because gather and scatter instructions use int32 indices. Explicitly casting the indices to int32 in a kernel simplifies the compiler's job.
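
The explicit cast described above might look like the following sketch. The kernel name strided_read, its parameters, and the host-side shim are hypothetical and only serve the illustration; the code also assumes that every computed index fits the signed 32-bit range.

```c
/* Host-side shim (illustrative assumption): lets the kernel below compile
   as plain C for testing. An OpenCL compiler defines __OPENCL_VERSION__
   and ignores this part. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
static int g_id; /* stand-in for the current work-item id */
static int get_global_id(int dim) { (void)dim; return g_id; }
#endif

/* Hypothetical kernel: each work-item reads one element at a strided,
   non-consecutive address, which is a candidate for a gather instruction. */
__kernel void strided_read(__global const float* src,
                           __global float* dst,
                           const int stride)
{
    int i = (int)get_global_id(0); /* explicit cast to a signed 32-bit index */
    int j = i * stride;            /* index arithmetic stays in int */
    dst[i] = src[j];
}
```

Because both i and j are already int, the compiler does not have to prove that a size_t index can be narrowed safely before it emits the gather.
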