Non-uniform control flow can introduce significant performance overhead when the branch condition varies along the dimension of vectorization.
The compiler chooses this dimension using a heuristic model.
A branch is uniform if it is statically guaranteed that all work-items within an NDRange execute the same side of the branch.
Uniform branch example:
```c
// isSimple is a kernel argument (and thus constant for all work-items of the NDRange)
int LID = get_local_id(0);
if (isSimple == 0) {
    Res = buff[LID]*2 + 4;
} else {
    Res = buff[LID]/2 - 4;
}
```
Uniform branches of this kind can often be replaced with compile-time branches.
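A minimal sketch of the idea in plain C, assuming the flag is supplied as a preprocessor macro instead of a kernel argument. The macro name `IS_SIMPLE` and the helper `compute` are hypothetical; in OpenCL such a macro could be passed through the build options given to `clBuildProgram` (for example `-D IS_SIMPLE=1`). The preprocessor discards the unused side, so no branch remains in the generated code.

```c
#include <assert.h>

/* Hypothetical compile-time flag; defaults to 0 when not defined
   on the command line (e.g. cc -DIS_SIMPLE=1). */
#ifndef IS_SIMPLE
#define IS_SIMPLE 0
#endif

/* One work-item's computation; lid stands in for get_local_id(0). */
static int compute(const int *buff, int lid)
{
    assert(buff != 0 && lid >= 0);
#if IS_SIMPLE == 0
    return buff[lid] * 2 + 4;   /* only this side is compiled in */
#else
    return buff[lid] / 2 - 4;   /* compiled in only when IS_SIMPLE != 0 */
#endif
}
```

With the default `IS_SIMPLE` of 0 and `buff = {1, 2, 3, 4}`, `compute(buff, lid)` yields 6, 8, 10 and 12 for `lid` 0 to 3.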
Non-uniform branch example:
```c
int LID = get_local_id(0);
if (LID == 0) {
    Res = buff[LID]*2 + 4;
} else {
    Res = buff[LID]/2 - 4;
}
```
When this example is vectorized along dimension zero, the compiler has to generate code that executes both sides of the branch and then merges the results using a mask.
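The following plain-C emulation sketches what this masking amounts to; it is not the code any particular compiler emits. It assumes a hypothetical vector width `VW` of 4 (the real width is chosen by the compiler) and packs consecutive work-items along dimension zero into the lanes of one vector. Both sides of the branch are computed for every lane, and a per-lane all-ones/all-zeros mask selects the result without branching.

```c
#include <assert.h>

#define VW 4  /* hypothetical vector width; the real width is chosen by the compiler */

/* Scalar reference: what each work-item computes on its own. */
static int scalar_branch(const int *buff, int lid)
{
    if (lid == 0)
        return buff[lid] * 2 + 4;
    return buff[lid] / 2 - 4;
}

/* Emulated vectorized form: lane l handles work-item base + l along
   dimension zero. Both sides are evaluated for all lanes, then blended. */
static void vectorized_branch(const int *buff, int base, int res[VW])
{
    int then_v[VW], else_v[VW], mask[VW];
    assert(buff != 0 && base >= 0);
    for (int l = 0; l < VW; ++l) {
        int lid = base + l;
        mask[l]   = -(lid == 0);          /* -1 (all ones) where the condition holds, else 0 */
        then_v[l] = buff[lid] * 2 + 4;    /* "if" side, computed for every lane */
        else_v[l] = buff[lid] / 2 - 4;    /* "else" side, computed for every lane */
    }
    /* Branch-free merge: take then_v where mask is set, else_v otherwise. */
    for (int l = 0; l < VW; ++l)
        res[l] = (then_v[l] & mask[l]) | (else_v[l] & ~mask[l]);
}
```

With `buff = {10, 20, 30, 40, ...}` and `base = 0`, the merged lanes are 24, 6, 11 and 16, matching `scalar_branch` for each work-item; the cost is that both arithmetic paths run for all four lanes.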
In a two-dimensional NDRange, control flow based on expressions involving the ID in the second dimension (dimension one) is uniform across the work-items packed into one vector along dimension zero, so it does not incur this overhead.
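A small plain-C sketch of why this holds, under the same assumption as above: a hypothetical width `VW` of 4, with one vector holding work-items `(x_base + l, y)` for consecutive `x` along dimension zero and a fixed `y`. The helper names are hypothetical, and this runtime check only illustrates the property; the compiler establishes it statically.

```c
#include <assert.h>
#include <stdbool.h>

#define VW 4  /* hypothetical vector width along dimension zero */

/* Branch conditions written in terms of the IDs in dimension zero (x)
   and dimension one (y). */
static bool cond_dim0(int x, int y) { (void)y; return x == 0; }
static bool cond_dim1(int x, int y) { (void)x; return y == 0; }

/* Returns true if all lanes of the vector (x_base + l, y), l = 0..VW-1,
   evaluate the condition identically, i.e. no masking would be needed. */
static bool uniform_in_vector(bool (*cond)(int, int), int x_base, int y)
{
    assert(x_base >= 0 && y >= 0);
    bool first = cond(x_base, y);
    for (int l = 1; l < VW; ++l)
        if (cond(x_base + l, y) != first)
            return false;
    return true;
}
```

`uniform_in_vector(cond_dim0, 0, 0)` is false because only lane 0 satisfies `x == 0`, whereas `uniform_in_vector(cond_dim1, x_base, y)` is true for any vector: every lane shares the same `y`.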