Note on the Non-Uniform Control Flow

Non-uniform control flow can introduce significant performance overhead when the branch diverges along the dimension of vectorization.

The compiler chooses the vectorization dimension using a heuristic model.

A branch is uniform if it is statically guaranteed that all work-items within an NDRange execute the same side of the branch.

Uniform branch example:

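__kernel void uniform_branch(__global const int *buff, __global int *out, int isSimple)
{
        // isSimple is a kernel argument (and thus constant for all work-items of the NDRange)
        int LID = get_local_id(0);
        int Res;
        if (isSimple == 0)
        {
                Res = buff[LID]*2 + 4;
        }
        else
        {
                Res = buff[LID]/2 - 4;
        }
        out[get_global_id(0)] = Res; // out is an illustrative output buffer
}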

Uniform branches of this kind can often be replaced with compile-time branches.
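A minimal sketch of a compile-time branch, written in plain C so it can be built and run on the host. It assumes a hypothetical macro IS_SIMPLE that takes the place of the isSimple kernel argument; in OpenCL the macro would typically be supplied at build time through the -D option in the clBuildProgram options string. The preprocessor then removes the unused side before the compiler vectorizes anything. The helper name compute_res is also hypothetical.

```c
/* Hypothetical macro standing in for the isSimple kernel argument.
   Override it at build time, e.g. -DIS_SIMPLE=1 */
#ifndef IS_SIMPLE
#define IS_SIMPLE 0
#endif

/* Per-work-item computation. The branch is resolved by the preprocessor,
   so the compiled code contains only one side of it. */
static int compute_res(const int *buff, int lid)
{
#if IS_SIMPLE == 0
    return buff[lid] * 2 + 4;
#else
    return buff[lid] / 2 - 4;
#endif
}
```

With the default IS_SIMPLE of 0 and buff = {1, 2, 3, 4}, compute_res(buff, 2) returns 10, and no instructions for the other side are generated at all.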

Non-uniform branch example:

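__kernel void non_uniform_branch(__global const int *buff, __global int *out)
{
        int LID = get_local_id(0);
        int Res;
        // the condition depends on the work-item ID, so work-items take different sides
        if (LID == 0)
        {
                Res = buff[LID]*2 + 4;
        }
        else
        {
                Res = buff[LID]/2 - 4;
        }
        out[get_global_id(0)] = Res; // out is an illustrative output buffer
}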

If the compiler vectorizes this example along dimension zero, work-items packed into the same vector take different sides of the branch. The compiler therefore has to generate code that executes both sides of the branch for every work-item and then merges the results using a mask.
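The masking mechanism can be modeled in plain C. The sketch below assumes a hypothetical vector width of 8 (VEC_WIDTH) and emulates, lane by lane, what the vectorized code does for the non-uniform example: both sides of the branch are computed for every lane, a mask records which lanes satisfy LID == 0, and a select merges the two results. Real compilers use SIMD compare and blend instructions rather than scalar loops; this only illustrates why both sides cost execution time.

```c
#define VEC_WIDTH 8 /* hypothetical number of work-items packed into one vector */

/* Emulates vectorized execution of
       if (LID == 0) Res = buff[LID]*2 + 4; else Res = buff[LID]/2 - 4;
   for the work-items with local IDs base .. base + VEC_WIDTH - 1. */
static void masked_branch(const int *buff, int base, int res[VEC_WIDTH])
{
    int mask[VEC_WIDTH], then_val[VEC_WIDTH], else_val[VEC_WIDTH];

    for (int lane = 0; lane < VEC_WIDTH; ++lane) {
        int lid = base + lane;
        mask[lane] = (lid == 0);            /* per-lane branch condition */
        then_val[lane] = buff[lid] * 2 + 4; /* "if" side, computed for every lane */
        else_val[lane] = buff[lid] / 2 - 4; /* "else" side, computed for every lane */
    }
    /* Merge: each lane keeps the result of the side it would have taken. */
    for (int lane = 0; lane < VEC_WIDTH; ++lane)
        res[lane] = mask[lane] ? then_val[lane] : else_val[lane];
}
```

Only lane 0 of the first vector uses the "if" result, yet all eight lanes pay for computing it.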

Control flow based on expressions involving the ID in the second dimension (for example, get_local_id(1) in a two-dimensional NDRange) is uniform with respect to dimension zero, because all work-items in one vector share that ID, and is therefore of no concern.
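This can be checked with a small C model. It assumes a hypothetical 2D work-group of LOCAL_X by LOCAL_Y work-items, vectorized along dimension zero with LOCAL_X a multiple of VEC_WIDTH, so each vector holds work-items that share the same dimension-one ID. The conditions y == 0 and x == 0 are illustrative choices. A condition on the dimension-one ID yields an all-true or all-false mask in every vector, while a condition on the dimension-zero ID leaves some vectors mixed.

```c
#include <stdbool.h>

#define VEC_WIDTH 8 /* hypothetical vector width */
#define LOCAL_X 16  /* hypothetical local size in dimension 0 (multiple of VEC_WIDTH) */
#define LOCAL_Y 4   /* hypothetical local size in dimension 1 */

/* Returns true if, for the vector starting at (x0, y), the condition
   (cond_on_dim1 ? y == 0 : x == 0) evaluates the same way in every lane. */
static bool vector_is_uniform(int x0, int y, bool cond_on_dim1)
{
    bool first = false;
    for (int lane = 0; lane < VEC_WIDTH; ++lane) {
        int x = x0 + lane; /* lanes differ only in dimension 0 */
        bool c = cond_on_dim1 ? (y == 0) : (x == 0);
        if (lane == 0)
            first = c;
        else if (c != first)
            return false;
    }
    return true;
}

/* Counts vectors in the work-group whose mask is mixed (and thus need masking). */
static int count_divergent_vectors(bool cond_on_dim1)
{
    int divergent = 0;
    for (int y = 0; y < LOCAL_Y; ++y)
        for (int x0 = 0; x0 < LOCAL_X; x0 += VEC_WIDTH)
            if (!vector_is_uniform(x0, y, cond_on_dim1))
                ++divergent;
    return divergent;
}
```

With these sizes, the dimension-one condition produces no divergent vectors, whereas x == 0 makes the first vector of each of the four rows divergent.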