Because the OpenCL™ runtime executes an enqueued kernel many times (once per work-item), even small optimizations to the kernel can yield noticeable performance gains. If you would move a piece of code out of the innermost loop in typical native code, move it out of the kernel as well. Examples of code types that you might move out of the loop: