Use Floating Point for Calculations

Intel® Xeon® processors significantly accelerate floating-point calculations on the device.

Consider the following code snippet, which performs the calculations in integer arithmetic:

__kernel void scale (__constant uchar* srcA, __constant uchar* srcB, uchar nSaturation, __global uchar* dst)
{
        int offset = get_global_id(0);
        uint tempSrcA = convert_uint(srcA[offset]);//Load one 8-bit value
        uint tempSrcB = convert_uint(srcB[offset]);//Load one 8-bit value
        //some processing
        uint tempDst = (tempSrcA - tempSrcB) * nSaturation;
        //store
        dst[offset] = convert_uchar(tempDst);
}

The following example uses the float equivalent:

__kernel void scale (__constant uchar* srcA, __constant uchar* srcB, uchar nSaturation, __global uchar* dst)
{
        int offset = get_global_id(0);
        float tempSrcA = convert_float(srcA[offset]);//Load one 8-bit value
        float tempSrcB = convert_float(srcB[offset]);//Load one 8-bit value
        //some processing
        float tempDst = (tempSrcA - tempSrcB) * nSaturation;
        //store
        dst[offset] = convert_uchar(tempDst);
}

Using built-in functions improves performance. See the Use Built-In Functions section for more information.

NOTE: The compiler can automatically fuse multiplies and adds. Use the -cl-mad-enable compiler flag to enable this optimization when compiling. Even so, calling the "mad" built-in explicitly ensures that the multiply-add maps directly to the efficient instruction.
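
As a minimal sketch of the explicit "mad" built-in, the kernel below computes dst = srcA * nSaturation + srcB in float. The kernel name scale_mad and this particular arithmetic are hypothetical and chosen only to give the built-in a multiply followed by an add; they are not part of the example above. The saturating conversion convert_uchar_sat is also an addition here, used so that out-of-range results clamp to 0..255.

```c
// Hypothetical variant of the scale kernel (illustration only):
// dst = srcA * nSaturation + srcB, computed with the mad built-in.
__kernel void scale_mad (__constant uchar* srcA, __constant uchar* srcB, uchar nSaturation, __global uchar* dst)
{
        int offset = get_global_id(0);
        float a = convert_float(srcA[offset]);   // load one 8-bit value
        float b = convert_float(srcB[offset]);   // load one 8-bit value
        float s = convert_float(nSaturation);
        // mad(x, y, z) approximates x * y + z; it favors speed over accuracy
        float tempDst = mad(a, s, b);
        // store, clamping to the uchar range instead of wrapping
        dst[offset] = convert_uchar_sat(tempDst);
}
```

Because "mad" trades accuracy for speed, a kernel that needs a correctly rounded result would likely use the fma built-in instead.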