When comparing the performance of an OpenCL™ kernel on a CPU device with the performance of native code, make sure that both versions of the code are as similar as possible. Consider the following guidelines:
clCreateProgramWithBinary call.

rsqrt(x) in OpenCL™ is inherently more accurate than the _mm_rsqrt_ps SSE intrinsic. To use the same accuracy in native code and OpenCL™ code, do one of the following:
- Use _mm_rsqrt_ps in your native code with a couple of additional Newton-Raphson iterations to match the precision of OpenCL™ rsqrt.
- Use native_rsqrt in your OpenCL™ kernel, which maps exactly to the rsqrtps instruction in the final assembly code.

As with rsqrt, you can use the relaxed versions of rcp, sqrt, and so on. Refer to the Developer Guide for Intel® SDK for OpenCL™ Applications for the full list of supported functions.
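
The native-code option can be sketched as follows. This is a minimal illustration, assuming an x86 compiler with SSE support; the helper names rsqrt_nr_step, rsqrt_refined, and rsqrt_max_rel_error are hypothetical and not part of any SDK. It refines the hardware estimate with Newton-Raphson steps and measures the error against a reference computed with the SSE sqrt and divide instructions, so you can check how many iterations your comparison needs.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_sqrt_ps, ... */

/* One Newton-Raphson step for y ~ 1/sqrt(x):
   y_next = y * (1.5 - 0.5 * x * y * y). */
static __m128 rsqrt_nr_step(__m128 x, __m128 y)
{
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 xyy = _mm_mul_ps(x, _mm_mul_ps(y, y));
    return _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
}

/* Hypothetical helper: hardware estimate refined by `iterations` steps. */
static __m128 rsqrt_refined(__m128 x, int iterations)
{
    __m128 y = _mm_rsqrt_ps(x);  /* low-precision rsqrtps estimate */
    for (int i = 0; i < iterations; ++i)
        y = rsqrt_nr_step(x, y);
    return y;
}

/* Hypothetical helper: largest relative error of rsqrt_refined over n
   positive inputs, measured against 1 / sqrt(x) computed with SSE
   sqrt and divide. n must be a multiple of 4. */
static float rsqrt_max_rel_error(const float *in, size_t n, int iterations)
{
    assert(n % 4 == 0);
    float worst = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        __m128 x = _mm_loadu_ps(in + i);
        float approx[4], ref[4];
        _mm_storeu_ps(approx, rsqrt_refined(x, iterations));
        _mm_storeu_ps(ref, _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(x)));
        for (int k = 0; k < 4; ++k) {
            float err = (approx[k] - ref[k]) / ref[k];
            if (err < 0.0f) err = -err;
            if (err > worst) worst = err;
        }
    }
    return worst;
}
```

Calling rsqrt_max_rel_error with 0 iterations shows the error of the raw estimate, and calling it with 1 or 2 shows how quickly the refinement converges. Whether the result matches the OpenCL™ rsqrt on a given device is best confirmed by measurement rather than assumed.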