OpenCL™ offers two basic ways to trade precision for speed: the native_* and half_* math built-ins, which have lower precision but are faster than their un-prefixed variants, and the -cl-fast-relaxed-math build flag.
In general, while the -cl-fast-relaxed-math flag is a quick way to get performance gains for kernels with many math operations, it applies to the whole program and does not permit fine control over numeric accuracy. Consider experimenting with the native_* equivalents separately for each specific case, keeping track of the resulting accuracy.
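As a rough illustration of how the flag is applied, the sketch below selects the options string that host code would pass to clBuildProgram. The helper name build_options is hypothetical and not part of any OpenCL API; building the same program once with and once without the flag is one way to compare results.

```c
/* Hypothetical helper (not an OpenCL API): returns the build-options
 * string for a program. The flag affects every kernel in the program,
 * which is why it offers no per-call accuracy control. */
const char *build_options(int fast_relaxed_math)
{
    return fast_relaxed_math ? "-cl-fast-relaxed-math" : "";
}

/* Intended use in host code, assuming a valid cl_program named program:
 *
 *     clBuildProgram(program, 0, NULL, build_options(1), NULL, NULL);
 */
```

Keeping the flag behind a single switch makes it easy to rebuild without it when the accuracy loss turns out to be unacceptable.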
The native_* versions of math built-ins are supported in hardware and run substantially faster, while offering lower accuracy. Use native trigonometric and transcendental functions, such as sin, cos, exp, and log, when performance is more important than precision.
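One possible way to track the accuracy cost of a native_* function is to compute both variants in a test kernel and compare the buffers on the host. The following is a minimal sketch under that assumption; the kernel name compare_sin, its arguments, and the helper max_abs_error are illustrative names, not part of the SDK.

```c
#include <stddef.h>

/* OpenCL C kernel source, kept as a string the way host code would hand it
 * to clCreateProgramWithSource. It evaluates the full-precision and the
 * native variant of sin on the same input. All names are illustrative. */
const char *const compare_sin_src =
    "__kernel void compare_sin(__global const float *x,\n"
    "                          __global float *precise,\n"
    "                          __global float *fast)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    precise[i] = sin(x[i]);\n"
    "    fast[i]    = native_sin(x[i]);\n"
    "}\n";

/* Host-side check: largest absolute difference between the two result
 * buffers after they have been read back from the device. */
float max_abs_error(const float *ref, const float *test, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float err = ref[i] - test[i];
        if (err < 0.0f)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Running the comparison over the input range the application actually uses gives a more useful error figure than a generic range, since the accuracy of native functions is implementation-defined.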
For a full list of OpenCL build options and option descriptions, refer to the OpenCL specification. For instructions on how to use these options with the Intel® SDK for OpenCL™ Applications, refer to the following pages in the Developer Guide for Intel® SDK for OpenCL™ Applications: Build with OpenCL Offline Compiler Command Line Interface (for the standalone version of the Intel® SDK for OpenCL™ Applications); Configuring OpenCL™ Build Options (for the Intel® Code Builder for OpenCL™ API plugin for Microsoft Visual Studio*); Configuring Build Options (for the Intel® Code Builder for OpenCL™ API plugin for Eclipse*).