The Intel Math Functions (IMF) Device Library is a set of standard math functions implemented for execution on SYCL devices (GPUs, CPUs, and accelerators). Most IMF functions comply with the ISO C99, SYCL, OpenCL™, and IEEE 754 standards in terms of computed outputs and the processing of IEEE 754 special values.
The function interfaces are available through the header file:
#include <sycl/ext/intel/math.hpp>
The IMF device library functions provide the following accuracy levels:
Every function provides a default accuracy implementation. A subset of functions also provides ha, la, and ep accuracy flavors, which allow a different accuracy-performance balance in specific use cases.
The extended accuracy versions are available through the nested namespaces ha, la, and ep. For example:
float sycl::ext::intel::math::acos ( float x ); // default accuracy
float sycl::ext::intel::math::ha::acos ( float x ); // ha (High Accuracy)
float sycl::ext::intel::math::la::acos ( float x ); // la (Low Accuracy)
float sycl::ext::intel::math::ep::acos ( float x ); // ep (Enhanced Performance)
The following table shows the accuracy of each math function. Accuracy is measured in ULPs (units in the last place) on uniformly distributed random input values along commonly used function-specific work intervals with the addition of:
PRECISION | fp64 (double) | fp32 (float) | fp16 (sycl::half) | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
ACCURACY | default | ha | la | ep | default | ha | la | ep | default | ha | la | ep |
acos | 0.79 | 0.79 | 2.27 | 4.0E+07 | 3.0 | 0.78 | 3.0 | 525.0 | ||||
asin | 0.72 | 0.72 | 2.61 | 4.1E+07 | 3.73 | 0.69 | 3.73 | 535.0 | ||||
atan | 0.65 | 0.65 | 2.14 | 2.2E+07 | 0.87 | 0.87 | 3.05 | 2.2E+03 | ||||
atan2 | 0.76 | 0.76 | 2.31 | 2.2E+07 | 2.65 | 0.87 | 2.65 | 436 | ||||
acosh | 1.37 | 0.89 | 1.37 | 1.39 | 0.86 | 1.39 | 1.6E+03 | |||||
asinh | 1.6 | 0.62 | 1.6 | 1.58 | 0.68 | 1.58 | 1.6E+03 | |||||
atanh | 2.12 | 0.65 | 2.12 | 1.85 | 0.56 | 1.85 | 1.5E+03 | |||||
ceil | 0.0 | 0.0 | ||||||||||
cbrt | 0.73 | 0.79 | ||||||||||
copysign | 0.0 | 0.0 | ||||||||||
cdfnorm*** | 1.0 | 1.12 | ||||||||||
cdfnorminv*** | 2.0 | 3.46 | ||||||||||
cos | 0.85 | 0.85 | 3.23 | 6.1E+07 | 1.79 | 0.64 | 1.79 | 2.5E+03 | 1.43 | |||
cosh | 0.75 | 0.75 | 1.42 | 1.99 | 0.56 | 1.99 | 380.0 | |||||
cospi | 1.0 | 1.78 | ||||||||||
erf | 0.82 | 0.82 | 2.07 | 7.03 | 0.90 | 0.90 | 2.16 | 6.33 | ||||
erfc | 2.92 | 0.75 | 2.92 | 2.72 | 0.76 | 2.72 | ||||||
erfcinv | 1.0 | 3.15 | ||||||||||
erfcx | 2.0 | 2.34 | ||||||||||
erfinv | 1.41 | 1.0 | ||||||||||
exp10 | 1.0 | 0.51 | 1.00 | 2.8E+07 | 0.93 | 0.93 | ||||||
exp2 | 0.71 | 0.71 | 1.07 | 6.0E+04 | 0.68 | 0.68 | 1.66 | |||||
exp | 0.92 | 0.92 | 1.25 | 1.7E+07 | 0.82 | 0.82 | 1.61 | 0.83 | 1.61 | |||
expm1 | 0.75 | 0.75 | 1.76 | 1.1E+07 | 0.74 | 0.74 | 1.69 | 328.0 | ||||
fdim | 0.0 | 0.0 | ||||||||||
floor | 0.0 | 0.0 | ||||||||||
fmod | 0.0 | 0.0 | ||||||||||
frexp | 0.0 | 0.0 | ||||||||||
hypot | 1.12 | 0.85 | 1.12 | 0.96 | 0.5 | 0.96 | ||||||
cyl_bessel_i0 | 1.36 | 5.21 | ||||||||||
cyl_bessel_i1 | 2.77 | 5.69 | ||||||||||
j0 | 3.81 | 2.78 | ||||||||||
j1 | 3.01 | 2.38 | ||||||||||
jn | 2.7E+03 | 8.0E+01 | ||||||||||
lgamma | 3.52 | 2.99 | ||||||||||
ilogb | 0.0 | 0.0 | ||||||||||
isfinite | 0.0 | 0.0 | ||||||||||
isinf | 0.0 | 0.0 | ||||||||||
isnan | 0.0 | 0.0 | ||||||||||
ldexp | 0.0 | 0.0 | ||||||||||
llrint | 0.0 | 0.0 | ||||||||||
llround | 0.0 | 0.0 | ||||||||||
log | 0.5 | 0.5 | 1.35 | 4.0E+07 | 0.94 | 0.94 | 1.14 | 1.5E+03 | 0.59 | |||
log10 | 0.5 | 0.5 | 1.9 | 1.58 | 0.72 | 1.58 | 989.0 | 0.58 | ||||
log1p | 0.77 | 0.77 | 1.6 | 0.55 | 0.55 | 1.73 | 1.6E+03 | |||||
log2 | 0.5 | 0.5 | 1.58 | 0.71 | 0.71 | 1.93 | 889.0 | 0.6 | ||||
logb | 0.0 | 0.0 | ||||||||||
lrint | 0.0 | 0.0 | ||||||||||
lround | 0.0 | 0.0 | ||||||||||
modf | 0.0 | 0.0 | ||||||||||
nan | 0.0 | 0.0 | ||||||||||
nearbyint | 0.0 | 0.0 | ||||||||||
nextafter | 0.0 | 0.0 | ||||||||||
norm | 1.31 | 1.46 | ||||||||||
norm3d | 0.5 | 1.04 | ||||||||||
norm4d | 0.5 | 1.09 | ||||||||||
pow | 0.98 | 0.85 | 0.98 | 1.05 | 0.78 | 1.05 | 1.8E+03 | |||||
powi | 1.48 | 18.4 | ||||||||||
rcbrt | 0.53 | 0.85 | ||||||||||
remainder | 0.0 | 0.0 | ||||||||||
remquo | 0.0 | 0.0 | ||||||||||
rhypot | 0.75 | 1.36 | ||||||||||
rint | 0.0 | 0.0 | ||||||||||
rnorm | 2.2 | 1.66 | ||||||||||
rnorm3d | 0.74 | 1.24 | ||||||||||
rnorm4d | 0.75 | 1.26 | ||||||||||
round | 0.0 | 0.0 | ||||||||||
saturate | 0.0 | |||||||||||
scalbn | 0.0 | 0.0 | ||||||||||
signbit | 0.0 | 0.0 | ||||||||||
sin | 0.85 | 0.85 | 3.15 | 6.1E+07 | 1.96 | 0.65 | 1.96 | 2.5E+03 | 1.88 | |||
sincos | 1.49 | 0.85 | 1.49 | 2.8E+07 | 2.38 | 0.86 | 2.38 | |||||
sincospi | 2.0 | 1.78 | ||||||||||
sinh | 1.74 | 0.79 | 1.74 | 1.34 | 0.68 | 1.34 | 1.1E+03 | |||||
sinpi | 1.0 | 1.78 | ||||||||||
tan | 0.52 | 0.52 | 3.01 | 5.2E+07 | 3.88 | 0.76 | 3.88 | |||||
tanh | 0.65 | 0.65 | 2.11 | 0.57 | 0.57 | 1.36 | 1.5E+03 | |||||
tgamma | 9.06 | 3.01 | ||||||||||
trunc | 0.0 | 0.0 | ||||||||||
y0 | 5.47 | 3.2 | ||||||||||
y1 | 3.64 | 4.86 | ||||||||||
yn | 2.0E+03 | 145.0 |
The accuracy of the inlined functions inv, sqrt, and rsqrt is defined by the OpenCL™/SYCL standards and may be affected by the -f[no-]fast-math compiler switch.
The reported ULP values were obtained by random sampling over a large number of data points. The actual ULP error may be higher for specific argument values.
The cdfnorm and cdfnorminv functions have CUDA-specific aliases, normcdf and normcdfinv, which map to the same computation kernels.
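To make the ULP figures in the table more concrete, the following is a minimal host-side sketch of how a maximum ULP error could be estimated by random sampling. It is not the methodology used to produce the table and does not call the IMF library: it uses only standard C++, treats a double-precision function as the reference for an fp32 function, and the helper names (ulp_of_float_at, ulp_error, max_ulp_error) are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <random>

// Spacing between adjacent float values at the magnitude of `ref`.
// Assumption: `ref` is finite; magnitudes below the smallest normal float
// are clamped so that the subnormal spacing (2^-149) is returned.
inline double ulp_of_float_at(double ref) {
  int exp = 0;
  std::frexp(ref, &exp);  // ref = m * 2^exp with 0.5 <= |m| < 1
  exp = std::max(exp, std::numeric_limits<float>::min_exponent);
  // float has 24 significand bits, so the spacing is 2^(exp - 24).
  return std::ldexp(1.0, exp - std::numeric_limits<float>::digits);
}

// Error of `approx` relative to `ref`, expressed in float ULPs.
inline double ulp_error(double approx, double ref) {
  return std::fabs(approx - ref) / ulp_of_float_at(ref);
}

// Largest observed ULP error of `approx` (float -> float) against
// `reference` (double -> double) on uniformly distributed inputs in [lo, hi).
// Only the sampled points are checked, so the true maximum may be larger.
template <class Approx, class Reference>
double max_ulp_error(Approx approx, Reference reference,
                     float lo, float hi, int samples, unsigned seed = 42) {
  assert(samples > 0);
  std::mt19937 gen(seed);
  std::uniform_real_distribution<float> dist(lo, hi);
  double worst = 0.0;
  for (int i = 0; i < samples; ++i) {
    const float x = dist(gen);
    const double ref = reference(static_cast<double>(x));
    const double got = static_cast<double>(approx(x));
    worst = std::max(worst, ulp_error(got, ref));
  }
  return worst;
}
```

For example, passing a lambda that wraps the float overload of std::acos as approx and one that wraps the double overload as reference, over the interval [-0.9, 0.9), yields an estimate for the host math library. Because only sampled points are checked, the result is a lower bound on the true maximum error, which matches the caveat above that specific argument values may show a higher ULP error.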