Intel® Math Kernel Library 2019 Developer Reference - C
Computes a matrix-matrix product with general integer matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.
void cblas_gemm_s8u8s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa, const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
void cblas_gemm_s16s16s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda, const MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
The cblas_gemm_*_compute routine is one of a set of related routines that enable the use of internal packed storage. After calling cblas_gemm_*_pack, call cblas_gemm_*_compute to compute
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset,
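where:
op(X) is either op(X) = X or op(X) = X^T, or X is already stored in the packed format,
alpha and beta are scalars,
op(A) is an m-by-k matrix, op(B) is a k-by-n matrix, and C is an m-by-n matrix,
A_offset is an m-by-k matrix with every element equal to oa,
B_offset is a k-by-n matrix with every element equal to ob,
C_offset is an m-by-n matrix defined by the oc array as selected by offsetc.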
You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing.
If you are packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
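The update formula above can be read as a plain triple loop. The following is a minimal reference sketch, not Intel MKL code: it assumes column-major storage, no transposition, unpacked inputs, and a single fixed C offset. The names ref_round_s32 and ref_gemm_s8u8s32_colmajor are hypothetical, and rounding ties away from zero is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Round to the nearest integer. Ties away from zero is an assumption;
 * this helper does no overflow handling and expects a 32-bit result. */
int32_t ref_round_s32(double x)
{
    assert(x >= -2147483648.0 && x <= 2147483647.0);
    int32_t t = (int32_t)x;            /* truncates toward zero */
    double frac = x - (double)t;
    if (frac >= 0.5) return t + 1;
    if (frac <= -0.5) return t - 1;
    return t;
}

/* Hypothetical reference, not an MKL routine:
 * C := alpha*(A + oa)*(B + ob) + beta*C + oc, column-major, NoTrans. */
void ref_gemm_s8u8s32_colmajor(int m, int n, int k, float alpha,
                               const int8_t *a, int lda, int8_t oa,
                               const uint8_t *b, int ldb, int8_t ob,
                               float beta, int32_t *c, int ldc, int32_t oc)
{
    assert(lda >= (m > 1 ? m : 1));
    assert(ldb >= (k > 1 ? k : 1));
    assert(ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int32_t acc = 0;   /* assumes the sum fits in 32 bits */
            for (int p = 0; p < k; ++p) {
                acc += ((int32_t)a[i + p * lda] + oa) *
                       ((int32_t)b[p + j * ldb] + ob);
            }
            double r = (double)alpha * (double)acc + (double)oc;
            if (beta != 0.0f) {
                /* c is read only when beta is nonzero */
                r += (double)beta * (double)c[i + j * ldc];
            }
            c[i + j * ldc] = ref_round_s32(r);
        }
    }
}
```

A sketch like this is mainly useful for checking the results of the packed routines on small inputs.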
Layout
CBLAS_LAYOUT Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
transa
MKL_INT Specifies the form of op(A) used in the packing:
If transa = CblasNoTrans, op(A) = A.
If transa = CblasTrans, op(A) = A^T.
If transa = CblasPacked, the matrix in array a is packed into a format internal to Intel MKL and lda is ignored.
transb
MKL_INT Specifies the form of op(B) used in the packing:
If transb = CblasNoTrans, op(B) = B.
If transb = CblasTrans, op(B) = B^T.
If transb = CblasPacked, the matrix in array b is packed into a format internal to Intel MKL and ldb is ignored.
offsetc
CBLAS_OFFSET Specifies the form of C_offset used in the matrix multiplication.
If offsetc = CblasFixOffset: oc has a single element and every element of C_offset is equal to this element.
If offsetc = CblasColOffset: oc has a size of m and every column of C_offset is equal to oc.
If offsetc = CblasRowOffset: oc has a size of n and every row of C_offset is equal to oc.
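The three offsetc modes can be summarized by how element (i, j) of C_offset is looked up in oc. The sketch below is illustrative only. It reads the size requirements as follows: because oc has m entries under CblasColOffset, each column of C_offset is taken to be a copy of oc, and because it has n entries under CblasRowOffset, each row is taken to be a copy of oc. The enum ref_offset_t and the function ref_c_offset_at are hypothetical stand-ins, not MKL names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CBLAS_OFFSET values. Real code would use
 * CblasFixOffset, CblasColOffset, and CblasRowOffset from mkl.h. */
typedef enum { REF_FIX_OFFSET, REF_COL_OFFSET, REF_ROW_OFFSET } ref_offset_t;

/* Element (i, j) of the m-by-n matrix C_offset described by oc. */
int32_t ref_c_offset_at(ref_offset_t offsetc, const int32_t *oc, int i, int j)
{
    assert(oc != NULL && i >= 0 && j >= 0);
    switch (offsetc) {
    case REF_COL_OFFSET:
        return oc[i];   /* oc has m entries: every column equals oc */
    case REF_ROW_OFFSET:
        return oc[j];   /* oc has n entries: every row equals oc */
    default:
        return oc[0];   /* single value shared by all elements */
    }
}
```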
m
MKL_INT Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n
MKL_INT Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
k
MKL_INT Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
alpha
float Specifies the scalar alpha.
a
void* for cblas_gemm_s8u8s32_compute
MKL_INT16* for cblas_gemm_s16s16s32_compute
| | transa = CblasNoTrans | transa = CblasTrans | transa = CblasPacked |
|---|---|---|---|
| Layout = CblasColMajor | Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A. For cblas_gemm_s8u8s32_compute, each element of the a array must be an 8-bit signed integer. | Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A. For cblas_gemm_s8u8s32_compute, each element of the a array must be an 8-bit signed integer. | Array of size returned by cblas_gemm_*_pack_get_size and initialized using cblas_gemm_*_pack. |
| Layout = CblasRowMajor | Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A. For cblas_gemm_s8u8s32_compute, each element of the a array must be an 8-bit unsigned integer. | Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A. For cblas_gemm_s8u8s32_compute, each element of the a array must be an 8-bit unsigned integer. | Array of size returned by cblas_gemm_*_pack_get_size and initialized using cblas_gemm_*_pack. |
lda
MKL_INT Specifies the leading dimension of a as declared in the calling (sub)program.
| | transa = CblasNoTrans | transa = CblasTrans |
|---|---|---|
| Layout = CblasColMajor | lda must be at least max(1, m). | lda must be at least max(1, k). |
| Layout = CblasRowMajor | lda must be at least max(1, k). | lda must be at least max(1, m). |
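The lda requirements follow from how an unpacked array is indexed. The sketch below is a hypothetical helper (ref_op_a_at is not an MKL function) that returns element (i, p) of the m-by-k matrix op(A) for each combination of layout and transposition, assuming the usual CBLAS storage conventions. Column-major without transposition and row-major with transposition both step by lda along p, so lda must cover m. The other two combinations step by lda along i, so lda must cover k.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: element (i, p) of op(A), where op(A) is m-by-k.
 * col_major and trans are treated as booleans. */
int8_t ref_op_a_at(int col_major, int trans, const int8_t *a, int lda,
                   int m, int k, int i, int p)
{
    assert(i >= 0 && i < m && p >= 0 && p < k);
    /* 1 when consecutive values of p are lda elements apart */
    int stride_on_p = (col_major != 0) != (trans != 0);
    int min_lda = stride_on_p ? m : k;
    assert(lda >= (min_lda > 1 ? min_lda : 1));
    return stride_on_p ? a[i + p * lda] : a[p + i * lda];
}
```

For the 2-by-3 matrix A = [1 2 3; 4 5 6], a column-major array with lda = 2 holds 1, 4, 2, 5, 3, 6, while a row-major array with lda = 3 holds 1, 2, 3, 4, 5, 6. In both cases the helper returns 6 for (i, p) = (1, 2).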
oa
MKL_INT8 for cblas_gemm_s8u8s32_compute
MKL_INT16 for cblas_gemm_s16s16s32_compute
Specifies the scalar offset value for the matrix A.
b
void* for cblas_gemm_s8u8s32_compute
MKL_INT16* for cblas_gemm_s16s16s32_compute
| | transb = CblasNoTrans | transb = CblasTrans | transb = CblasPacked |
|---|---|---|---|
| Layout = CblasColMajor | Array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B. For cblas_gemm_s8u8s32_compute, each element of the b array must be an 8-bit unsigned integer. | Array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B. For cblas_gemm_s8u8s32_compute, each element of the b array must be an 8-bit unsigned integer. | Array of size returned by cblas_gemm_*_pack_get_size and initialized using cblas_gemm_*_pack. |
| Layout = CblasRowMajor | Array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B. For cblas_gemm_s8u8s32_compute, each element of the b array must be an 8-bit signed integer. | Array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B. For cblas_gemm_s8u8s32_compute, each element of the b array must be an 8-bit signed integer. | Array of size returned by cblas_gemm_*_pack_get_size and initialized using cblas_gemm_*_pack. |
ldb
MKL_INT Specifies the leading dimension of b as declared in the calling (sub)program.
| | transb = CblasNoTrans | transb = CblasTrans |
|---|---|---|
| Layout = CblasColMajor | ldb must be at least max(1, k). | ldb must be at least max(1, n). |
| Layout = CblasRowMajor | ldb must be at least max(1, n). | ldb must be at least max(1, k). |
ob
MKL_INT8 for cblas_gemm_s8u8s32_compute
MKL_INT16 for cblas_gemm_s16s16s32_compute
Specifies the scalar offset value for the matrix B.
beta
float Specifies the scalar beta.
c
MKL_INT32*
Array:
Layout = CblasColMajor: Array, size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
Layout = CblasRowMajor: Array, size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
ldc
MKL_INT Specifies the leading dimension of c as declared in the calling (sub)program.
Layout = CblasColMajor: ldc must be at least max(1, m).
Layout = CblasRowMajor: ldc must be at least max(1, n).
oc
MKL_INT32*
Array, size len. Specifies the offset values for the matrix C.
If offsetc = CblasFixOffset, len must be at least 1.
If offsetc = CblasColOffset, len must be at least max(1, m).
If offsetc = CblasRowOffset, len must be at least max(1, n).
c
MKL_INT32* Overwritten by the matrix alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_compute: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_compute: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
You can expand the matrix-matrix product in this manner:
(op(A) + A_offset)*(op(B) + B_offset) = op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
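Because A_offset and B_offset are constant matrices, three of the four terms reduce to row sums, column sums, and a constant: element (i, j) of op(A)*B_offset is ob times the sum of row i of op(A), element (i, j) of A_offset*op(B) is oa times the sum of column j of op(B), and every element of A_offset*B_offset is k*oa*ob. The following sketch checks that identity numerically. It assumes column-major storage with lda = m and ldb = k, and ref_expansion_agrees is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if the direct product and the sum of the
 * four expanded terms agree for every element, and 0 otherwise. */
int ref_expansion_agrees(int m, int n, int k, const int8_t *a, int8_t oa,
                         const uint8_t *b, int8_t ob)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int32_t direct = 0;     /* (A + oa)*(B + ob) */
            int32_t ab = 0;         /* op(A)*op(B) */
            int32_t row_sum_a = 0;  /* sum over p of A(i, p) */
            int32_t col_sum_b = 0;  /* sum over p of B(p, j) */
            for (int p = 0; p < k; ++p) {
                int32_t x = a[i + p * m];
                int32_t y = b[p + j * k];
                direct += (x + oa) * (y + ob);
                ab += x * y;
                row_sum_a += x;
                col_sum_b += y;
            }
            /* terms summed left to right, as in the expansion */
            int32_t four = ab + row_sum_a * ob + oa * col_sum_b + k * oa * ob;
            if (direct != four) {
                return 0;
            }
        }
    }
    return 1;
}
```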
These four multiplication terms are computed separately and then summed from left to right. The results from the matrix-matrix product and the C matrix are scaled with the alpha and beta floating-point values, respectively, using double-precision arithmetic. Before the results are stored to the output c array, the floating-point values are rounded to the nearest integers.
In the event of overflow or underflow, the results depend on the architecture. The results are either unsaturated (wrapped) or saturated to maximum or minimum representable integer values for the data type of the output matrix.
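The two possible overflow behaviors can be illustrated on a 64-bit intermediate value that is narrowed to the 32-bit output type. This is only a sketch of what "saturated" and "wrapped" mean here; ref_saturate_s32 and ref_wrap_s32 are hypothetical names, and which behavior a given processor produces is not something this example determines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Saturating narrowing: clamp to the representable int32_t range. */
int32_t ref_saturate_s32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Wrapping narrowing: keep the low 32 bits (arithmetic modulo 2^32). */
int32_t ref_wrap_s32(int64_t v)
{
    uint32_t low = (uint32_t)v;
    int32_t out;
    assert(sizeof out == sizeof low);
    memcpy(&out, &low, sizeof out);  /* reinterpret as two's complement */
    return out;
}
```

For an in-range value both functions return the same result, so code that must behave identically across architectures has to keep its results within the 32-bit range.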
When using cblas_gemm_s8u8s32_compute with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
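One way to picture the swap (an interpretation, not a statement from this reference) is that a row-major product C = A*B occupies the same memory as the column-major product C^T = B^T*A^T, so the row-major B takes the place of the column-major A and vice versa. The sketch below shows this with two hypothetical reference functions, without offsets or scaling: the row-major version simply forwards its buffers to the column-major version with the operands exchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Column-major reference: C (m-by-n) = A (m-by-k, signed) * B (k-by-n, unsigned). */
void ref_colmajor_s8u8(int m, int n, int k, const int8_t *a, int lda,
                       const uint8_t *b, int ldb, int32_t *c, int ldc)
{
    assert(lda >= (m > 1 ? m : 1));
    assert(ldb >= (k > 1 ? k : 1));
    assert(ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int32_t acc = 0;
            for (int p = 0; p < k; ++p) {
                acc += (int32_t)a[i + p * lda] * (int32_t)b[p + j * ldb];
            }
            c[i + j * ldc] = acc;
        }
    }
}

/* Row-major reference: C (m-by-n) = A (m-by-k, unsigned) * B (k-by-n, signed).
 * The same buffers, viewed as column-major, hold A^T, B^T, and C^T. */
void ref_rowmajor_u8s8(int m, int n, int k, const uint8_t *a, int lda,
                       const int8_t *b, int ldb, int32_t *c, int ldc)
{
    ref_colmajor_s8u8(n, m, k, b, ldb, a, lda, c, ldc);
}
```

With row-major A = [1 2 3; 4 5 6] and B = [-1 2; 3 -4; 5 6], the row-major result buffer holds 20, 12, 41, 24.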