Intel® Math Kernel Library 2018 Developer Reference - C
Computes a matrix-matrix product with general integer matrices.
void cblas_gemm_s8u8s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa, const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
void cblas_gemm_s16s16s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda, const MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
The cblas_gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product. To get the final result, a vector is added to each row or column of the output matrix. The operation is defined as:
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset
where:
op(X) is either op(X) = X, or op(X) = X^T
and such that:
alpha and beta are scalars,
A is a matrix such that op(A) is m-by-k,
B is a matrix such that op(B) is k-by-n,
and C is an m-by-n matrix.
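The definition above can be sketched as a naive reference loop. This is only an illustration in plain C, not the library's implementation. It assumes column-major storage, no transposition, 16-bit inputs as in the s16s16s32 variant, and that A_offset, B_offset, and C_offset each hold one scalar value (oa, ob, co) in every element. The name ref_gemm_s16 is hypothetical, and the sketch skips saturation by assuming every result fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference routine (not part of Intel MKL).
 * Column-major, no transposition: A is m-by-k (lda >= m),
 * B is k-by-n (ldb >= k), C is m-by-n (ldc >= m).
 * oa, ob, and co are scalar offsets applied to every element
 * of A, B, and the result (an assumption of this sketch). */
static void ref_gemm_s16(int m, int n, int k, float alpha,
                         const int16_t *a, int lda, int16_t oa,
                         const int16_t *b, int ldb, int16_t ob,
                         float beta, int32_t *c, int ldc, int32_t co)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= (m > 1 ? m : 1));
    assert(ldb >= (k > 1 ? k : 1));
    assert(ldc >= (m > 1 ? m : 1));

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int64_t acc = 0;
            for (int p = 0; p < k; ++p) {
                int64_t av = (int64_t)a[i + p * lda] + oa;  /* (A + A_offset)(i,p) */
                int64_t bv = (int64_t)b[p + j * ldb] + ob;  /* (B + B_offset)(p,j) */
                acc += av * bv;
            }
            /* When beta is zero, c is not read on input. */
            double prev = (beta == 0.0f) ? 0.0 : (double)beta * (double)c[i + j * ldc];
            double v = (double)alpha * (double)acc + prev + (double)co;
            /* Round to nearest; assumes v fits in int32 (no saturation here). */
            c[i + j * ldc] = (int32_t)(v + (v >= 0.0 ? 0.5 : -0.5));
        }
    }
}
```

For A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], oa = 1, ob = -1, alpha = 2, beta = 1, C initially all ones, and co = 3, the sketch yields C = [[56, 66], [96, 114]].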
Layout: Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
transa: Specifies the form of op(A) used in the matrix multiplication:
if transa=CblasNoTrans, then op(A) = A;
if transa=CblasTrans, then op(A) = A^T.
transb: Specifies the form of op(B) used in the matrix multiplication:
if transb=CblasNoTrans, then op(B) = B;
if transb=CblasTrans, then op(B) = B^T.
offsetc: Specifies the form of C_offset used in the matrix multiplication.
m: Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n: Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
k: Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
alpha: Specifies the scalar alpha.
a:
| | transa=CblasNoTrans | transa=CblasTrans |
| Layout = CblasColMajor | Array, size lda*k signed integers. Before entry, the leading m-by-k part of the array a must contain the matrix A. | Array, size lda*m signed integers. Before entry, the leading k-by-m part of the array a must contain the matrix A. |
| Layout = CblasRowMajor | Array, size lda*m unsigned integers. Before entry, the leading k-by-m part of the array a must contain the matrix A. | Array, size lda*k unsigned integers. Before entry, the leading m-by-k part of the array a must contain the matrix A. |
lda: Specifies the leading dimension of a as declared in the calling (sub)program.
| | transa=CblasNoTrans | transa=CblasTrans |
| Layout = CblasColMajor | lda must be at least max(1, m). | lda must be at least max(1, k). |
| Layout = CblasRowMajor | lda must be at least max(1, k). | lda must be at least max(1, m). |
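The lda requirements above match the usual CBLAS addressing convention, which can be sketched as follows. This is a small illustration under that assumption. The enum and function names are hypothetical stand-ins and are not taken from the MKL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS_LAYOUT and CBLAS_TRANSPOSE values. */
enum layout_kind { COL_MAJOR, ROW_MAJOR };
enum trans_kind { NO_TRANS, TRANS };

/* Smallest lda accepted when op(A) is m-by-k. */
static int min_lda(enum layout_kind layout, enum trans_kind transa, int m, int k)
{
    /* The stored matrix A is m-by-k without transposition, k-by-m with it. */
    int rows = (transa == NO_TRANS) ? m : k;
    int cols = (transa == NO_TRANS) ? k : m;
    /* Column-major strides over rows; row-major strides over columns. */
    int lead = (layout == COL_MAJOR) ? rows : cols;
    return lead > 1 ? lead : 1;
}

/* Index into the array a of element (i, p) of op(A), 0-based. */
static size_t op_a_index(enum layout_kind layout, enum trans_kind transa,
                         int i, int p, int lda)
{
    assert(i >= 0 && p >= 0 && lda >= 1);
    /* (r, c) is the position inside the stored matrix A. */
    int r = (transa == NO_TRANS) ? i : p;
    int c = (transa == NO_TRANS) ? p : i;
    return (layout == COL_MAJOR) ? (size_t)r + (size_t)c * (size_t)lda
                                 : (size_t)r * (size_t)lda + (size_t)c;
}
```

The same two helpers apply to b and ldb when m, k, and transa are replaced with k, n, and transb.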
oa: Specifies the scalar offset value for matrix A.
b:
| | transb=CblasNoTrans | transb=CblasTrans |
| Layout = CblasColMajor | Array, size ldb by n unsigned integers. Before entry, the leading k-by-n part of the array b must contain the matrix B. | Array, size ldb by k unsigned integers. Before entry, the leading n-by-k part of the array b must contain the matrix B. |
| Layout = CblasRowMajor | Array, size ldb by k signed integers. Before entry, the leading n-by-k part of the array b must contain the matrix B. | Array, size ldb by n signed integers. Before entry, the leading k-by-n part of the array b must contain the matrix B. |
ldb: Specifies the leading dimension of b as declared in the calling (sub)program.
| | transb=CblasNoTrans | transb=CblasTrans |
| Layout = CblasColMajor | ldb must be at least max(1, k). | ldb must be at least max(1, n). |
| Layout = CblasRowMajor | ldb must be at least max(1, n). | ldb must be at least max(1, k). |
ob: Specifies the scalar offset value for matrix B.
beta: Specifies the scalar beta. When beta is equal to zero, c need not be set on input.
c:
| Layout = CblasColMajor | Array, size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry. |
| Layout = CblasRowMajor | Array, size ldc by m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry. |
ldc: Specifies the leading dimension of c as declared in the calling (sub)program.
| Layout = CblasColMajor | ldc must be at least max(1, m). |
| Layout = CblasRowMajor | ldc must be at least max(1, n). |
oc: Array, size len. Specifies the offset values for matrix C.
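The text says only that a vector is added to each row or column of the output and that offsetc selects the form of C_offset. One plausible reading is sketched below. The mapping is an assumption: a fixed mode uses a single value, a column mode uses a length-m vector as every column of C_offset, and a row mode uses a length-n vector as every row. The enum and function names are hypothetical stand-ins for the CBLAS_OFFSET values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three offsetc modes. */
enum offset_kind { OFFSET_FIX, OFFSET_COL, OFFSET_ROW };

/* Assumed number of elements read from oc for an m-by-n matrix C. */
static int oc_length(enum offset_kind offsetc, int m, int n)
{
    switch (offsetc) {
    case OFFSET_COL: return m > 1 ? m : 1;  /* one value per row index */
    case OFFSET_ROW: return n > 1 ? n : 1;  /* one value per column index */
    default:         return 1;              /* a single shared value */
    }
}

/* Assumed element (i, j) of C_offset, 0-based. */
static int32_t c_offset_at(enum offset_kind offsetc, const int32_t *oc, int i, int j)
{
    assert(oc != NULL && i >= 0 && j >= 0);
    switch (offsetc) {
    case OFFSET_COL: return oc[i];  /* every column of C_offset equals oc */
    case OFFSET_ROW: return oc[j];  /* every row of C_offset equals oc */
    default:         return oc[0];  /* every element equals oc[0] */
    }
}
```

Under this reading, the size len of oc would be the value returned by oc_length for the chosen mode.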
c: Overwritten by alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.
For examples of routine usage, see the code in the following links and in the Intel MKL installation directory:
cblas_gemm_s8u8s32: cblas_gemm_s8u8s32x.c
cblas_gemm_s16s16s32: cblas_gemm_s16s16s32x.c
The matrix-matrix product can be expanded:
(op(A) + A_offset)*(op(B) + B_offset)
= op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
These four multiplication terms are computed separately and then summed from left to right. The results of the matrix-matrix product and the C matrix are scaled by the floating-point values alpha and beta, respectively, using double-precision arithmetic. Before the results are stored to the output c array, the floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results are saturated to the maximum or minimum representable integer value for the data type of the output matrix.
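The expansion and the post-processing steps can be illustrated for a single output element. The sketch below is not MKL's code. It assumes scalar offsets oa and ob, assumes that the C offset is added before rounding, and rounds ties away from zero (the text says only "nearest"). Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round to nearest, then saturate to the int32 range.
 * Ties are rounded away from zero; that tie rule is an assumption. */
static int32_t round_saturate_s32(double v)
{
    if (v >= (double)INT32_MAX) return INT32_MAX;
    if (v <= (double)INT32_MIN) return INT32_MIN;
    return (int32_t)(v + (v >= 0.0 ? 0.5 : -0.5));
}

/* One output element: row holds the k values of a row of op(A),
 * col holds the k values of a column of op(B). */
static int32_t gemm_element(int k, float alpha,
                            const int16_t *row, int16_t oa,
                            const int16_t *col, int16_t ob,
                            float beta, int32_t c_old, int32_t c_off)
{
    assert(k >= 0);
    int64_t ab = 0, a_ob = 0, oa_b = 0, oa_ob = 0;
    for (int p = 0; p < k; ++p) {
        ab    += (int64_t)row[p] * col[p];  /* op(A)*op(B) */
        a_ob  += (int64_t)row[p] * ob;      /* op(A)*B_offset */
        oa_b  += (int64_t)oa * col[p];      /* A_offset*op(B) */
        oa_ob += (int64_t)oa * ob;          /* A_offset*B_offset */
    }
    /* Sum the four terms from left to right. */
    int64_t prod = ((ab + a_ob) + oa_b) + oa_ob;
    /* Scale in double precision, add the C offset, then round and saturate. */
    double v = (double)alpha * (double)prod
             + (double)beta * (double)c_old
             + (double)c_off;
    return round_saturate_s32(v);
}
```

With row = {1, 2}, col = {3, 4}, oa = 1, and ob = -1, the four terms are 11, -3, 7, and -2, which sum to 13, the same value as (1+1)*(3-1) + (2+1)*(4-1). A very large alpha pushes the scaled value past the 32-bit range, and the helper then returns INT32_MAX or INT32_MIN.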