Developer Guide for Intel® Data Analytics Acceleration Library 2018
The forward three-dimensional (3D) max pooling layer is a form of non-linear downsampling of an input tensor $X \in \mathbb{R}^{n_1 \times n_2 \times \dots \times n_p}$. 3D max pooling partitions the input tensor data into 3D subtensors along dimensions $k_1$, $k_2$, and $k_3$, selects the element with the maximal numeric value in each subtensor, and transforms the input tensor into the output tensor $Y$ by replacing each subtensor with its maximum element.
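The partition-and-select idea can be illustrated with a minimal sketch. It is not the library's implementation: the function name `max_pool_3d_blocks` is hypothetical, the input is a plain 3D nested list, the blocks do not overlap, and every dimension is assumed to be divisible by the block size `m`.

```python
from itertools import product

def max_pool_3d_blocks(x, m):
    """Max over non-overlapping m x m x m blocks of a 3D nested list x.

    Assumes every dimension of x is divisible by m (no padding, stride == m).
    """
    d0, d1, d2 = len(x), len(x[0]), len(x[0][0])
    out = [[[None] * (d2 // m) for _ in range(d1 // m)] for _ in range(d0 // m)]
    for a, b, c in product(range(d0 // m), range(d1 // m), range(d2 // m)):
        # Replace the block starting at (a*m, b*m, c*m) with its largest element.
        out[a][b][c] = max(
            x[a * m + i][b * m + j][c * m + k]
            for i, j, k in product(range(m), repeat=3)
        )
    return out

# 2 x 2 x 2 input with entries 0..7; a single 2 x 2 x 2 block covers all of it.
x = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
print(max_pool_3d_blocks(x, 2))  # [[[7]]]
```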
Given:
A p-dimensional tensor $X \in \mathbb{R}^{n_1 \times n_2 \times \dots \times n_p}$ with input data.
Dimensions $k_1$, $k_2$, and $k_3$ along which the kernel is applied.
Kernel sizes $m_1$, $m_2$, and $m_3$:
where $p_1$, $p_2$, and $p_3$ are paddings.
The problem is to compute the value tensor $Y = (y_{i_1 \dots i_p}) \in \mathbb{R}^{l_1 \times \dots \times l_p}$ using the downsampling technique.
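As a rough illustration of how the output dimensions relate to the input, the sketch below uses the common pooling convention floor((n + 2p - m) / s) + 1 for each pooled axis, where s is the stride along that axis. This convention is an assumption; the text here does not state the library's exact formula. The helper name `pooled_shape` is hypothetical, and axes are 0-based in the code, whereas the dimensions $k_1$, $k_2$, and $k_3$ in the text are written 1-based.

```python
def pooled_shape(shape, dims, kernel, stride, padding):
    """Return the output shape; only the axes listed in dims are resized.

    Assumes the common convention l = floor((n + 2p - m) / s) + 1 per pooled axis.
    """
    out = list(shape)
    for k, m, s, p in zip(dims, kernel, stride, padding):
        out[k] = (shape[k] + 2 * p - m) // s + 1
    return tuple(out)

# 5-dimensional input pooled along its last three axes with 2 x 2 x 2 kernels.
print(pooled_shape((1, 3, 8, 8, 8), (2, 3, 4), (2, 2, 2), (2, 2, 2), (0, 0, 0)))
# (1, 3, 4, 4, 4)
```

Axes that are not pooled keep their original size, which matches the description of $Y$ as a tensor with the same number of dimensions as $X$.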
The layer computes the value $y_{i_1 \dots i_p}$ as the maximum element in the subtensor. After the kernel is applied to the subtensor at each position, the index of the maximum $T = (t_{i_1 \dots i_p})$ is stored for use by the backward 3D max pooling layer.
$s_1$, $s_2$, and $s_3$ are strides.
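The forward pass with kernel sizes, strides, and paddings can be sketched as follows. This is an illustration under stated assumptions, not the library's algorithm: the function name `max_pool_3d_forward` is hypothetical, the input is a 3D nested list pooled along all three axes, the output extent uses the common floor((n + 2p - m) / s) + 1 convention, padded cells are never selected, each padding is assumed to be smaller than the matching kernel size, and the stored index is the input coordinate of the maximum (the library's own index encoding is not specified here).

```python
from itertools import product

def max_pool_3d_forward(x, kernel, stride, padding):
    """Return (y, t): window maxima and the input coordinates of each maximum."""
    n = (len(x), len(x[0]), len(x[0][0]))
    # Output extent per axis, assuming floor((n + 2p - m) / s) + 1.
    dims = [(n[d] + 2 * padding[d] - kernel[d]) // stride[d] + 1 for d in range(3)]
    y = [[[None] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    t = [[[None] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for o in product(*(range(e) for e in dims)):
        best, best_pos = float("-inf"), None
        for w in product(*(range(m) for m in kernel)):
            # Input coordinate covered by kernel offset w at output position o.
            pos = tuple(o[d] * stride[d] - padding[d] + w[d] for d in range(3))
            if any(c < 0 or c >= n[d] for d, c in enumerate(pos)):
                continue  # padded cell: never selected as the maximum
            v = x[pos[0]][pos[1]][pos[2]]
            if v > best:
                best, best_pos = v, pos
        y[o[0]][o[1]][o[2]] = best
        t[o[0]][o[1]][o[2]] = best_pos  # kept for a later backward pass
    return y, t

# 2 x 2 x 4 input, 2 x 2 x 2 kernel, stride 2, no padding: two windows.
x = [[[1, 5, 2, 0], [3, 4, 9, 1]],
     [[0, 2, 8, 7], [6, 1, 3, 4]]]
y, t = max_pool_3d_forward(x, (2, 2, 2), (2, 2, 2), (0, 0, 0))
print(y)  # [[[6, 9]]]
print(t)  # [[[(1, 1, 0), (0, 1, 2)]]]
```

Storing the coordinates alongside the maxima lets a backward pass route each output gradient to the single input element that produced it, without searching the window again.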