Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 5
The forward two-dimensional (2D) spatial pyramid pooling layer with pyramid height $L \in \mathbb{N}$ is a form of non-linear downsampling of an input tensor $X$. The library supports four-dimensional input tensors $X \in \mathbb{R}^{n_1 \times n_2 \times n_3 \times n_4}$. 2D spatial pyramid pooling partitions the input tensor data into $(2^l)^2$ subtensors (bins), $l \in \{0, \ldots, L-1\}$, along dimensions $k_1$ and $k_2$ and computes the result in each subtensor. The computation follows the selected pooling strategy: maximum, average, or stochastic. The spatial pyramid pooling layer applies the pooling $L$ times, once per level, with different kernel sizes, strides, and paddings.
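To make the partitioning concrete, here is a minimal pure-Python sketch of the maximum strategy applied to a single $n_3 \times n_4$ slice of $X$. It assumes both dimensions are divisible by $2^{L-1}$, so every bin is an exact block and no padding arises. The function name `spp_max_2d` and the output order (level by level, bins in row-major order) are illustrative choices, not the library's API or memory layout.

```python
def spp_max_2d(fmap, L):
    """Spatial pyramid max pooling of one 2D map given as a list of rows.

    Assumes both dimensions are divisible by 2**(L - 1), so each bin is an
    exact block and no padding is needed. Results are returned level by
    level, bins in row-major order (an illustrative layout choice).
    """
    h, w = len(fmap), len(fmap[0])
    if h % 2 ** (L - 1) or w % 2 ** (L - 1):
        raise ValueError("dimensions must be divisible by 2**(L - 1)")
    out = []
    for level in range(L):
        b = 2 ** level              # bins per dimension at this level
        kh, kw = h // b, w // b     # bin size; the stride equals it
        for bi in range(b):
            for bj in range(b):
                block = [fmap[r][c]
                         for r in range(bi * kh, (bi + 1) * kh)
                         for c in range(bj * kw, (bj + 1) * kw)]
                out.append(max(block))  # maximum strategy
    return out


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(spp_max_2d(fmap, 2))  # [16, 6, 8, 14, 16]
```

With $L = 2$ the 4 × 4 map is reduced to $1 + 4 = 5$ values. Replacing `max(block)` with the mean of `block` would give the average strategy instead.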
The library provides several spatial pyramid pooling layers, one for each pooling strategy: maximum, average, and stochastic.
The following description applies to each of these layers.
Let $X \in \mathbb{R}^{n_1 \times n_2 \times n_3 \times n_4}$ be the tensor of input data, and let $k_1$ and $k_2$ be the dimensions along which kernels are applied. Without loss of generality, $k_1$ and $k_2$ are the last two dimensions of the tensor $X$. For each level $l \in \{0, \ldots, L-1\}$ with $b = 2^l$ bins per dimension, the layer applies 2D pooling with the following parameters:
Kernel sizes $m_i = \lceil n_{k_i} / b \rceil$, $i \in \{1, 2\}$
Strides $s_i = m_i$, $i \in \{1, 2\}$
Paddings
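The per-level parameters along one pooled dimension can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the kernel size is taken as $\lceil n / b \rceil$, the stride equals the kernel size, and the total padding $b \cdot m - n$ (the amount needed so that exactly $b$ windows fit) is split with any odd remainder placed after the data. That split and the helper names `spp_level_params` and `num_windows` are hypothetical, not the library's documented padding rule.

```python
import math


def spp_level_params(n, level):
    """Pooling parameters along one pooled dimension of size n at a level.

    Hypothetical assumptions: kernel size ceil(n / b), stride equal to the
    kernel size, and total padding b * m - n split as evenly as possible,
    with any odd remainder placed after the data.
    """
    b = 2 ** level                 # number of bins along this dimension
    m = math.ceil(n / b)           # kernel size
    s = m                          # stride equals kernel size
    total_pad = b * m - n          # extra cells so that exactly b windows fit
    pad_before = total_pad // 2
    pad_after = total_pad - pad_before
    return m, s, pad_before, pad_after


def num_windows(n, m, s, pad_before, pad_after):
    """Number of pooling windows along one dimension after padding."""
    return (n + pad_before + pad_after - m) // s + 1


for level in range(3):
    m, s, pb, pa = spp_level_params(7, level)
    print(level, m, s, pb, pa, num_windows(7, m, s, pb, pa))
```

Whatever the split, the padded length is $b \cdot m$, so exactly $b$ windows fit. For small dimensions the total padding can reach the kernel size (for example, $n = 5$ at level 2 gives $m = 2$ and a total padding of 3), so one window covers only padding; the sketch does not guard against that case.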
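In the flattened layout, where the results of all pyramid levels are concatenated along a single dimension of size $n'$, the layer result is represented as a two-dimensional tensor $Y \in \mathbb{R}^{n_1 \times n'}$, where $n' = n_2 \sum_{l=0}^{L-1} (2^l)^2 = n_2 \cdot \frac{4^L - 1}{3}$.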
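Assuming each of the $n_2$ channels contributes $(2^l)^2$ bins at level $l$ (the partitioning along the last two dimensions described at the start of this section), the width $n'$ of the flattened result can be computed directly or through the geometric-series closed form. The helper names below are hypothetical.

```python
def spp_flat_width(n2, L):
    """Width n' of the flattened result: n2 channels times the total bin count."""
    return n2 * sum((2 ** l) ** 2 for l in range(L))


def spp_flat_width_closed(n2, L):
    """Same value via the geometric series sum_{l=0}^{L-1} 4**l = (4**L - 1) / 3."""
    return n2 * (4 ** L - 1) // 3


n1, n2, L = 8, 3, 2
print((n1, spp_flat_width(n2, L)))  # (8, 15)
```

Because $4^L - 1$ is always divisible by 3, the integer division in the closed form is exact.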
The following figure illustrates the behavior of the spatial pyramid maximum pooling forward layer with pyramid height $L = 2$: