The forward batch normalization layer normalizes $x_{i_1 \ldots i_p}$ from the input $X \in \mathbb{R}^{n_1 \times n_2 \times \ldots \times n_p}$ for the dimension $k \in \{1, \ldots, p\}$ and then scales and shifts the result of the normalization. For more details, see Forward Batch Normalization Layer. The backward batch normalization layer [Ioffe2015] computes the following values for the dimension $k \in \{1, \ldots, p\}$:
$$\frac{\partial E}{\partial b_j} = \sum_{i_k = j} g_{i_1 \ldots i_p}, \qquad \frac{\partial E}{\partial w_j} = \sum_{i_k = j} g_{i_1 \ldots i_p}\,\hat{x}_{i_1 \ldots i_p},$$

$$z_{i_1 \ldots i_p} = \frac{\partial E}{\partial x_{i_1 \ldots i_p}} = \frac{w_j}{\sqrt{v_j + \epsilon}} \left( g_{i_1 \ldots i_p} - \frac{1}{m}\frac{\partial E}{\partial b_j} - \frac{\hat{x}_{i_1 \ldots i_p}}{m}\frac{\partial E}{\partial w_j} \right), \qquad i_k = j, \quad j = 1, \ldots, n_k,$$

where
$g_{i_1 \ldots i_p} \in G$ is the gradient of the preceding layer
$E$ is the objective function used at the training stage
$w = (w_1, \ldots, w_{n_k})$ are the weights of the forward batch normalization layer
$b = (b_1, \ldots, b_{n_k})$ are the biases
$\mu = (\mu_1, \ldots, \mu_{n_k})$ is the mean computed at the forward stage
$v = (v_1, \ldots, v_{n_k})$ is the variance computed at the forward stage
$\sigma = (\sigma_1, \ldots, \sigma_{n_k})$, with $\sigma_j = \sqrt{v_j + \epsilon}$, is the standard deviation, where $\epsilon$ is the smoothing constant of the layer
$\hat{x}_{i_1 \ldots i_p} = (x_{i_1 \ldots i_p} - \mu_j)/\sigma_j = (y_{i_1 \ldots i_p} - b_j)/w_j$ is the normalized input and $m = \prod_{l \neq k} n_l$ is the number of elements summed for each $j$.
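The following NumPy sketch illustrates these formulas; it is not the Intel DAAL API. The function name backward_batch_norm, the argument order, and the default epsilon value are assumptions made for this example, and the sketch recomputes the normalized values from the input together with the mean and variance saved at the forward stage.

```python
import numpy as np

def backward_batch_norm(g, x, w, mean, variance, k, epsilon=1e-5):
    """Illustrative backward batch normalization over dimension k
    (a sketch of the formulas above, not the Intel DAAL API)."""
    # Move the normalized dimension k to the last axis so that the
    # per-j quantities (w, mean, variance) broadcast over it.
    g = np.moveaxis(g, k, -1)
    x = np.moveaxis(x, k, -1)

    # m = number of elements summed for each index j of dimension k.
    m = g.size // g.shape[-1]

    inv_std = 1.0 / np.sqrt(variance + epsilon)   # 1 / sqrt(v_j + eps)
    x_hat = (x - mean) * inv_std                  # normalized input

    # Sum over every axis except the normalized one (all indices with i_k = j).
    axes = tuple(range(g.ndim - 1))
    dE_db = g.sum(axis=axes)                      # bias derivatives, shape (n_k,)
    dE_dw = (g * x_hat).sum(axis=axes)            # weight derivatives, shape (n_k,)

    # z = dE/dx, the gradient with respect to the input.
    z = (w * inv_std) * (g - dE_db / m - x_hat * dE_dw / m)
    return np.moveaxis(z, -1, k), dE_dw, dE_db

# Toy usage: a 4-dimensional tensor normalized over dimension k = 1.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 5))
g = rng.standard_normal(x.shape)
mean = x.mean(axis=(0, 2, 3))
variance = x.var(axis=(0, 2, 3))
w = np.ones(3)
z, dE_dw, dE_db = backward_batch_norm(g, x, w, mean, variance, k=1)
```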
Given p-dimensional tensors:
$G \in \mathbb{R}^{n_1 \times n_2 \times \ldots \times n_p}$ - the gradient computed on the preceding layer
$Y \in \mathbb{R}^{n_1 \times n_2 \times \ldots \times n_p}$ - the output of the forward batch normalization layer
The problem is to compute the p-dimensional tensor $Z \in \mathbb{R}^{n_1 \times n_2 \times \ldots \times n_p}$ such that:

$$z_{i_1 \ldots i_p} = \frac{\partial E}{\partial x_{i_1 \ldots i_p}}, \qquad i_k = j,$$

for $j = 1, \ldots, n_k$, where: