Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 2

Softmax Backward Layer

For any $x_{i_1 \ldots i_p}$ from $X \in \mathbb{R}^{n_1 \times \ldots \times n_p}$ and for dimension $k$ of size $n_k$, the softmax activation layer applies the transform defined as

$$
y_{i_1 \ldots i_k \ldots i_p} = \frac{\exp(x_{i_1 \ldots i_k \ldots i_p})}{\sum_{j=1}^{n_k} \exp(x_{i_1 \ldots j \ldots i_p})}
$$

The softmax function is known as the normalized exponential (see [Bishop2006] for exact definitions of softmax).
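As a concrete illustration of the normalized exponential, the following is a minimal C++ sketch (not the Intel DAAL API; the function name softmaxSlice is assumed for illustration) that applies the forward transform to a single 1-D slice taken along dimension k.

```cpp
// Minimal sketch of the forward softmax transform for one 1-D slice
// over dimension k; illustrates the normalized-exponential formula only.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<double> softmaxSlice(const std::vector<double>& x)
{
    // Subtract the maximum for numerical stability; the extra factor
    // cancels in the ratio, so the result is unchanged.
    const double xMax = *std::max_element(x.begin(), x.end());

    std::vector<double> y(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        y[i] = std::exp(x[i] - xMax);
        sum += y[i];
    }
    for (double& v : y) v /= sum;  // y_i = exp(x_i) / sum_j exp(x_j)
    return y;
}
```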

The backward softmax layer for dimension $k$ of size $n_k$ computes the value

$$
z_{i_1 \ldots i_k \ldots i_p} = y_{i_1 \ldots i_k \ldots i_p} \left( g_{i_1 \ldots i_k \ldots i_p} - \sum_{j=1}^{n_k} g_{i_1 \ldots j \ldots i_p} \, y_{i_1 \ldots j \ldots i_p} \right),
$$

where $g_{i_1 \ldots i_p}$ is the input gradient computed on the preceding layer and $y_{i_1 \ldots i_p}$ is the result of the forward softmax layer.
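The backward value can be computed slice by slice along dimension k. Below is a minimal C++ sketch (again not the Intel DAAL API; the name softmaxBackwardSlice is assumed) that applies the formula above to one 1-D slice, given the forward result y and the input gradient g.

```cpp
// Minimal sketch of the backward softmax value for one 1-D slice over
// dimension k, given the forward result y and the incoming gradient g.
#include <cstddef>
#include <vector>

std::vector<double> softmaxBackwardSlice(const std::vector<double>& y,
                                         const std::vector<double>& g)
{
    // dot = sum_j g_j * y_j, the term subtracted from every component
    double dot = 0.0;
    for (std::size_t j = 0; j < y.size(); ++j) dot += g[j] * y[j];

    std::vector<double> z(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
    {
        z[i] = y[i] * (g[i] - dot);  // z_i = y_i * (g_i - sum_j g_j y_j)
    }
    return z;
}
```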

Problem Statement

Given p-dimensional tensors of size $n_1 \times n_2 \times \ldots \times n_p$:

- $Y = (y_{i_1 \ldots i_p})$, the result of the forward softmax layer
- $G = (g_{i_1 \ldots i_p})$, the input gradient computed on the preceding layer

The problem is to compute the p-dimensional tensor $Z = (z_{i_1 \ldots i_p})$ of size $n_1 \times n_2 \times \ldots \times n_p$ such that

$$
z_{i_1 \ldots i_p} = y_{i_1 \ldots i_p} \left( g_{i_1 \ldots i_p} - \sum_{j=1}^{n_k} g_{i_1 \ldots j \ldots i_p} \, y_{i_1 \ldots j \ldots i_p} \right).
$$
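A hedged sketch of the full computation follows: it applies the per-slice rule along dimension k of row-major p-dimensional tensors Y and G to produce Z. The layout, function name, and parameter conventions are assumptions made for illustration and do not reflect the Intel DAAL tensor interfaces.

```cpp
// Sketch: compute Z from Y and G for a row-major p-dimensional tensor,
// applying the backward softmax rule along (0-based) dimension k.
// Layout and naming are illustrative assumptions, not the DAAL data model.
#include <cstddef>
#include <vector>

void softmaxBackwardTensor(const std::vector<double>& y,
                           const std::vector<double>& g,
                           std::vector<double>& z,
                           const std::vector<std::size_t>& dims,  // n_1 ... n_p
                           std::size_t k)                         // dimension index
{
    // Row-major layout: outer = product of dims before k,
    // inner = product of dims after k, nk = dims[k].
    std::size_t outer = 1, inner = 1;
    for (std::size_t d = 0; d < k; ++d) outer *= dims[d];
    for (std::size_t d = k + 1; d < dims.size(); ++d) inner *= dims[d];
    const std::size_t nk = dims[k];

    z.assign(y.size(), 0.0);
    for (std::size_t o = 0; o < outer; ++o)
    {
        for (std::size_t in = 0; in < inner; ++in)
        {
            const std::size_t base = o * nk * inner + in;

            // dot = sum over j along dimension k of g * y for this slice
            double dot = 0.0;
            for (std::size_t j = 0; j < nk; ++j)
                dot += g[base + j * inner] * y[base + j * inner];

            // z = y * (g - dot) for every element of the slice
            for (std::size_t i = 0; i < nk; ++i)
                z[base + i * inner] = y[base + i * inner] * (g[base + i * inner] - dot);
        }
    }
}
```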