Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 5
Cross-entropy loss is an objective function minimized in the process of logistic regression training when a dependent variable takes more than two values.
Given n feature vectors X = {x_1 = (x_11, …, x_1p), …, x_n = (x_n1, …, x_np)} of dimension p and a vector of class labels y = (y_1, …, y_n), where y_i ∈ {0, …, T − 1} describes the class to which the feature vector x_i belongs and T is the number of classes, the optimization solver minimizes the cross-entropy loss objective function with respect to the argument θ, a matrix of size T × (p + 1). The cross-entropy loss objective function K(θ, X, y) has the following format:

K(θ, X, y) = −(1/n) Σ_{i=1..n} log p_{y_i}(x_i, θ) + λ_1 Σ_{t=0..T−1} Σ_{j=1..p} |θ_tj| + λ_2 Σ_{t=0..T−1} Σ_{j=1..p} θ_tj²

where:

- p_t(z, θ) = exp(f_t(z, θ)) / Σ_{k=0..T−1} exp(f_k(z, θ)) is the predicted probability that the vector z belongs to class t,
- f_t(z, θ) = θ_t0 + Σ_{j=1..p} θ_tj z_j,
- λ_1 and λ_2 are the L1 and L2 regularization coefficients.
For a given set of indices I = {i_1, …, i_m}, 1 ≤ i_r ≤ n, r = 1, …, m, the value and the gradient of the sum of functions in the argument X respectively have the format:

K_I(θ, X, y) = −(1/m) Σ_{i ∈ I} log p_{y_i}(x_i, θ) + λ_1 Σ_{t=0..T−1} Σ_{j=1..p} |θ_tj| + λ_2 Σ_{t=0..T−1} Σ_{j=1..p} θ_tj²

∇K_I(θ, X, y) = (∂K_I/∂θ_00, …, ∂K_I/∂θ_(T−1)p)

where

∂K_I/∂θ_t0 = (1/m) Σ_{i ∈ I} (p_t(x_i, θ) − [y_i = t]),
∂K_I/∂θ_tj = (1/m) Σ_{i ∈ I} (p_t(x_i, θ) − [y_i = t]) x_ij + 2 λ_2 θ_tj, j = 1, …, p,

[y_i = t] is an indicator function equal to 1 if y_i = t and 0 otherwise, and λ_1, λ_2 are the L1 and L2 regularization coefficients. The non-smooth L1 term is not differentiated here; it is handled through the proximal projection.
The Hessian matrix is a symmetric matrix of size S × S, where S = T × (p + 1):

∂²K_I/(∂θ_(t1 j1) ∂θ_(t2 j2)) = (1/m) Σ_{i ∈ I} x_(i j1) x_(i j2) p_t1(x_i, θ) ([t1 = t2] − p_t2(x_i, θ)) + 2 λ_2 [t1 = t2, j1 = j2, j1 ≥ 1]

with the convention x_i0 = 1, where [·] is an indicator function.

For the L1 regularization term, the solver applies the proximal projection componentwise:

prox_η(θ_tj) = θ_tj − η λ_1, if θ_tj > η λ_1; 0, if |θ_tj| ≤ η λ_1; θ_tj + η λ_1, if θ_tj < −η λ_1,

where η is the learning rate.
For more details, see [Hastie2009].
The implementation of the p_t(z, θ) computation relies on the numerically stable version of the softmax function (Analysis > Math functions > Softmax).