Given n feature vectors X = { x_1 = (x_{11}, ..., x_{1p}), ..., x_n = (x_{n1}, ..., x_{np}) } of n p-dimensional feature vectors and n responses Y = { y_1, ..., y_n }, the problem is to build a gradient boosted trees classification or regression model.
The tree ensemble model uses M additive functions to predict the output

\hat{y}_i = f(x_i) = \sum_{k=1}^{M} f_k(x_i), \quad f_k \in F,

where F = \{ f(x) = w_{q(x)}, \; q: R^p \to \{1, \ldots, T\}, \; w \in R^T \} is the space of regression trees, T is the number of leaves in the tree, w is a leaf weights vector, w_i is a score on the i-th leaf, and q(x) represents the structure of each tree that maps an observation to the corresponding leaf index.
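To make the model structure concrete, the following minimal C++ sketch (an illustration only, not the library's API; all type and function names are hypothetical) represents each tree by its structure function q(x) and its leaf weight vector w, and predicts by summing the leaf scores of all M trees:

#include <cstdio>
#include <functional>
#include <vector>

// One regression tree: q(x) maps a feature vector to a leaf index,
// and w[q(x)] is the score (weight) stored on that leaf.
struct RegressionTree {
    std::function<int(const std::vector<double>&)> q; // structure function q(x)
    std::vector<double> w;                             // leaf weights, size T
};

// Ensemble prediction: yHat = sum over the M additive functions f_k(x) = w_{q_k(x)}.
double predict(const std::vector<RegressionTree>& model, const std::vector<double>& x) {
    double yHat = 0.0;
    for (const RegressionTree& tree : model)
        yHat += tree.w[tree.q(x)];
    return yHat;
}

int main() {
    // A toy ensemble of M = 2 stumps on a 1-dimensional feature (hypothetical values).
    std::vector<RegressionTree> model = {
        { [](const std::vector<double>& x) { return x[0] < 0.5 ? 0 : 1; }, { -0.4, 0.7 } },
        { [](const std::vector<double>& x) { return x[0] < 0.2 ? 0 : 1; }, { -0.1, 0.3 } },
    };
    std::printf("prediction for x = 0.8: %f\n", predict(model, { 0.8 }));
    return 0;
}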
The training procedure is an iterative functional gradient descent algorithm that minimizes the objective function over the function space by iteratively choosing a function (regression tree) that points in the negative gradient direction. The objective function is

L(f) = \sum_{i=1}^{n} l(y_i, f(x_i)) + \sum_{k=1}^{M} \Omega(f_k),

where l(y, f) is a twice differentiable convex loss function and

\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2

is a regularization term that penalizes the complexity of the model, defined by the number of leaves T and the L2 norm of the leaf weights \lVert w \rVert^2 for each tree; \gamma and \lambda are regularization parameters.
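For instance, a single tree with T = 3 leaves and leaf weights w = (0.5, -0.3, 0.1), taken with hypothetical regularization parameters \gamma = 1 and \lambda = 2, contributes

\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 = 1 \cdot 3 + \frac{1}{2} \cdot 2 \cdot (0.5^2 + 0.3^2 + 0.1^2) = 3.35

to the objective.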
The library uses the second-order approximation of the objective function

L^{(k)}(f) \approx \sum_{i=1}^{n} \left( g_i f_k(x_i) + \frac{1}{2} h_i f_k^2(x_i) \right) + \Omega(f_k),

where

g_i = \frac{\partial l(y_i, \hat{y}_i^{(k-1)})}{\partial \hat{y}_i^{(k-1)}}, \qquad h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(k-1)})}{\partial (\hat{y}_i^{(k-1)})^2},

and the following algorithmic framework for the training stage.
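As a concrete instance of these quantities (an illustrative sketch, not the library's implementation), for the squared-error loss l(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2 the gradient and hessian reduce to g_i = \hat{y}_i - y_i and h_i = 1:

#include <cstdio>
#include <vector>

// Gradient and hessian of the squared-error loss l(y, yHat) = 0.5 * (y - yHat)^2
// with respect to the current prediction yHat (the (k-1)-th iterate).
struct GradHess { double g; double h; };

GradHess squaredErrorDerivatives(double y, double yHat) {
    return { yHat - y,   // g_i = d l / d yHat
             1.0 };      // h_i = d^2 l / d yHat^2
}

int main() {
    // Hypothetical responses and current predictions for n = 3 observations.
    std::vector<double> y    = { 1.0, 0.0, 2.0 };
    std::vector<double> yHat = { 0.5, 0.2, 1.5 };
    for (size_t i = 0; i < y.size(); ++i) {
        GradHess gh = squaredErrorDerivatives(y[i], yHat[i]);
        std::printf("i=%zu  g=%+.2f  h=%.2f\n", i, gh.g, gh.h);
    }
    return 0;
}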
Let S = (X, Y) be the set of observations. Given the training parameters, such as the number of iterations M, the loss function l(y, f), the regression tree training parameters, the regularization parameters \gamma and \lambda, and the shrinkage (learning rate) parameter \theta, the algorithm does the following:

1. Find an initial guess \hat{y}_i^{(0)}, i = 1, ..., n.
2. For k = 1, ..., M:
   - Update g_i and h_i, i = 1, ..., n.
   - Grow a regression tree f_k \in F that minimizes the objective

     -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T,

     where G_j = \sum_{i \in I_j} g_i, H_j = \sum_{i \in I_j} h_i, and I_j = \{ i \mid q(x_i) = j \}, j = 1, ..., T.
   - Assign the optimal weight w_j^* = -\frac{G_j}{H_j + \lambda} to leaf j, j = 1, ..., T.
   - Apply the shrinkage parameter \theta to the tree leaves and add the tree to the model.
   - Update \hat{y}_i^{(k)}.
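A minimal sketch of this outer loop, assuming the squared-error loss and degenerate single-leaf trees so that the example stays self-contained (a real implementation grows deeper trees using the per-leaf objective above; this is an illustration of the framework, not the library's implementation):

#include <cstdio>
#include <vector>

// Degenerate regression tree with a single leaf (T = 1): every observation is
// mapped to leaf 0, whose optimal weight is w* = -G / (H + lambda).
struct SingleLeafTree {
    double w;
    double eval(const std::vector<double>&) const { return w; }
};

// Functional gradient descent over M boosting iterations with shrinkage theta,
// shown for the squared-error loss, where g_i = yHat_i - y_i and h_i = 1.
std::vector<SingleLeafTree> train(const std::vector<std::vector<double> >& X,
                                  const std::vector<double>& y,
                                  int M, double theta, double lambda) {
    const size_t n = y.size();
    std::vector<double> yHat(n, 0.0);                        // initial guess yHat^(0) = 0
    std::vector<SingleLeafTree> model;
    for (int k = 1; k <= M; ++k) {
        double G = 0.0, H = 0.0;
        for (size_t i = 0; i < n; ++i) {                     // update and accumulate g_i, h_i
            G += yHat[i] - y[i];
            H += 1.0;
        }
        SingleLeafTree f = { theta * (-G / (H + lambda)) };  // shrinkage applied to the leaf
        model.push_back(f);
        for (size_t i = 0; i < n; ++i)                       // update yHat^(k)
            yHat[i] += f.eval(X[i]);
    }
    return model;
}

int main() {
    // Tiny hypothetical data set; the aggregated prediction approaches the mean response.
    std::vector<std::vector<double> > X = { { 0.0 }, { 1.0 }, { 2.0 } };
    std::vector<double> y = { 1.0, 2.0, 3.0 };
    std::vector<SingleLeafTree> model = train(X, y, 50, 0.3, 1.0);
    double prediction = 0.0;
    for (size_t k = 0; k < model.size(); ++k)
        prediction += model[k].eval(X[0]);
    std::printf("aggregated prediction: %f\n", prediction);
    return 0;
}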
The algorithm for growing the tree:

- Generate a bootstrap training set if required (stochastic gradient boosting) by randomly selecting a fraction of the observations.
- Start from the tree with depth 0.
- For each leaf node in the tree:
  - Choose a subset of features for split finding if required (stochastic gradient boosting).
  - Find the split that maximizes the gain

    Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma,

    where G_L, H_L and G_R, H_R are the sums of g_i and h_i over the observations assigned to the left and right child of the candidate split.
  - Stop when a termination criterion is met.
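The following routine illustrates split selection on a single pre-sorted feature by maximizing the second-order gain of each candidate split (a sketch only, not the library's implementation; names and data are hypothetical):

#include <cstdio>
#include <vector>

// Score of a leaf holding gradient sum G and hessian sum H: G^2 / (H + lambda).
static double leafScore(double G, double H, double lambda) {
    return G * G / (H + lambda);
}

// Find the split position on one pre-sorted feature that maximizes
// Gain = 1/2 * [ G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - (G_L+G_R)^2/(H_L+H_R+lambda) ] - gamma.
// Returns the index i of the best split "between position i-1 and i", or 0 if no split helps.
size_t bestSplit(const std::vector<double>& g, const std::vector<double>& h,
                 double lambda, double gamma, double* bestGain) {
    double Gtot = 0.0, Htot = 0.0;
    for (size_t i = 0; i < g.size(); ++i) { Gtot += g[i]; Htot += h[i]; }

    double GL = 0.0, HL = 0.0;
    size_t best = 0;
    *bestGain = 0.0;
    for (size_t i = 1; i < g.size(); ++i) {
        GL += g[i - 1];
        HL += h[i - 1];
        double GR = Gtot - GL, HR = Htot - HL;
        double gain = 0.5 * (leafScore(GL, HL, lambda) + leafScore(GR, HR, lambda)
                             - leafScore(Gtot, Htot, lambda)) - gamma;
        if (gain > *bestGain) { *bestGain = gain; best = i; }
    }
    return best;
}

int main() {
    // Hypothetical gradients and hessians of observations sorted by one feature.
    std::vector<double> g = { -1.0, -0.8, 0.9, 1.1 };
    std::vector<double> h = {  1.0,  1.0, 1.0, 1.0 };
    double gain = 0.0;
    size_t pos = bestSplit(g, h, 1.0, 0.0, &gain);
    std::printf("best split before index %zu, gain = %f\n", pos, gain);
    return 0;
}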
For more details, see [Chen2016].
The library supports the following termination criteria when growing the tree:

- Maximal tree depth: the tree is not grown beyond the specified depth.
- Minimal split loss: a node is not split when the best achievable gain is below the specified threshold.
- Minimal number of observations in a leaf node: a split is not considered if it would produce a leaf with fewer observations than the specified minimum.
Given a gradient boosted trees model and vectors x_1, ..., x_r, the problem is to calculate the responses for those vectors. To solve the problem for each given query vector x_i, the algorithm finds the leaf node in each tree of the ensemble that determines the response of that tree. The resulting response is based on an aggregation of the responses from all trees in the ensemble. For a detailed definition, see the description of a specific algorithm.
The library supports two split calculation modes:

- exact: all possible split values are examined when searching for the best split for a feature;
- inexact: continuous features are bucketed into discrete bins, and only the bin borders are examined as possible split values.
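As an illustration of how an inexact mode can restrict the candidate splits (a sketch under the assumption of equal-frequency binning; not the library's implementation), the following routine buckets one continuous feature into bins and returns the bin borders that would be examined as split values:

#include <algorithm>
#include <cstdio>
#include <vector>

// Bucket the values of one continuous feature into a fixed number of bins and
// return the bin borders; in an inexact split search only these borders are
// examined as candidate split values (hypothetical equal-frequency binning).
std::vector<double> binBorders(std::vector<double> values, size_t maxBins) {
    std::sort(values.begin(), values.end());
    std::vector<double> borders;
    for (size_t b = 1; b < maxBins; ++b) {
        size_t idx = b * values.size() / maxBins;   // quantile position of the border
        if (idx < values.size()) borders.push_back(values[idx]);
    }
    borders.erase(std::unique(borders.begin(), borders.end()), borders.end());
    return borders;
}

int main() {
    std::vector<double> feature = { 0.1, 0.4, 0.2, 3.5, 2.2, 0.9, 1.7, 2.8 };
    std::vector<double> borders = binBorders(feature, 4);   // 4 bins -> up to 3 borders
    for (size_t i = 0; i < borders.size(); ++i)
        std::printf("candidate split value: %f\n", borders[i]);
    return 0;
}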