Developer Guide for Intel® Data Analytics Acceleration Library 2018

Batch Processing

Gradient boosted trees classification and regression follow the general workflow described in Usage Model: Training and Prediction.

Training

For a description of the input and output, refer to Usage Model: Training and Prediction.

At the training stage, the gradient boosted trees batch algorithm has the following parameters:

splitMethod
  Default value: exact
  Split computation mode. Possible values:
    • exact - all possible splits for a given feature are examined

maxIterations
  Default value: 50
  Maximal number of iterations of the training algorithm; defines the maximal number of trees in the model.

maxTreeDepth
  Default value: 6
  Maximal tree depth. If the parameter is set to 0, the depth is unlimited.

shrinkage
  Default value: 0.3
  Learning rate of the boosting procedure. Scales the contribution of each tree by a factor in (0, 1].

minSplitLoss
  Default value: 0
  Loss regularization parameter. Minimal loss reduction required to make a further partition on a leaf node of the tree. Range: [0, ∞)

lambda
  Default value: 1
  L2 regularization parameter on weights. Range: [0, ∞)

observationsPerTreeFraction
  Default value: 1
  Fraction of the training set S used to train a single tree, 0 < observationsPerTreeFraction ≤ 1. The observations are sampled randomly without replacement.

featuresPerNode
  Default value: 0
  Number of features tried as possible splits per node. If the parameter is set to 0, all features are used.

minObservationsInLeafNode
  Default value: 5
  Minimal number of observations in a leaf node.

memorySavingMode
  Default value: false
  If true, the memory-saving (but slower) mode is used.

engine
  Default value: SharedPtr<engines::mt19937::Batch>()
  Pointer to the random number generator.