Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 1
Decision forest classification and regression follows the general workflow described in Training and Prediction > Classification > Usage Model.
For the description of the input and output, refer to Training and Prediction > Classification > Usage Model.
At the training stage, decision forest classification and regression have the following parameters:
Parameter | Default Value | Description
---|---|---
seed | 777 | The seed for the random number generator, which is used to choose the bootstrap set, to split features in every split node of a tree, and to generate the permutation required in computations of MDA variable importance.
nTrees | 100 | The number of trees in the forest.
observationsPerTreeFraction | 1 | Fraction of the training set S used to form the bootstrap set for training a single tree, 0 < observationsPerTreeFraction <= 1. The observations are sampled randomly with replacement.
featuresPerNode | 0 | The number of features tried as possible splits per node. If the parameter is set to 0, the library uses the square root of the number of features for classification and (the number of features)/3 for regression.
maxTreeDepth | 0 | Maximal tree depth. The default value of 0 means unlimited depth.
minObservationsInLeafNodes | 1 for classification, 5 for regression | Minimum number of observations in a leaf node.
impurityThreshold | 0 | The threshold value used as the stopping criterion: if the impurity value in a node is smaller than the threshold, the node is not split further.
varImportance | none | The variable importance computation mode. Possible values: none (variable importance is not calculated), MDI (Mean Decrease of Impurity), MDA_Raw (Mean Decrease of Accuracy, raw), MDA_Scaled (MDA_Raw value scaled by its standard deviation).
resultsToCompute | 0 | The 64-bit integer flag that specifies which extra characteristics of the decision forest to compute. Provide one of the following values to request a single characteristic or use bitwise OR to request a combination of them: computeOutOfBagError, computeOutOfBagErrorPerObservation.
engine | SharedPtr<engines::mt2203::Batch>() | Pointer to the random number generator engine.
In addition to the classifier or regression output, decision forest calculates the results described below. Pass the Result ID as a parameter to the methods that access the result of your algorithm. For more details, see Algorithms.
Result ID | Result
---|---
outOfBagError | Numeric table of size 1 x 1 containing the out-of-bag error, computed when the computeOutOfBagError option is on. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable.
variableImportance | Numeric table of size 1 x p that contains variable importance values for each feature. If you set the varImportance parameter to none, the library returns a null pointer to the table. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix and PackedSymmetricMatrix.
outOfBagErrorPerObservation | Numeric table of size 1 x n that contains the computed out-of-bag error, computed when the computeOutOfBagErrorPerObservation option is enabled. The value -1 in the table indicates that no OOB value was computed for the observation because it was not in the OOB set of any tree in the model (it was never left out during the bootstrap). By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable.
updatedEngine | Engine instance with its state updated after computations.