Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 5
Given n p-dimensional feature vectors X = { x_1 = (x_11, …, x_1p), …, x_n = (x_n1, …, x_np) } and a vector of class labels y = (y_1, …, y_n), where y_i ∈ {0, 1, …, C − 1} describes the class to which the feature vector x_i belongs and C is the number of classes, the problem is to build a decision forest classifier.
The decision forest classifier follows the algorithmic framework of decision forest training with the Gini impurity metric, calculated as follows:

Gini(D) = 1 − Σ_{i=0}^{C−1} (p_i)²

where p_i is the fraction of observations in the subset D that belong to the i-th class.
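The Gini impurity above can be sketched in a few lines; this is a generic illustration of the formula, not the library's internal implementation:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini(D) = 1 - sum_i p_i^2, where p_i is the fraction of
    observations in the subset D that belong to the i-th class."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A pure subset (one class only) has impurity 0; an even two-class split has impurity 0.5, the maximum for C = 2.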
Given the decision forest classifier and vectors x_1, …, x_r, the problem is to calculate the labels for those vectors. For each query vector x_i, the algorithm traverses each tree in the forest to the leaf node that provides that tree's classification response. The forest then assigns the label y for which the majority of trees vote.
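The majority-voting step can be sketched as follows; here each tree is modeled as a callable mapping a feature vector to a class label (a hypothetical stand-in for traversing a real trained tree):

```python
from collections import Counter

def forest_predict(trees, x):
    """Each tree contributes the label of the leaf reached by x;
    the forest returns the label with the most votes."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]
```

For example, if two of three trees vote for class 1 and one votes for class 0, the forest predicts class 1.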
The decision forest classifier follows the algorithmic framework for calculating the decision forest out-of-bag (OOB) error. The out-of-bag predictions of all trees are aggregated, and the OOB error of the decision forest is calculated as follows:

- For each vector x_i in the dataset X, predict its label ŷ_i by a majority vote of the trees that contain x_i in their OOB set.
- Calculate the OOB error of the decision forest T as the average number of misclassifications:

OOB(T) = (1 / |D'|) · Σ_{y_i ∈ D'} I(y_i ≠ ŷ_i)

where D' is the set of observations that are out-of-bag for at least one tree, and I is the indicator function.
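The OOB aggregation described above can be sketched as follows. The representation is hypothetical: `trees` are predict callables and `oob_sets[b]` is the set of sample indices that tree b did not see during its bootstrap training:

```python
from collections import Counter

def oob_error(trees, oob_sets, X, y):
    """For each sample, aggregate votes only from trees for which the
    sample is out-of-bag, then report the fraction misclassified."""
    errors, counted = 0, 0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        votes = Counter(t(x_i) for t, oob in zip(trees, oob_sets) if i in oob)
        if not votes:
            continue  # sample was in-bag for every tree; skip it
        counted += 1
        if votes.most_common(1)[0][0] != y_i:
            errors += 1
    return errors / counted if counted else 0.0
```

Because each sample is scored only by trees that never trained on it, the OOB error serves as an internal validation estimate without a held-out set.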
The library computes the Mean Decrease Impurity (MDI) importance measure, also known as the Gini importance or Mean Decrease Gini, by using the Gini index as the impurity metric.
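MDI credits each feature with the impurity decrease produced by the splits on that feature, summed over a tree's nodes and averaged over trees. A minimal sketch, assuming a hypothetical flat node record format `(feature, n, gini, n_left, gini_left, n_right, gini_right)` per split node:

```python
def mdi_importance(trees_nodes, n_features, n_samples):
    """Mean Decrease Impurity: for each split, the weighted Gini
    decrease  n*g - n_left*g_left - n_right*g_right  (normalized by
    the total sample count) is credited to the split feature;
    per-tree scores are averaged over the forest."""
    imp = [0.0] * n_features
    for nodes in trees_nodes:
        for f, n, g, nl, gl, nr, gr in nodes:
            imp[f] += (n * g - nl * gl - nr * gr) / n_samples
    return [v / len(trees_nodes) for v in imp]
```

A single perfect split of 4 samples (Gini 0.5 → two pure children) credits its feature with importance 0.5 in a one-tree forest.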
If you already have a set of precomputed values for the nodes in each tree, you can use the Model Builder class to get a trained Intel DAAL Decision Forest Classification model based on that external model.
The following schema illustrates the use of the Model Builder class for Decision Forest Classification:
For general information on using the Model Builder class, see Training and Prediction. For details on using the Model Builder class for Decision Forest Classification, see Usage of training alternative.