Developer Guide for Intel® Data Analytics Acceleration Library 2018
Given n feature vectors x1 = (x11, …, x1p), …, xn = (xn1, …, xnp) of dimension p and a vector of class labels y = (y1, …, yn), where yi ∈ {0, 1, …, C - 1} is the class to which the feature vector xi belongs and C is the number of classes, the problem is to build a decision forest classifier.
The decision forest classifier follows the algorithmic framework of decision forest training, using the Gini impurity as the impurity metric, calculated as follows:

Gini(D) = 1 - Σ(i=1..C) pi²

where pi is the fraction of observations in the subset D that belong to the i-th class.
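The Gini impurity above can be sketched in a few lines of NumPy (an illustrative sketch, not the library's internal implementation):

```python
import numpy as np

def gini_impurity(labels, num_classes):
    """Gini impurity of a subset D: 1 - sum_i pi^2, where pi is the
    fraction of observations in D that belong to the i-th class."""
    counts = np.bincount(labels, minlength=num_classes)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# A pure subset has impurity 0; an evenly mixed two-class subset has 0.5.
print(gini_impurity(np.array([0, 0, 0]), 2))     # -> 0.0
print(gini_impurity(np.array([0, 1, 0, 1]), 2))  # -> 0.5
```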
Given a decision forest classifier and vectors x1, …, xr, the problem is to calculate the labels for those vectors. To solve the problem, for each query vector xi the algorithm finds the leaf node in each tree of the forest that gives the classification response of that tree. The forest then chooses the label y for which the majority of trees in the forest vote.
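The majority-vote aggregation step can be sketched as follows (a minimal illustration of the voting rule, assuming the per-tree labels have already been computed):

```python
import numpy as np

def forest_predict(per_tree_labels):
    """per_tree_labels: integer array of shape (n_trees, n_queries), where
    entry [t, i] is the class label tree t assigns to query vector xi.
    The forest label for xi is the majority vote across the trees."""
    n_trees, n_queries = per_tree_labels.shape
    return np.array([
        np.bincount(per_tree_labels[:, i]).argmax()
        for i in range(n_queries)
    ])

votes = np.array([[0, 1],
                  [0, 1],
                  [1, 0]])          # 3 trees, 2 query vectors
print(forest_predict(votes))       # -> [0 1]
```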
The decision forest classifier follows the algorithmic framework for calculating the decision forest out-of-bag (OOB) error: the out-of-bag predictions of all trees are aggregated, and the OOB error of the decision forest is calculated as follows:

OOB error = (1/|D'|) Σ(xi ∈ D') I(yi ≠ ŷi)

where D' is the set of out-of-bag observations, ŷi is the majority vote of the trees for which xi is out-of-bag, and I is the indicator function.
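A sketch of the OOB aggregation (assuming, for illustration, that the per-observation votes from the trees that did not see each observation have already been collected):

```python
import numpy as np

def oob_error(true_labels, oob_votes):
    """oob_votes[i] lists the predictions from the trees for which
    observation i was out-of-bag (empty if i was in every bootstrap).
    The OOB error is the misclassification rate of the aggregated votes."""
    errors, counted = 0, 0
    for y, votes in zip(true_labels, oob_votes):
        if not votes:                          # never out-of-bag: skip
            continue
        y_hat = np.bincount(votes).argmax()    # majority vote over OOB trees
        errors += int(y_hat != y)
        counted += 1
    return errors / counted

y = [0, 1, 1, 0]
votes = [[0, 0], [1], [0, 1, 1], []]
print(oob_error(y, votes))  # -> 0.0 (all aggregated OOB votes are correct)
```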
The library computes the Mean Decrease Impurity (MDI) importance measure, also known as the Gini importance or Mean Decrease Gini, using the Gini index as the impurity metric.
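MDI can be sketched as the node-weighted impurity decrease summed over the nodes that split on a given feature, averaged over the trees of the forest. The flattened tree representation below is an assumption for illustration, not the library's internal layout:

```python
import numpy as np

# Hypothetical representation (an assumption): each internal node is a tuple
# (feature, sample_fraction, impurity_decrease), where sample_fraction is the
# fraction of training samples reaching the node and impurity_decrease is the
# Gini impurity reduction achieved by its split.
def mdi_importance(trees, num_features):
    """MDI of feature j: weighted impurity decrease summed over the nodes
    splitting on j, averaged over all trees in the forest."""
    importance = np.zeros(num_features)
    for nodes in trees:        # one list of internal nodes per tree
        for feature, weight, decrease in nodes:
            importance[feature] += weight * decrease
    return importance / len(trees)

trees = [
    [(0, 1.0, 0.20), (1, 0.4, 0.10)],  # tree 1: root splits on feature 0
    [(0, 1.0, 0.30)],                  # tree 2: single split on feature 0
]
print(mdi_importance(trees, 2))  # -> [0.25 0.02]
```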