Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 2

Details

Given $n$ feature vectors $X = \{x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\}$ of $p$-dimensional feature vectors and a vector of dependent variables $y = (y_1, \ldots, y_n)$, the problem is to build a decision forest regression model that minimizes the Mean-Square Error (MSE) between the predicted and true values.
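Written out, the quantity being minimized is the standard mean squared error over the observations (a general definition, stated here for reference):

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,$$

where $\hat{y}_i$ denotes the model's predicted response for $x_i$.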

Training Stage

Decision forest regression follows the algorithmic framework of decision forest training, using variance as the impurity metric. For a node with a set of observations $D$ of size $N$, the impurity is calculated as follows:

$$I(D) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2, \qquad \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.$$
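As an illustration, the variance impurity of a node can be computed as in the following minimal sketch. This is not the library's internal implementation; the helper name `varianceImpurity` is hypothetical:

```cpp
#include <vector>

// Variance impurity of a node: the mean squared deviation of the
// dependent-variable values in the node from their mean.
// Hypothetical helper for illustration, not part of the library API.
double varianceImpurity(const std::vector<double>& y) {
    if (y.empty()) return 0.0;
    double mean = 0.0;
    for (double v : y) mean += v;
    mean /= y.size();
    double impurity = 0.0;
    for (double v : y) impurity += (v - mean) * (v - mean);
    return impurity / y.size();
}
```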
Prediction Stage

Given the decision forest regression model and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors. For each query vector $x_i$, each tree in the forest is traversed to the leaf node that $x_i$ reaches; the response of that tree is the mean of the dependent-variable values of the training observations in that leaf. The forest predicts the response as the mean of the responses of all its trees.
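The aggregation step can be sketched as follows, assuming each tree exposes a per-tree prediction function; the type `TreePredictor` and the function `predictForest` are hypothetical names for illustration, not the library's API:

```cpp
#include <functional>
#include <vector>

// A tree's prediction for a query vector: the mean of the dependent
// variables in the leaf the vector reaches (assumed precomputed inside
// the callable). Hypothetical type for illustration.
using TreePredictor = std::function<double(const std::vector<double>&)>;

// Forest response: the mean of the responses of all trees.
// Assumes a non-empty forest.
double predictForest(const std::vector<TreePredictor>& trees,
                     const std::vector<double>& x) {
    double sum = 0.0;
    for (const TreePredictor& tree : trees) sum += tree(x);
    return sum / trees.size();
}
```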

Out-of-bag Error

Decision forest regression follows the algorithmic framework for calculating the decision forest out-of-bag (OOB) error. The OOB prediction for an observation $x_i$ is the mean of the predictions of the trees for which $x_i$ is out-of-bag:

$$\hat{y}_i = \frac{1}{|B_i|} \sum_{b \in B_i} \hat{y}_i^{(b)},$$

where $B_i$ is the set of trees whose bootstrap sample does not contain $x_i$, and $\hat{y}_i^{(b)}$ is the prediction of tree $b$ for $x_i$. The OOB error of the decision forest is then the mean squared error of these predictions over the set $D'$ of observations that are out-of-bag for at least one tree:

$$\text{OOB error} = \frac{1}{|D'|} \sum_{x_i \in D'} (y_i - \hat{y}_i)^2.$$
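Under the same hypothetical `TreePredictor` type as above, the OOB aggregation can be sketched as follows; `oobMask[b][i]` is an assumed indicator that observation `i` is out-of-bag for tree `b`, and the function name `oobError` is likewise illustrative:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using TreePredictor = std::function<double(const std::vector<double>&)>;

// OOB error: for each observation, average the predictions of the trees
// for which it was out-of-bag, then take the MSE over those observations.
double oobError(const std::vector<TreePredictor>& trees,
                const std::vector<std::vector<bool>>& oobMask, // [tree][obs]
                const std::vector<std::vector<double>>& X,
                const std::vector<double>& y) {
    double sumSq = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        double sum = 0.0;
        std::size_t nTrees = 0;
        for (std::size_t b = 0; b < trees.size(); ++b) {
            if (oobMask[b][i]) { sum += trees[b](X[i]); ++nTrees; }
        }
        if (nTrees == 0) continue; // x_i was in every bootstrap sample
        const double pred = sum / nTrees;
        sumSq += (y[i] - pred) * (y[i] - pred);
        ++count;
    }
    return count ? sumSq / count : 0.0;
}
```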