Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 5
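Given $n$ feature vectors $x_1, \ldots, x_n$ of size $p$ and a vector of class labels $y = (y_1, \ldots, y_n)$, where $y_i$ is the class to which the feature vector $x_i$ belongs, the problem is to build a decision tree classifier.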
The library provides the decision tree classification algorithm based on two split criteria, Gini index [Breiman84] and Information gain [Quinlan86], [Mitchell97]:
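Gini index

$$I_{Gini}(D) = 1 - \sum_{i} p_i^2$$

where
$D$ is a set of observations that reach the node
$p_i$ is the observed fraction of observations in $D$ that belong to class $i$, and the sum runs over all classes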
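As a minimal sketch, the Gini index of a node can be computed from the class labels of the observations that reach it, assuming the standard definition of one minus the sum of squared class fractions. The function name `gini_index` and the convention of returning 0 for an empty node are illustrative choices, not part of the library's API.

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a collection of class labels: 1 - sum_i p_i^2."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen for this sketch: an empty node is pure
    counts = Counter(labels)  # number of observations per class
    return 1.0 - sum((count / n) ** 2 for count in counts.values())
```

A node that contains a single class has Gini index 0, and a node with two equally represented classes has Gini index 0.5.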
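To find the best test using Gini index, each possible test is examined using

$$\Delta I_{Gini}(D, \tau) = I_{Gini}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} I_{Gini}(D_v)$$

where
$O(\tau)$ is the set of all possible outcomes of test $\tau$
$D_v$ is the subset of $D$ for which the outcome of $\tau$ is $v$, that is, $D_v = \{d \in D \mid \tau(d) = v\}$.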
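The test to be used in the node is selected as

$$\underset{\tau}{\operatorname{argmax}} \; \Delta I_{Gini}(D, \tau).$$

For a binary decision tree with 'true' and 'false' branches,

$$\Delta I_{Gini}(D, \tau) = I_{Gini}(D) - \frac{|D_{true}|}{|D|} I_{Gini}(D_{true}) - \frac{|D_{false}|}{|D|} I_{Gini}(D_{false})$$

where $D_{true}$ and $D_{false}$ are the subsets of $D$ for which the outcome of $\tau$ is 'true' and 'false', respectively.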
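The selection of a test by the largest decrease in Gini index can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: tests are assumed to have the binary form `row[j] < t`, candidate thresholds are assumed to be midpoints between consecutive distinct feature values, and the names `gini_gain` and `best_threshold_test` are hypothetical.

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a collection of class labels: 1 - sum_i p_i^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(rows, labels, test):
    """Decrease in Gini index obtained by partitioning D by the outcome of `test`."""
    groups = {}  # outcome v -> labels of the observations in D_v
    for row, label in zip(rows, labels):
        groups.setdefault(test(row), []).append(label)
    n = len(labels)
    weighted = sum(len(g) / n * gini_index(g) for g in groups.values())
    return gini_index(labels) - weighted

def best_threshold_test(rows, labels):
    """Return (gain, feature index, threshold) of the best test `row[j] < t`.

    Returns (0.0, None, None) if no candidate test has a positive gain.
    """
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Assumed candidate thresholds: midpoints between distinct sorted values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            gain = gini_gain(rows, labels, lambda row, j=j, t=t: row[j] < t)
            if gain > best[0]:
                best = (gain, j, t)
    return best
```

Because `gini_gain` groups observations by whatever value the test returns, the same function covers both the general sum over outcomes and the binary 'true'/'false' case.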
The classification decision tree follows the algorithmic framework of decision tree training described in Classification and Regression > Decision tree > Training stage.
The classification decision tree follows the algorithmic framework of decision tree prediction described in Classification and Regression > Decision tree > Prediction stage.
Given a decision tree and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors.
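To illustrate what calculating the responses involves, the sketch below walks each vector from the root of a binary tree to a leaf and returns the class label stored there. The `Node` layout, the test form `x[feature] < threshold`, and the function names are assumptions made for this example. They do not describe the library's model format or API, and the referenced Prediction stage section remains the authoritative description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Hypothetical node layout used only for this sketch.
    feature: Optional[int] = None          # index of the tested feature (split nodes)
    threshold: Optional[float] = None      # test is x[feature] < threshold
    true_branch: Optional["Node"] = None   # followed when the test is true
    false_branch: Optional["Node"] = None  # followed when the test is false
    label: Optional[int] = None            # class label (leaf nodes only)

def predict_one(tree, x):
    """Follow the tests from the root until a leaf is reached."""
    node = tree
    while node.label is None:
        if x[node.feature] < node.threshold:
            node = node.true_branch
        else:
            node = node.false_branch
    return node.label

def predict(tree, vectors):
    """Compute one response per input vector."""
    return [predict_one(tree, x) for x in vectors]
```

For example, a tree with a single split `Node(feature=0, threshold=2.5, true_branch=Node(label=0), false_branch=Node(label=1))` assigns class 0 to the vector `[1.0]` and class 1 to the vector `[3.0]`.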