Developer Guide for Intel® Data Analytics Acceleration Library 2018

Details

Given:

A set of n feature vectors $x_1, \ldots, x_n$ of size p and a vector of class labels $y = (y_1, \ldots, y_n)$, where $y_i$ determines the class to which the feature vector $x_i$ belongs. The problem is to build a decision tree classifier.

Split Criteria

The library provides the decision tree classification algorithm based on the following split criteria: the Gini index [Breiman84] and information gain [Quinlan86], [Mitchell97]:

  1. Gini index

     $I_{Gini}(D) = 1 - \sum_{i} p_i^2$

     where

     • D is a set of observations that reach the node
     • $p_i$ is the observed fraction of observations with class i in D

       To find the best test using the Gini index, each possible test is examined using

       $\Delta I_{Gini}(D, \tau) = I_{Gini}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} I_{Gini}(D_v)$

       where

       • $O(\tau)$ is the set of all possible outcomes of test $\tau$
       • $D_v$ is the subset of D for which the outcome of $\tau$ is v, for example, $D_{true}$

         The test to be used in the node is selected as $\operatorname{argmax}_{\tau} \Delta I_{Gini}(D, \tau)$. For a binary decision tree with 'true' and 'false' branches,

         $\Delta I_{Gini}(D, \tau) = I_{Gini}(D) - \frac{|D_{true}|}{|D|} I_{Gini}(D_{true}) - \frac{|D_{false}|}{|D|} I_{Gini}(D_{false})$

  2. Information gain

     $InfoGain(D, \tau) = \nu(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} \nu(D_v)$

     where

     • $O(\tau)$, D, $D_v$ are defined above
     • $\nu(D) = -\sum_{i} p_i \log p_i$ is the entropy, with $p_i$ defined above in Gini index

       Similarly to the Gini index, the test to be used in the node is selected as $\operatorname{argmax}_{\tau} InfoGain(D, \tau)$. For a binary decision tree with 'true' and 'false' branches, $InfoGain(D, \tau) = \nu(D) - \frac{|D_{true}|}{|D|} \nu(D_{true}) - \frac{|D_{false}|}{|D|} \nu(D_{false})$.
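Both criteria have the same structure: a node impurity measure, and a gain equal to the parent impurity minus the size-weighted impurities of the children. The following is a minimal illustrative sketch of that computation in plain Python; it is not the DAAL API, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def gini(labels):
    # I_Gini(D) = 1 - sum_i p_i^2, where p_i is the fraction of class i in D
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # nu(D) = -sum_i p_i log p_i; base-2 logarithm used here by convention
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_quality(labels, outcomes, impurity):
    # Delta I(D, tau) = I(D) - sum over outcomes v of (|D_v| / |D|) * I(D_v)
    n = len(labels)
    groups = {}
    for y, v in zip(labels, outcomes):
        groups.setdefault(v, []).append(y)
    return impurity(labels) - sum(len(g) / n * impurity(g) for g in groups.values())

# Binary test tau with 'true'/'false' outcomes that separates the classes perfectly,
# so the gain equals the parent impurity for either criterion
labels   = [0, 0, 1, 1, 1, 0]
outcomes = [True, True, False, False, False, True]
print(split_quality(labels, outcomes, gini))     # Gini decrease
print(split_quality(labels, outcomes, entropy))  # information gain
```

For a perfect split, the children are pure (zero impurity), so both printed gains equal the parent's impurity.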

Training Stage

The classification decision tree follows the algorithmic framework of decision tree training described in Classification and Regression > Decision tree > Training stage.
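The general framework greedily selects, at each node, the test that maximizes the chosen split criterion, then recurses into the resulting subsets until a stopping condition is met. A minimal sketch of that idea, assuming axis-aligned threshold tests and the Gini criterion (an illustration, not DAAL's implementation):

```python
from collections import Counter

def gini(labels):
    # I_Gini(D) = 1 - sum_i p_i^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Pick the (feature, threshold) test maximizing the Gini decrease."""
    n = len(y)
    best = None  # (gain, feature index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left  = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # degenerate test: one empty branch
            gain = gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    # Stop on a pure node or at the depth limit; a leaf stores the majority class
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    split = best_split(X, y)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

tree = build_tree([[0], [1], [2], [3]], [0, 0, 1, 1])
print(tree)  # → (0, 1, 0, 1): split on feature 0 at threshold 1
```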

Prediction Stage

The classification decision tree follows the algorithmic framework of decision tree prediction described in Classification and Regression > Decision tree > Prediction stage.

Given a decision tree and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors.
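Prediction amounts to routing each vector from the root down the branch selected by each node's test until a leaf is reached, whose class label is the response. A sketch, assuming a tree encoded as nested (feature, threshold, left, right) tuples with class labels at the leaves (an illustrative representation, not the library's internal one):

```python
def predict(tree, x):
    # An internal node is (feature, threshold, left, right); a leaf is a class label
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

# Hypothetical one-split tree: x[0] <= 1 -> class 0, else class 1
tree = (0, 1, 0, 1)
print([predict(tree, x) for x in ([0], [2])])  # → [0, 1]
```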