C++ API Reference for Intel® Data Analytics Acceleration Library 2018 Update 2

daal::algorithms::kmeans::init Namespace Reference

Contains classes for computing initial centroids for the K-Means algorithm.

Namespaces

 interface1
 Contains version 1.0 of the Intel(R) Data Analytics Acceleration Library (Intel(R) DAAL) interface.
 

Enumerations

enum  Method {
  deterministicDense = 0, defaultDense = 0, randomDense = 1, plusPlusDense = 2,
  parallelPlusDense = 3, deterministicCSR = 4, randomCSR = 5, plusPlusCSR = 6,
  parallelPlusCSR = 7
}
 Available methods for computing initial centroids for the K-Means algorithm.
 
enum  InputId { data }
 Available identifiers of input objects for computing initial centroids for the K-Means algorithm.
 
enum  DistributedStep2MasterInputId { partialResults }
 Available identifiers of input objects for computing initial centroids for the K-Means algorithm in the distributed processing mode.
 
enum  DistributedLocalPlusPlusInputDataId { internalInput = lastDistributedStep2MasterInputId + 1 }
 Available identifiers of input objects for computing initial centroids for the K-Means algorithm, used with the plusPlus and parallelPlus methods only, on a local node.
 
enum  DistributedStep2LocalPlusPlusInputId { inputOfStep2 = lastDistributedLocalPlusPlusInputDataId + 1 }
 Available identifiers of input objects for computing initial centroids for the K-Means algorithm, used with the plusPlus and parallelPlus methods only, on the 2nd step on a local node.
 
enum  DistributedStep3MasterPlusPlusInputId { inputOfStep3FromStep2 }
 Available identifiers of input objects for computing initial centroids for the K-Means algorithm, used with the plusPlus and parallelPlus methods only, on the 3rd step on a master node.
 
enum  DistributedStep4LocalPlusPlusInputId { inputOfStep4FromStep3 = lastDistributedLocalPlusPlusInputDataId + 1 }
 Available identifiers of input objects for computing initial centroids for the K-Means algorithm, used with the plusPlus and parallelPlus methods only, on the 4th step on a local node.
 
enum  DistributedStep5MasterPlusPlusInputId { inputCentroids, inputOfStep5FromStep2 }
 Available identifiers of input objects for computing initial centroids for the K-Means algorithm, used with the parallelPlus method only, on the 5th step on a master node.
 
enum  DistributedStep5MasterPlusPlusInputDataId { inputOfStep5FromStep3 = lastDistributedStep5MasterPlusPlusInputId + 1 }
 Available identifiers of input objects for computing initial centroids for the K-Means algorithm, used with the parallelPlus method only, on the 5th step on a master node.
 
enum  PartialResultId { partialCentroids, partialClusters = partialCentroids, partialClustersNumber }
 Available identifiers of partial results of computing initial centroids for the K-Means algorithm in the distributed processing mode.
 
enum  DistributedStep2LocalPlusPlusPartialResultId { outputOfStep2ForStep3, outputOfStep2ForStep5 }
 Available identifiers of partial results of computing initial centroids for the K-Means algorithm in the distributed processing mode, used with the plusPlus and parallelPlus methods only, on the 2nd step on a local node.
 
enum  DistributedStep2LocalPlusPlusPartialResultDataId { internalResult = lastDistributedStep2LocalPlusPlusPartialResultId + 1 }
 Available identifiers of partial results of computing initial centroids for the K-Means algorithm in the distributed processing mode, used with the plusPlus and parallelPlus methods only, on the 2nd step on a local node.
 
enum  DistributedStep3MasterPlusPlusPartialResultId { outputOfStep3ForStep4 }
 Available identifiers of partial results of computing initial centroids for the K-Means algorithm in the distributed processing mode, used with the plusPlus and parallelPlus methods only, on the 3rd step on a master node.
 
enum  DistributedStep3MasterPlusPlusPartialResultDataId { rngState = lastDistributedStep3MasterPlusPlusPartialResultId + 1, outputOfStep3ForStep5 = rngState }
 Available identifiers of partial results of computing initial centroids for the K-Means algorithm in the distributed processing mode, used with the parallelPlus method only, on the 3rd step on a master node.
 
enum  DistributedStep4LocalPlusPlusPartialResultId { outputOfStep4 }
 Available identifiers of partial results of computing initial centroids for the K-Means algorithm in the distributed processing mode, used with the plusPlus and parallelPlus methods only, on the 4th step on a local node.
 
enum  DistributedStep5MasterPlusPlusPartialResultId { candidates, weights }
 Available identifiers of partial results of computing initial centroids for the K-Means algorithm in the distributed processing mode, used with the parallelPlus method only, on the 5th step on a master node.
 
enum  ResultId { centroids }
 Available identifiers of the results of computing initial centroids for the K-Means algorithm.
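
Several of the enumerations above start at `last...Id + 1` of another enumeration. The following is a minimal, standalone sketch of that chaining pattern. It does not use the library headers: the `sketch` namespace, the `InputSlots` type, and the values given to the `last...` enumerators are assumptions made for illustration only. It shows that an identifier family which extends another one gets values that do not collide with it, so both families could address slots in one shared container.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the chaining pattern; NOT the library's declarations.
namespace sketch {

enum DistributedStep2MasterInputId {
    partialResults,
    // Assumed: the "last" marker aliases the final real identifier.
    lastDistributedStep2MasterInputId = partialResults
};

enum DistributedLocalPlusPlusInputDataId {
    // Starts one past the family it extends, so the values cannot collide.
    internalInput = lastDistributedStep2MasterInputId + 1,
    lastDistributedLocalPlusPlusInputDataId = internalInput
};

enum DistributedStep2LocalPlusPlusInputId {
    inputOfStep2 = lastDistributedLocalPlusPlusInputDataId + 1,
    lastDistributedStep2LocalPlusPlusInputId = inputOfStep2
};

// Hypothetical container: one slot per identifier across all three families.
struct InputSlots {
    std::vector<std::string> slots;

    InputSlots() : slots(lastDistributedStep2LocalPlusPlusInputId + 1) {}

    void set(std::size_t id, const std::string& value) { slots[id] = value; }
    const std::string& get(std::size_t id) const { return slots[id]; }
};

}  // namespace sketch
```

In this sketch `partialResults`, `internalInput`, and `inputOfStep2` evaluate to 0, 1, and 2, so a value stored under one identifier is never overwritten by a value stored under another.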
 

Enumeration Type Documentation

◆ DistributedLocalPlusPlusInputDataId

Enumerator
internalInput 

DataCollection with internal algorithm data calculated by previous steps on this node

◆ DistributedStep2LocalPlusPlusInputId

Enumerator
inputOfStep2 

Numeric table with the new centroids calculated by previous steps of the initialization algorithm

◆ DistributedStep2LocalPlusPlusPartialResultDataId

Enumerator
internalResult 

DataCollection with internal algorithm data required as input for future steps on this node

◆ DistributedStep2LocalPlusPlusPartialResultId

Enumerator
outputOfStep2ForStep3 

Numeric table containing output from step 2 on the local node used by step 3 on a master node

outputOfStep2ForStep5 

Numeric table containing output from step 2 on the local node used by step 5 on a master node

◆ DistributedStep2MasterInputId

Enumerator
partialResults 

Collection of partial results computed on local nodes

◆ DistributedStep3MasterPlusPlusInputId

Enumerator
inputOfStep3FromStep2 

Numeric table with the data calculated on step 2 on local nodes

◆ DistributedStep3MasterPlusPlusPartialResultDataId

Enumerator
rngState 

Service data generated as the output of step3Master to be used in step5Master

outputOfStep3ForStep5 

Service data generated as the output of step3Master to be used in step5Master

◆ DistributedStep3MasterPlusPlusPartialResultId

Enumerator
outputOfStep3ForStep4 

KeyValueDataCollection with the input for local nodes on step 4

◆ DistributedStep4LocalPlusPlusInputId

Enumerator
inputOfStep4FromStep3 

Numeric table with the data calculated on step 3 on the master node

◆ DistributedStep4LocalPlusPlusPartialResultId

Enumerator
outputOfStep4 

NumericTable with the new centroids calculated on step 4 on the local node

◆ DistributedStep5MasterPlusPlusInputDataId

Enumerator
inputOfStep5FromStep3 

Service data generated as the output of step3Master

◆ DistributedStep5MasterPlusPlusInputId

Enumerator
inputCentroids 

DataCollection of NumericTables with the new centroids

inputOfStep5FromStep2 

DataCollection of NumericTables with the ratings of the new centroids

◆ DistributedStep5MasterPlusPlusPartialResultId

Enumerator
candidates 

NumericTable with the new centroids calculated on the previous steps

weights 

NumericTable with the weights of the new centroids calculated on the previous steps

◆ InputId

enum InputId

Enumerator
data 

Input data table

◆ Method

enum Method

Available methods for computing initial centroids for the K-Means algorithm

Enumerator
deterministicDense 

Default method: uses the first nClusters points as initial centroids

defaultDense 

Synonym of deterministicDense

randomDense 

Uses nClusters randomly selected points as initial centroids

plusPlusDense 

Kmeans++ algorithm by Arthur and Vassilvitskii (2007), http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [1]: the first center is selected at random; each subsequent center is selected with a probability proportional to its contribution to the overall error

parallelPlusDense 

Kmeans|| algorithm: scalable Kmeans++ by Bahmani et al. (2012) http://vldb.org/pvldb/vol5/p622_bahmanbahmani_vldb2012.pdf [2]

deterministicCSR 

Uses the first nClusters points as initial centroids for data in a CSR numeric table

randomCSR 

Uses nClusters randomly selected points as initial centroids for data in a CSR numeric table

plusPlusCSR 

Kmeans++ algorithm by Arthur and Vassilvitskii (2007), http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [1], for data in a CSR numeric table: the first center is selected at random; each subsequent center is selected with a probability proportional to its contribution to the overall error

parallelPlusCSR 

Kmeans|| algorithm: scalable Kmeans++ by Bahmani et al. (2012) http://vldb.org/pvldb/vol5/p622_bahmanbahmani_vldb2012.pdf [2] for the data in a CSR numeric table
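
The selection rule described for plusPlusDense and plusPlusCSR can be sketched in standalone C++ as follows. This is an illustration of the general Kmeans++ seeding idea under stated assumptions, not the library's implementation: the names `Point`, `squaredDistance`, and `plusPlusInit` are hypothetical, the data is held in plain `std::vector` objects rather than numeric tables, and a point's "contribution to the overall error" is taken to be its squared Euclidean distance to the nearest center chosen so far.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the indices of nClusters points picked by Kmeans++-style seeding.
// Assumes 0 < nClusters <= data.size().
std::vector<std::size_t> plusPlusInit(const std::vector<Point>& data,
                                      std::size_t nClusters,
                                      std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> uniform(0, data.size() - 1);

    // The first center is selected uniformly at random.
    std::vector<std::size_t> chosen;
    chosen.push_back(uniform(rng));

    // minDist[i]: squared distance from point i to its nearest chosen center.
    std::vector<double> minDist(data.size(),
                                std::numeric_limits<double>::max());

    while (chosen.size() < nClusters) {
        // Only the most recently added center can lower a point's distance.
        const Point& last = data[chosen.back()];
        double total = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            minDist[i] = std::min(minDist[i], squaredDistance(data[i], last));
            total += minDist[i];
        }

        if (total == 0.0) {
            // Every point coincides with a chosen center: fall back to a
            // uniform pick, since all-zero weights are not a valid input
            // for std::discrete_distribution.
            chosen.push_back(uniform(rng));
            continue;
        }

        // Pick the next center with probability proportional to minDist[i].
        std::discrete_distribution<std::size_t> weighted(minDist.begin(),
                                                         minDist.end());
        chosen.push_back(weighted(rng));
    }
    return chosen;
}
```

Calling `plusPlusInit(data, nClusters, rng)` with a seeded `std::mt19937` returns row indices rather than copies of the points, which keeps the sketch independent of how the caller stores the resulting centroids.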

◆ PartialResultId

Enumerator
partialCentroids 

Table with the sum of observations assigned to centroids

partialClusters 

Table with the sum of observations assigned to centroids

Deprecated:
This item will be removed in a future release.

partialClustersNumber 

Table with the number of observations assigned to centroids

Deprecated:
This item will be removed in a future release.

◆ ResultId

enum ResultId

Enumerator
centroids 

Table for cluster centroids
