Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 5

Density-Based Spatial Clustering of Applications with Noise

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a density-based, non-parametric clustering algorithm: given a set of observations in some space, it groups together observations that are closely packed (observations with many nearby neighbors) and marks as outliers the observations that lie alone in low-density regions (those whose nearest neighbors are too far away).

Details

Given the set X = {x1 = (x11, ..., x1p), ..., xn = (xn1, ..., xnp)} of n p-dimensional feature vectors (hereafter referred to as observations), a positive floating-point number epsilon, and a positive integer minObservations, the problem is to get a clustering assignment for each input observation, based on the definitions below [Ester96]:

Each cluster gets a unique identifier, an integer from 0 to (total number of clusters – 1). The assignment of each observation is the identifier of the cluster to which it belongs, or -1 if the observation is considered to be a noise observation.
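
To make the form of the output concrete, the following is a minimal pure-Python sketch of a DBSCAN-style procedure. It is not the library's implementation or API. The function name dbscan and its parameter names are hypothetical, the distance is assumed to be Euclidean, and an observation is assumed to be a core observation when at least min_observations observations (counting itself) lie within epsilon of it. The neighbor search is brute force, so the sketch is only suitable for small inputs.

```python
import math

NOISE = -1       # final assignment for noise observations
UNVISITED = -2   # internal marker, never returned


def dbscan(points, epsilon, min_observations):
    """Return one assignment per point: a cluster id in 0..k-1, or -1 for noise."""
    n = len(points)
    labels = [UNVISITED] * n

    def neighbors(i):
        # Indices of all points within epsilon of point i (including i itself).
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= epsilon]

    cluster_id = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_observations:
            # Not a core observation; may still be claimed by a cluster later.
            labels[i] = NOISE
            continue
        # Point i is a core observation: start a new cluster and expand it.
        labels[i] = cluster_id
        stack = [j for j in seeds if j != i]
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                # Previously marked noise, but reachable from a core observation.
                labels[j] = cluster_id
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_observations:
                # Point j is itself core, so its neighbors join the cluster too.
                stack.extend(reachable)
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, epsilon=1.5, min_observations=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this example the two tight groups receive the identifiers 0 and 1, and the isolated point (50, 50) receives -1 because no other observation lies within epsilon of it.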