Given a set $X$ of $n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of dimension $p$, the problem is to identify the vectors that do not belong to the underlying distribution using the BACON method (see [Billor2000]).
The method is iterative and consists of the following steps:
- Identify an initial basic subset of $m > p$ feature vectors that can be assumed to contain no outliers. The constant $m$ is set to $5p$. The library supports two approaches to selecting the initial subset:
  - Based on distances from the medians $\|x_i - \mathrm{med}\|$, where:
    - $\mathrm{med}$ is the vector of coordinate-wise medians
    - $\|\cdot\|$ is the vector norm
    - $i = 1, \ldots, n$
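  - Based on the Mahalanobis distance $d_i(\mathrm{mean}, S) = \sqrt{(x_i - \mathrm{mean})^T S^{-1} (x_i - \mathrm{mean})}$, where:
    - $\mathrm{mean}$ and $S$ are the mean and the covariance matrix, respectively, of the $n$ feature vectors
    - $i = 1, \ldots, n$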
  Each approach selects the $m$ feature vectors with the smallest distances.
- Compute the discrepancies using the Mahalanobis distance above, where $\mathrm{mean}$ and $S$ are the mean and the covariance matrix, respectively, computed for the feature vectors contained in the basic subset.
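- Set the new basic subset to all feature vectors with the discrepancy less than $c_{npr}\,\chi_{p,\alpha/n}$, where:
  - $\chi^2_{p,\alpha}$ is the $(1 - \alpha)$ percentile of the Chi2 distribution with $p$ degrees of freedom
  - $c_{npr} = c_{hr} + c_{np}$, where
    - $r$ is the size of the current basic subset
    - $c_{hr} = \max\{0, (h - r)/(h + r)\}$, where $h = [(n + p + 1)/2]$ and $[\ ]$ is the integer part of a number
    - $c_{np} = 1 + \frac{p + 1}{n - p} + \frac{1}{n - h - p}$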
- Repeat steps 2 and 3 (computing the discrepancies and updating the basic subset) until the size of the basic subset no longer changes.
- Nominate the feature vectors that are not part of the final basic subset as outliers.
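
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's implementation or API: the names `mahalanobis`, `bacon_outliers`, `alpha`, `init`, and `max_iter` are hypothetical, the correction factors $c_{hr}$ and $c_{np}$ are taken from [Billor2000], the Chi2 quantile comes from SciPy's `chi2.ppf`, and the iteration cap is a safeguard that is not part of the description above. The sketch also assumes that the covariance matrix of every basic subset is non-singular.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis(X, subset):
    """Mahalanobis distance of every row of X, using the mean and
    covariance estimated from the rows X[subset] only."""
    mean = X[subset].mean(axis=0)
    S = np.atleast_2d(np.cov(X[subset], rowvar=False))
    diff = X - mean
    # Solve S z = diff^T rather than forming the inverse explicitly.
    z = np.linalg.solve(S, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, z))


def bacon_outliers(X, alpha=0.05, init="median", max_iter=100):
    """Return a boolean mask that is True for rows nominated as outliers.

    Hypothetical helper: names and defaults are illustrative only.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    m = 5 * p  # size of the initial basic subset

    # Step 1: pick the m vectors with the smallest initial distances.
    if init == "median":
        d0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    else:  # Mahalanobis distance based on all n vectors
        d0 = mahalanobis(X, np.arange(n))
    subset = np.argsort(d0)[:m]

    # Constants of the threshold (assumed from [Billor2000]).
    h = (n + p + 1) // 2
    c_np = 1 + (p + 1) / (n - p) + 1 / (n - h - p)
    chi = np.sqrt(chi2.ppf(1 - alpha / n, df=p))

    for _ in range(max_iter):  # cap is a safeguard, not part of the method
        r = len(subset)
        c_hr = max(0.0, (h - r) / (h + r))
        # Step 2: discrepancies relative to the current basic subset.
        d = mahalanobis(X, subset)
        # Step 3: new basic subset = vectors below the threshold.
        new_subset = np.flatnonzero(d < (c_np + c_hr) * chi)
        converged = len(new_subset) == r
        subset = new_subset
        if converged:  # Step 4: stop when the subset size is stable
            break

    # Step 5: everything outside the final basic subset is an outlier.
    mask = np.ones(n, dtype=bool)
    mask[subset] = False
    return mask


# Usage: 100 standard-normal points plus 5 points shifted far away.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(100, 2))
shifted = rng.normal(loc=10.0, scale=0.1, size=(5, 2))
X = np.vstack([inliers, shifted])
mask = bacon_outliers(X)
print("nominated outliers:", np.flatnonzero(mask))
```

Solving the linear system with `np.linalg.solve` avoids inverting $S$ explicitly, which is usually the more stable choice numerically. Passing `init="mahalanobis"` switches the sketch to the second initialization approach.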