Developer Guide for Intel® Data Analytics Acceleration Library 2019 Update 1
The distributed processing mode assumes that the data set R is split into nblocks blocks across computation nodes.
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm has the following parameters:
Parameter | Default Value | Description
---|---|---
algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
method | fastCSR | Performance-oriented computation method for CSR numeric tables, the only method supported by the algorithm.
nFactors | 10 | The total number of factors.
fullNUsers | 0 | The total number of users m.
partition | | A numeric table of size either 1 x 1, which provides the number of input data parts, or (nblocks + 1) x 1, where nblocks is the number of input data parts and the i-th element contains the offset of the transposed i-th data part to be computed by the initialization algorithm.
engine | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally at the initialization step.
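The (nblocks + 1) x 1 form of the partition parameter can be pictured as a list of row offsets, where consecutive entries delimit one data part. The sketch below is plain Python, not library API code. The helper names make_partition and part_sizes are hypothetical, and the near-even split is an assumption made for illustration; it is not necessarily how the library divides the data when only the number of parts is given.

```python
def make_partition(n_rows, nblocks):
    """Return nblocks + 1 offsets that split n_rows rows into contiguous parts.

    Assumption: parts are as even as possible, with the first parts
    absorbing any remainder.
    """
    if nblocks < 1 or n_rows < 0:
        raise ValueError("nblocks must be >= 1 and n_rows must be >= 0")
    base, extra = divmod(n_rows, nblocks)
    offsets = [0]
    for i in range(nblocks):
        # Part i starts where part i - 1 ended.
        offsets.append(offsets[-1] + base + (1 if i < extra else 0))
    return offsets


def part_sizes(offsets):
    """Recover the number of rows in each part from the offsets."""
    return [stop - start for start, stop in zip(offsets, offsets[1:])]


# Ten rows of the transposed data set split across nblocks = 3 nodes.
offsets = make_partition(10, 3)
sizes = part_sizes(offsets)
```

With this layout, the i-th part covers rows offsets[i] up to, but not including, offsets[i + 1], and the last entry equals the total number of rows.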
To initialize the implicit ALS algorithm in the distributed processing mode, use the two-step process described below. (The diagram that illustrates this process for nblocks = 3 is not reproduced here.)
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.
Input ID | Input
---|---
dataColumnSlice | An n_i x m numeric table with the part of the input data set. Each node holds n_i rows of the full transposed input data set R^T. The input should be an object of the CSRNumericTable class.
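To make the shape of dataColumnSlice concrete, the sketch below transposes a small matrix R stored in CSR form and cuts out the rows of R^T that one node would hold. It is plain Python with hand-built CSR arrays, not the CSRNumericTable class. The function names are hypothetical, and zero-based indexing is used here purely as an illustration choice.

```python
def csr_transpose(values, col_idx, row_ptr, n_cols):
    """Transpose a zero-based CSR matrix; returns (values, col_idx, row_ptr)."""
    n_rows = len(row_ptr) - 1
    # Count the entries in each column: these become the row lengths of R^T.
    counts = [0] * n_cols
    for c in col_idx:
        counts[c] += 1
    t_ptr = [0]
    for count in counts:
        t_ptr.append(t_ptr[-1] + count)
    next_slot = t_ptr[:-1]  # next free position in each row of R^T
    t_vals = [0] * len(values)
    t_cols = [0] * len(values)
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            pos = next_slot[col_idx[k]]
            t_vals[pos] = values[k]
            t_cols[pos] = r
            next_slot[col_idx[k]] += 1
    return t_vals, t_cols, t_ptr


def csr_row_slice(values, col_idx, row_ptr, start, stop):
    """Return rows start..stop-1 of a CSR matrix as a new CSR triple."""
    lo, hi = row_ptr[start], row_ptr[stop]
    return values[lo:hi], col_idx[lo:hi], [p - lo for p in row_ptr[start:stop + 1]]


# R has m = 3 users (rows) and n = 4 items (columns):
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [4, 0, 0, 5]]
r_vals, r_cols, r_ptr = [1, 2, 3, 4, 5], [0, 2, 1, 0, 3], [0, 2, 3, 5]
rt = csr_transpose(r_vals, r_cols, r_ptr, n_cols=4)
# A node responsible for items 2 and 3 holds an n_i x m = 2 x 3 slice of R^T.
node_slice = csr_row_slice(*rt, start=2, stop=4)
```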
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm calculates the results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the i-th node to all local nodes. Keys in this data collection are indices of the nodes, and the value that corresponds to each key i is a numeric table that contains indices of the item factors to be transferred to the i-th node in Step 3 of the distributed ALS training algorithm.
User Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key i is a numeric table of size 1 x 1 that contains the starting offset of the user factors stored on the i-th node.
For more details, see Algorithms.
Partial Result ID | Result
---|---
partialModel | The model with initialized item factors. The result can only be an object of the PartialModel class.
outputOfInitForComputeStep3 | A key-value data collection that maps components of the partial model to the local nodes.
offsets | A key-value data collection of size nblocks that holds the starting offsets of the factor indices on each node.
outputOfStep1ForStep2 | A key-value data collection of size nblocks that contains the parts of the input numeric table: the j-th element of this collection is a numeric table of size m_j x n_i, where
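The outputOfStep1ForStep2 collection can be read as the local n_i x m slice of R^T cut along the user dimension and transposed back, giving one m_j x n_i block per destination node. The sketch below illustrates that reading in plain Python. Dense nested lists stand in for CSR numeric tables, split_column_slice is a hypothetical name, and the user_offsets argument (taken here to delimit the m_j rows of R assigned to each node) is an assumption, since the source description of m_j is cut short.

```python
def split_column_slice(column_slice, user_offsets):
    """Split an n_i x m slice of R^T into blocks of size m_j x n_i.

    Returns a dict keyed by destination node index j. Each block holds the
    users user_offsets[j] .. user_offsets[j + 1] - 1 as rows and the local
    items as columns.
    """
    parts = {}
    for j in range(len(user_offsets) - 1):
        lo, hi = user_offsets[j], user_offsets[j + 1]
        # Row u of the block is column u of the local slice (a transpose).
        parts[j] = [[item_row[u] for item_row in column_slice]
                    for u in range(lo, hi)]
    return parts


# Node i holds items 0 and 1 of R^T for m = 3 users (n_i = 2).
column_slice = [[1, 0, 4],
                [0, 3, 0]]
# Assumed user partition: node 0 gets users 0-1, node 1 gets user 2.
parts = split_column_slice(column_slice, [0, 2, 3])
```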
Step 2 uses the results of Step 1 and accepts the input described below.
Input ID | Input
---|---
inputOfStep2FromStep1 | A key-value data collection of size nblocks that contains the parts of the input data set: the i-th element of this collection is a numeric table of size m_i x n_i. Each numeric table in the collection should be an object of the CSRNumericTable class.
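One way to picture how the parts in inputOfStep2FromStep1 relate to the row slice of R held by a node is column-wise concatenation: each part covers the same users but a different block of items. The sketch below shows this in plain Python with dense nested lists in place of CSR numeric tables. The name assemble_row_slice is hypothetical, and the concatenation order (by node index) is an assumption inferred from the table descriptions rather than stated in them.

```python
def assemble_row_slice(parts_from_nodes):
    """Concatenate blocks column-wise into one row slice.

    parts_from_nodes maps a source node index i to a block whose rows are
    the users of the receiving node and whose columns are the items of node i.
    """
    keys = sorted(parts_from_nodes)
    if not keys:
        return []
    n_rows = len(parts_from_nodes[keys[0]])
    for i in keys:
        if len(parts_from_nodes[i]) != n_rows:
            raise ValueError("all parts must cover the same users")
    # Row r of the result is row r of every part, joined in node order.
    return [[value for i in keys for value in parts_from_nodes[i][r]]
            for r in range(n_rows)]


# The receiving node owns users 0 and 1; node 0 sent items 0-1, node 1 sent items 2-3.
received = {0: [[1, 0], [0, 3]],
            1: [[2, 0], [0, 0]]}
row_slice = assemble_row_slice(received)
```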
In this step, implicit ALS initialization calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the i-th node to all local nodes. Keys in this data collection are indices of the nodes, and the value that corresponds to each key i is a numeric table that contains indices of the user factors to be transferred to the i-th node in Step 3 of the distributed ALS training algorithm.
Item Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key i is a numeric table of size 1 x 1 that contains the starting offset of the item factors stored on the i-th node.
For more details, see Algorithms.
Partial Result ID | Result
---|---
dataRowSlice | An m_j x n numeric table with the mining data. The j-th node gets m_j rows of the full input data set R.
outputOfInitForComputeStep3 | A key-value data collection that maps components of the partial model to the local nodes.
offsets | A key-value data collection of size nblocks that holds the starting offsets of the factor indices on each node.
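The two key-value collections can be sketched as dictionaries keyed by node index. The code below is a plain-Python illustration, not library code, and the helper names are hypothetical. The text above does not spell out how the indices in outputOfInitForComputeStep3 are selected; the rule used here (a local user's factor is sent to node i if that user has at least one nonzero entry among the items owned by node i) is an assumption.

```python
def starting_offsets(partition):
    """Map node index i to the starting offset of the factors stored on node i."""
    return {i: partition[i] for i in range(len(partition) - 1)}


def factors_to_transfer(row_slice, item_offsets):
    """Map node index i to the local user indices whose factors node i needs.

    Assumed rule: node i needs a user's factor when that user has a nonzero
    entry in at least one item column owned by node i.
    """
    result = {}
    for i in range(len(item_offsets) - 1):
        lo, hi = item_offsets[i], item_offsets[i + 1]
        result[i] = [u for u, row in enumerate(row_slice) if any(row[lo:hi])]
    return result


# Local row slice: 2 users x 4 items; node 0 owns items 0-1, node 1 owns items 2-3.
row_slice = [[1, 0, 2, 0],
             [0, 3, 0, 0]]
item_offsets = [0, 2, 4]
offsets = starting_offsets(item_offsets)
to_send = factors_to_transfer(row_slice, item_offsets)
```

In this example both local users have entries among node 0's items, while only user 0 has an entry among node 1's items, so node 1 would need a single user factor.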