Developer Guide for Intel® Data Analytics Acceleration Library 2018

Quality Metrics for Linear Regression

Given a data set X = (x_i) that contains vectors of input variables x_i = (x_{i1}, …, x_{ip}), respective responses z_i = (z_{i1}, …, z_{ik}) computed at the prediction stage of the linear regression model defined by its coefficients β_{ht}, h = 1, ..., k, t = 1, ..., p, and expected responses y_i = (y_{i1}, …, y_{ik}), i = 1, ..., n, the problem is to evaluate the linear regression model by computing the root mean square error, variance-covariance matrix of beta coefficients, various statistics functions, and so on. See Linear Regression for additional details and notations.

For linear regression, the library computes the statistics listed in the tables below for testing the insignificance of beta coefficients.

The statistics are computed given the following assumptions about the data distribution:

For more details, see [Hastie2009].

Testing Insignificance of a Single Beta

The library uses the following quality metrics:

Quality Metric

Definition

Root Mean Square (RMS) Error

Vector of variances

A set of variance-covariance matrices C = C_1, ..., C_k for vectors of betas β_{jt}, j = 1, ..., k

Z-score statistics used in testing of insignificance of a single coefficient β_{jt}

σ_j is the j-th element of the vector of variances σ^2 and ν_t is the t-th diagonal element of the matrix (X^T X)^{-1}

Confidence interval for β_{jt}

pc_{1-α} is the (1 - α) percentile of the Gaussian distribution, σ_j is the j-th element of the vector of variances σ^2, and ν_t is the t-th diagonal element of the matrix (X^T X)^{-1}
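
As a rough illustration of the single-beta metrics listed above, the following pure-Python sketch computes them for one response variable j. It assumes the standard textbook definitions from [Hastie2009]: RMS error as the square root of the mean squared residual, variance σ_j^2 as the residual sum of squares divided by n - p - 1, covariance matrix C_j = (X^T X)^{-1} σ_j^2 with an intercept column prepended to X, Z-score β_{jt} / (σ_j sqrt(ν_t)), and confidence interval β_{jt} ± pc_{1-α} σ_j sqrt(ν_t). The library's own normalization may differ, and the names invert and single_beta_metrics are hypothetical helpers, not part of the library API.

```python
import math
from statistics import NormalDist


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    # Augment the matrix with the identity: [A | I].
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def single_beta_metrics(x, y, z, beta, alpha=0.025):
    """Assumed textbook metrics for one response variable.

    x: n rows of p input values; y: expected responses; z: predicted responses;
    beta: [intercept, beta_1, ..., beta_p]; alpha: tail probability for pc_{1-alpha}.
    """
    n, p = len(x), len(x[0])
    xa = [[1.0] + list(row) for row in x]          # prepend the intercept column
    res_ss = sum((yi - zi) ** 2 for yi, zi in zip(y, z))
    rmse = math.sqrt(res_ss / n)                    # assumed 1/n normalization
    variance = res_ss / (n - p - 1)                 # sigma_j^2
    xtx = [[sum(r[a] * r[b] for r in xa) for b in range(p + 1)]
           for a in range(p + 1)]
    inv = invert(xtx)                               # (X^T X)^{-1}
    covariance = [[v * variance for v in row] for row in inv]
    sigma = math.sqrt(variance)
    pc = NormalDist().inv_cdf(1 - alpha)            # (1 - alpha) Gaussian percentile
    half = [pc * sigma * math.sqrt(inv[t][t]) for t in range(p + 1)]
    zscore = [b / (sigma * math.sqrt(inv[t][t])) for t, b in enumerate(beta)]
    interval = [(b - h, b + h) for b, h in zip(beta, half)]
    return {"rmse": rmse, "variance": variance, "covariance": covariance,
            "zscore": zscore, "confidence_interval": interval}


# Toy data: beta = [1, 2] is the least-squares fit of y on x for these values.
x = [[0.0], [1.0], [2.0], [3.0]]
y = [1.1, 2.8, 5.1, 7.0]
beta = [1.0, 2.0]
z = [beta[0] + beta[1] * row[0] for row in x]
metrics = single_beta_metrics(x, y, z, beta)
print(metrics["rmse"], metrics["zscore"], metrics["confidence_interval"])
```

In this sketch index 0 of each result refers to the intercept and index t to β_{jt}. A large absolute Z-score, or a confidence interval that excludes zero, suggests that the corresponding coefficient is significant.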

Testing Insignificance of a Group of Betas

The library uses the following quality metrics:

Quality Metric

Definition

Mean of expected responses, ERM = (ERM_1, ..., ERM_k)

Variance of expected responses, ERV = (ERV_1, ..., ERV_k)

Regression Sum of Squares RegSS = (RegSS_1, ..., RegSS_k)

Sum of Squares of Residuals ResSS = (ResSS_1, ..., ResSS_k)

Total Sum of Squares TSS = (TSS_1, ..., TSS_k)

Determination Coefficient



F-statistics used in testing insignificance of a group of betas F = (F_1, ..., F_k)

where ResSS_j are computed for a model with p + 1 betas and ResSS_{0j} are computed for a reduced model with p_0 + 1 betas (p - p_0 betas are set to zero)
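
The group-of-betas metrics can be sketched in the same spirit for one response variable j. The code below assumes textbook definitions: ERM_j as the sample mean of the expected responses, ERV_j with an n - 1 denominator, unnormalized sums of squares, the determination coefficient computed as 1 - ResSS_j / TSS_j, and F_j = ((ResSS_{0j} - ResSS_j) / (p - p_0)) / (ResSS_j / (n - p - 1)). These normalizations are assumptions and may differ from the library's definitions; group_metrics is a hypothetical name, not a library function.

```python
def group_metrics(y, z, z_reduced, p, p0):
    """Assumed textbook group-of-betas metrics for one response variable.

    y: expected responses; z: predictions of the full model with p + 1 betas;
    z_reduced: predictions of the reduced model with p0 + 1 betas.
    """
    n = len(y)
    erm = sum(y) / n                                          # mean of expected responses
    tss = sum((yi - erm) ** 2 for yi in y)                    # total sum of squares
    erv = tss / (n - 1)                                       # variance of expected responses
    reg_ss = sum((zi - erm) ** 2 for zi in z)                 # regression sum of squares
    res_ss = sum((yi - zi) ** 2 for yi, zi in zip(y, z))      # residuals, full model
    res_ss0 = sum((yi - zi) ** 2 for yi, zi in zip(y, z_reduced))  # residuals, reduced model
    r2 = 1.0 - res_ss / tss                                   # determination coefficient
    f_statistic = ((res_ss0 - res_ss) / (p - p0)) / (res_ss / (n - p - 1))
    return {"erm": erm, "erv": erv, "reg_ss": reg_ss, "res_ss": res_ss,
            "tss": tss, "r2": r2, "f_statistic": f_statistic}


# Toy data: the full model is z = 1 + 2x; the reduced model keeps only the
# intercept (p0 = 0), so it predicts the mean of y for every observation.
y = [1.1, 2.8, 5.1, 7.0]
z = [1.0, 3.0, 5.0, 7.0]
z_reduced = [sum(y) / len(y)] * len(y)
group = group_metrics(y, z, z_reduced, p=1, p0=0)
print(group["r2"], group["f_statistic"])
```

A large F value indicates that setting the p - p_0 betas to zero noticeably increases the residual sum of squares, so the group as a whole is unlikely to be insignificant.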