ols

Syntax

ols(Y, X, [intercept=true], [mode=0], [method='default'], [usePinv=true])

Arguments

Y is the dependent variable and must be a vector; X is the independent variable(s) and can be a vector, matrix, table, or tuple.

When X is a matrix:

  • If the number of rows equals the length of Y, each column of X is a factor;
  • If the number of rows does not equal the length of Y but the number of columns does, each row of X is a factor.

intercept is a Boolean variable indicating whether the regression includes the intercept. If it is true, the system automatically adds a column of 1's to X to generate the intercept. The default value is true.
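
For example, a minimal sketch contrasting a fit with and without the intercept term (the vectors here are illustrative):

x1=1 3 5 7 11 16 23
y=0.1 4.2 5.6 8.8 22.1 35.6 77.2
ols(y, x1)         // intercept=true (default): returns [intercept, slope]
ols(y, x1, false)  // intercept=false: no column of 1's is added, only the slope is returned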

mode is an integer indicating the content of the output. It can be:

  • 0 (default): a vector of the coefficient estimates.

  • 1: a table with the coefficient estimates, standard errors, t-statistics, and p-values.

  • 2: a dictionary with the following keys: ANOVA, RegressionStat, Coefficient, and Residual.

Table 1. ANOVA (one-way analysis of variance)
Source of Variance DF (degrees of freedom) SS (sum of squares)            MS (mean square)                   F (F-statistic) Significance
------------------ ----------------------- ------------------------------ ---------------------------------- --------------- ------------
Regression         p                       sum of squares regression, SSR regression mean square, MSR=SSR/p  MSR/MSE         p-value
Residual           n-p-1                   sum of squares error, SSE      mean square error, MSE=SSE/(n-p-1)
Total              n-1                     sum of squares total, SST
Here n is the number of observations and p is the number of independent variables.
Table 2. RegressionStat (regression statistics)
Item         Description
------------ -----------
R2           R-squared.
AdjustedR2   The adjusted R-squared, which corrects R-squared for the degrees of freedom based on the sample size and the number of terms in the regression model.
StdError     The residual standard error (standard deviation of the residuals), corrected for the degrees of freedom.
Observations The sample size.
Table 3. Coefficient
Item     Description
-------- -----------
factor   The independent variables (including the intercept term when intercept=true).
beta     The estimated regression coefficients.
StdError The standard errors of the regression coefficients.
tstat    The t statistics, indicating the significance of the regression coefficients.
pvalue   The p-values of the t statistics.

Residual: the differences between the actual values and the fitted (predicted) values.

method (optional) is a string indicating the method used to solve the ordinary-least-squares problem.

  • When set to "default" (the default), ols solves the problem by constructing the coefficient matrices and computing matrix inverses.

  • When set to "svd", ols solves the problem by using singular value decomposition.
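
For example, a minimal sketch of selecting the SVD solver explicitly, with arguments passed positionally as in the Syntax section (reusing x1 and y from the Examples section):

ols(y, x1, true, 0, "svd")   // solve by singular value decomposition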

usePinv (optional) is a Boolean value indicating whether to use the pseudo-inverse to compute the inverse of a matrix.

  • true (default): compute the pseudo-inverse of the matrix. It must be true for singular matrices.

  • false: compute the ordinary inverse of the matrix, which is only applicable to non-singular matrices.
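
For example, a minimal sketch of turning off the pseudo-inverse; this assumes the coefficient matrix is non-singular (again reusing x1 and y from the Examples section):

ols(y, x1, true, 0, "default", false)   // ordinary inverse; valid only for a non-singular matrix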

Details

Returns the result of an ordinary-least-squares regression of Y on X, in the format specified by mode.

Note that NULL values in X and Y are treated as 0 in calculations (see the last example below).

Examples

x1=1 3 5 7 11 16 23
x2=2 8 11 34 56 54 100
y=0.1 4.2 5.6 8.8 22.1 35.6 77.2;

ols(y, x1);
// output
[-9.912821,3.378632]

ols(y, (x1,x2));
// output
[-9.494813,2.806426,0.13147]

ols(y, (x1,x2), 1, 1);
// output
factor    beta      stdError tstat     pvalue
--------- --------- -------- --------- --------
intercept -9.494813 5.233168 -1.814353 0.143818
x1        2.806426  1.830782 1.532911  0.20007
x2        0.13147   0.409081 0.321379  0.764015

ols(y, (x1,x2), 1, 2);
// output
ANOVA->
Breakdown  DF SS          MS          F         Significance
---------- -- ----------- ----------- --------- ------------
Regression 2  4204.416396 2102.208198 31.467739 0.003571
Residual   4  267.220747  66.805187
Total      6  4471.637143

RegressionStat->
item         statistics
------------ ----------
R2           0.940241
AdjustedR2   0.910361
StdError     8.173444
Observations 7

Coefficient->
factor    beta      stdError tstat     pvalue
--------- --------- -------- --------- --------
intercept -9.494813 5.233168 -1.814353 0.143818
x1        2.806426  1.830782 1.532911  0.20007
x2        0.13147   0.409081 0.321379  0.764015

x=matrix(1 4 8 2 3, 1 4 2 3 8, 1 5 1 1 5);
x;
// output
#0 #1 #2
1  1  1
4  4  5
8  2  1
2  3  1
3  8  5

ols(1..5, x);
// output
[1.156537,0.105505,0.91055,-0.697821]

ols(1..5, x.transpose());
// output
[1.156537,0.105505,0.91055,-0.697821]
// The system adjusts the dimensions of the independent variables to match the dependent variable, so the regression produces the same result.
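
The following sketch illustrates the NULL handling described in Details: the NULL element is treated as 0, so the two calls below are expected to return the same coefficients (reusing y from above).

x3=1 3 NULL 7 11 16 23
ols(y, x3)
ols(y, 1 3 0 7 11 16 23)    // equivalent call with the NULL written as 0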