lassoBasic
Syntax
lassoBasic(Y, X, [mode=0], [alpha=1.0], [intercept=true], [normalize=false], [maxIter=1000], [tolerance=0.0001], [positive=false], [swColName], [checkInput=true])
Details
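Perform lasso regression, that is, linear least-squares regression with an L1 penalty on the coefficients.
The function minimizes an objective that adds the L1-norm of the coefficients, multiplied by alpha, to the least-squares loss.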
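Written out, the objective for n observations and p independent variables looks like the expression below. The 1/(2n) scaling and the unpenalized intercept β₀ (present only when intercept = true) are assumptions borrowed from scikit-learn's Lasso, whose parameters alpha, maxIter, tolerance, and positive this function mirrors. This page does not confirm them.

```latex
\min_{\beta_0,\, w}\; \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} w_j \Bigr)^2 \;+\; \alpha \sum_{j=1}^{p} \lvert w_j \rvert
```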
Arguments
Y is a numeric vector indicating the dependent variable.
X is a numeric vector, tuple, matrix, or table indicating the independent variables.
When X is a vector/tuple, its length must be equal to the length of Y.
When X is a matrix/table, its number of rows must be equal to the length of Y.
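The shape rules above can be sketched as a small normalization step that turns X into one list per regressor and checks every length against Y. The helper `to_columns` is hypothetical. DolphinDB vectors and tuples are modeled as Python lists and tuples, a matrix is modeled here as a list of rows (one row per observation), and tables are not modeled.

```python
def to_columns(y, x):
    """Hypothetical helper: return X as a list of regressor columns, each of length len(y)."""
    n = len(y)
    if isinstance(x, tuple) and x and all(isinstance(c, (list, tuple)) for c in x):
        cols = [list(c) for c in x]          # tuple of vectors: one regressor per element
    elif all(isinstance(v, (int, float)) for v in x):
        cols = [list(x)]                     # plain vector: a single regressor
    else:
        rows = [list(r) for r in x]          # matrix modeled row by row
        if len(rows) != n:
            raise ValueError(f"X has {len(rows)} rows but Y has {n} elements")
        if any(len(r) != len(rows[0]) for r in rows):
            raise ValueError("matrix rows have different lengths")
        cols = [list(c) for c in zip(*rows)]
    for j, c in enumerate(cols):
        if len(c) != n:
            raise ValueError(f"regressor {j} has length {len(c)}, expected {n}")
    return cols

# Tuple form, as in the (x1, x2) examples on this page.
print(to_columns([0.1, 4.2, 5.6], ([1, 3, 5], [2, 8, 11])))
```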
intercept is a Boolean value indicating whether the regression includes the intercept. If it is true, a column of 1s is automatically added to X to estimate the intercept. The default value is true.
mode is an integer indicating the content of the result. It can take the following values:
0 (default): a vector of the coefficient estimates.
1: a table with the coefficient estimates, standard errors, t-statistics, and p-values.
2: a dictionary with the following keys: ANOVA, RegressionStat, Coefficient, and Residual.
ANOVA (one-way analysis of variance)

| Source of Variance | DF (degrees of freedom) | SS (sum of squares) | MS (mean square) | F (F-score) | Significance |
|---|---|---|---|---|---|
| Regression | p | sum of squares regression, SSR | regression mean square, MSR = SSR/p | MSR/MSE | p-value |
| Residual | n-p-1 | sum of squares error, SSE | mean square error, MSE = SSE/(n-p-1) | | |
| Total | n-1 | sum of squares total, SST | | | |

Here n is the number of observations and p is the number of independent variables.
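The DF, MS, and F columns follow from the sums of squares. The sketch below rebuilds them from the SSR, SSE, and SST printed in the mode = 2 example on this page. In that example, SSR + SSE (about 4433.93) is smaller than SST (about 4471.64). For a penalized fit the identity SST = SSR + SSE no longer holds exactly, so all three sums are taken as inputs. `anova` is a hypothetical helper, and the Significance column is left out.

```python
def anova(ssr, sse, sst, n, p):
    """Build the ANOVA rows from the three sums of squares (hypothetical helper).

    n is the number of observations and p the number of independent variables.
    The Significance column (upper-tail probability of F with p and n-p-1
    degrees of freedom) is omitted to keep the sketch dependency-free.
    """
    df_reg, df_res = p, n - p - 1
    msr = ssr / df_reg  # regression mean square
    mse = sse / df_res  # mean square error
    return {
        "Regression": {"DF": df_reg, "SS": ssr, "MS": msr, "F": msr / mse},
        "Residual": {"DF": df_res, "SS": sse, "MS": mse},
        "Total": {"DF": n - 1, "SS": sst},
    }

# SSR, SSE, and SST copied from the mode = 2 example (n = 7, p = 2).
table = anova(4165.242566095043912, 268.685678884843582, 4471.637142857141952, n=7, p=2)
for source, row in table.items():
    print(source, row)
```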
RegressionStat (regression statistics)

| Item | Description |
|---|---|
| R2 | R-squared. |
| AdjustedR2 | The R-squared adjusted for degrees of freedom, which compares the sample size with the number of terms in the regression model. |
| StdError | The residual standard error, corrected for degrees of freedom. |
| Observations | The sample size. |
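The formulas in this sketch come from checking them against the mode = 2 example on this page. They are not taken from DolphinDB's source. In that output, R2 equals SSR/SST, which is not the same as 1 - SSE/SST for this penalized fit. AdjustedR2 equals 1 - (1 - R2)(n - 1)/(n - p - 1), and StdError equals the square root of MSE. `regression_stats` is a hypothetical helper.

```python
import math

def regression_stats(ssr, sse, sst, n, p):
    """Hypothetical helper; formulas matched against the documented mode = 2 output."""
    r2 = ssr / sst
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    std_error = math.sqrt(sse / (n - p - 1))  # square root of MSE
    return {"R2": r2, "AdjustedR2": adjusted_r2, "StdError": std_error, "Observations": n}

stats = regression_stats(4165.242566095043912, 268.685678884843582, 4471.637142857141952, n=7, p=2)
print(stats)
```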
Coefficient

| Item | Description |
|---|---|
| factor | Independent variables |
| beta | Estimated regression coefficients |
| stdError | Standard errors of the regression coefficients |
| tstat | t-statistics, indicating the significance of the regression coefficients |
| pvalue | p-values of the t-statistics |
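tstat is beta divided by stdError. In the examples on this page, pvalue matches a two-sided Student's t test with n - p - 1 degrees of freedom (4 in the example), and the sketch below reproduces it. The t tail probability is computed by integrating the t density with Simpson's rule. That is a choice made here to stay within the Python standard library, not a claim about how DolphinDB computes it. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|) = 1 - 2 * integral of the density over [0, |t|], via Simpson's rule."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 1 - 2 * (s * h / 3)

def coefficient_row(beta, std_error, n, p):
    """Return (tstat, pvalue) for one coefficient; hypothetical helper."""
    tstat = beta / std_error
    return tstat, two_sided_p(tstat, n - p - 1)

# Intercept row of the mode = 1 example (n = 7 observations, p = 2 regressors).
tstat, pvalue = coefficient_row(-9.133706333069543, 5.247492365971091, n=7, p=2)
print(tstat, pvalue)
```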
Residual: the difference between each actual value and the corresponding predicted value.
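This check confirms the sign convention (actual minus predicted). It uses the data and the coefficients printed in the mode = 0 example on this page. Subtracting the fitted values from Y reproduces the Residual vector of the mode = 2 output. `residuals` is a hypothetical helper name.

```python
def residuals(y, columns, beta, intercept=True):
    """y minus fitted values; beta[0] is the intercept when intercept=True."""
    b0, coefs = (beta[0], beta[1:]) if intercept else (0.0, beta)
    fitted = [b0 + sum(b * col[i] for b, col in zip(coefs, columns)) for i in range(len(y))]
    return [yi - fi for yi, fi in zip(y, fitted)]

x1 = [1, 3, 5, 7, 11, 16, 23]
x2 = [2, 8, 11, 34, 56, 54, 100]
y = [0.1, 4.2, 5.6, 8.8, 22.1, 35.6, 77.2]
beta = [-9.133706333069543, 2.535935196073186, 0.189298948643987]
res = residuals(y, [x1, x2], beta)
print(res)
```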
alpha is a floating-point number representing the constant that multiplies the L1-norm term. The default value is 1.0.
normalize is a Boolean value. If it is true, the regressors are normalized before the regression by subtracting the mean and dividing by the L2-norm. If intercept = false, this parameter is ignored. The default value is false.
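This is a minimal sketch of the normalization described above. It assumes the L2-norm is taken after centering, following the order in which the steps are listed. `normalize_column` is a hypothetical helper. This page does not say how lassoBasic maps coefficients back to the original scale.

```python
import math

def normalize_column(col):
    """Center a regressor and scale it to unit L2-norm (hypothetical helper)."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = math.sqrt(sum(v * v for v in centered))
    # A constant column has zero norm; leave it centered instead of dividing by 0.
    return centered if norm == 0 else [v / norm for v in centered]

z = normalize_column([1, 3, 5, 7, 11, 16, 23])
print(z)
```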
maxIter is a positive integer indicating the maximum number of iterations. The default value is 1000.
tolerance is a floating-point number. Iteration stops when the improvement in the objective function value is smaller than tolerance. The default value is 0.0001.
positive is a Boolean value indicating whether to force the coefficient estimates to be positive. The default value is false.
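This minimal coordinate-descent sketch of lasso shows how maxIter, tolerance, and positive interact. It assumes the objective (1/(2n))·||y - Xw||² + alpha·||w||₁, with an unpenalized intercept handled by centering. It stops when one full sweep improves the objective by less than tolerance, matching the stopping rule stated for tolerance. With positive = true, negative coordinate updates are clipped to 0. Coordinate descent is a common lasso solver, but DolphinDB's actual solver is not documented here. `lasso_cd` is a hypothetical name, and its output need not match lassoBasic digit for digit.

```python
def lasso_cd(y, columns, alpha=1.0, max_iter=1000, tolerance=1e-4, positive=False, intercept=True):
    """Coordinate-descent lasso sketch; returns [intercept, w_1, ..., w_p] (or just w)."""
    n, p = len(y), len(columns)
    # Centering y and each column removes the (unpenalized) intercept from the problem.
    y_mean = sum(y) / n if intercept else 0.0
    x_means = [sum(c) / n if intercept else 0.0 for c in columns]
    xc = [[v - m for v in c] for c, m in zip(columns, x_means)]
    sq_norms = [sum(v * v for v in c) for c in xc]
    w = [0.0] * p
    r = [v - y_mean for v in y]  # residual y - Xw, kept up to date in place

    def objective():
        return sum(v * v for v in r) / (2 * n) + alpha * sum(abs(b) for b in w)

    prev = objective()
    for _ in range(max_iter):
        for j in range(p):
            if sq_norms[j] == 0:
                continue  # constant column: its coefficient stays 0
            col = xc[j]
            # rho: correlation of x_j with the residual that excludes x_j's own term.
            rho = sum(col[i] * (r[i] + col[i] * w[j]) for i in range(n)) / n
            # Soft-thresholding update; positive=True clips negative solutions to 0.
            if rho > alpha:
                new = (rho - alpha) / (sq_norms[j] / n)
            elif rho < -alpha and not positive:
                new = (rho + alpha) / (sq_norms[j] / n)
            else:
                new = 0.0
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= col[i] * delta
                w[j] = new
        cur = objective()
        if prev - cur < tolerance:  # improvement below tolerance: stop
            break
        prev = cur
    b0 = y_mean - sum(b * m for b, m in zip(w, x_means))
    return ([b0] if intercept else []) + w

x1 = [1, 3, 5, 7, 11, 16, 23]
x2 = [2, 8, 11, 34, 56, 54, 100]
y = [0.1, 4.2, 5.6, 8.8, 22.1, 35.6, 77.2]
print(lasso_cd(y, [x1, x2], alpha=1.0))
```

Larger alpha values shrink more coefficients to exactly 0. A very large alpha leaves only the intercept, which then equals the mean of Y.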
swColName is a STRING indicating a column name of ds. The specified column is used as the sample weight. If it is not specified, the sample weight is treated as 1.
checkInput is a Boolean value that determines whether the input arguments are validated.
If checkInput = true (default), the arguments are checked for invalid values, and an error is thrown if a NULL value exists.
If checkInput = false, invalid values are not checked.
It is recommended to set checkInput = true. If it is false, make sure that the inputs contain no invalid values and that no invalid values are generated during intermediate calculations; otherwise, the returned model may be inaccurate.
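The check that checkInput = true enables can be pictured as a NULL scan of the inputs before fitting. This sketch models DolphinDB NULLs as Python `None` or NaN. `validate_inputs` is a hypothetical helper, not DolphinDB's implementation, and this page does not list which other values DolphinDB treats as invalid.

```python
import math

def validate_inputs(y, columns):
    """Raise ValueError if Y or any regressor column contains a NULL (None or NaN)."""
    def is_null(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    named = [("Y", y)] + [(f"X[{j}]", c) for j, c in enumerate(columns)]
    for name, values in named:
        for i, v in enumerate(values):
            if is_null(v):
                raise ValueError(f"{name} has a NULL value at position {i}")

validate_inputs([1.0, 2.0], [[3, 4]])  # clean input passes silently
```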
Examples
$ x1=1 3 5 7 11 16 23
$ x2=2 8 11 34 56 54 100
$ y=0.1 4.2 5.6 8.8 22.1 35.6 77.2;
$ print(lassoBasic(y, (x1,x2), mode = 0));
[-9.133706333069543,2.535935196073186,0.189298948643987]
$ print(lassoBasic(y, (x1,x2), mode = 1));
factor beta stdError tstat pvalue
--------- ------------------ ----------------- ------------------ -----------------
intercept -9.133706333069543 5.247492365971091 -1.740584968222107 0.156730846105191
x1 2.535935196073186 1.835793667840723 1.38138356205138 0.239309472176311
x2 0.189298948643987 0.410201227095842 0.461478260277749 0.66843504931137
$ print(lassoBasic(y, (x1,x2), mode = 2));
Coefficient->
factor beta stdError tstat pvalue
--------- ------------------ ----------------- ------------------ -----------------
intercept -9.133706333069543 5.247492365971091 -1.740584968222107 0.156730846105191
x1 2.535935196073186 1.835793667840723 1.38138356205138 0.239309472176311
x2 0.189298948643987 0.410201227095842 0.461478260277749 0.66843504931137
RegressionStat->
item statistics
------------ -----------------
R2 0.931480447323074
AdjustedR2 0.897220670984611
StdError 8.195817208870076
Observations 7
ANOVA->
Breakdown DF SS MS F Significance
---------- -- -------------------- -------------------- ------------------ -----------------
Regression 2 4165.242566095043912 2082.621283047521956 31.004574440904473 0.003672076469395
Residual 4 268.685678884843582 67.171419721210895
Total 6 4471.637142857141952
Residual->
[6.319173239708383,4.21150915569809,-0.028258082380245,-6.254004293338318,-7.262321947798779,-6.063400030876729,9.077301958987561]