lassoBasic
Syntax
lassoBasic(Y, X, [mode=0], [alpha=1.0], [intercept=true], [normalize=false],
[maxIter=1000], [tolerance=0.0001], [positive=false], [swColName],
[checkInput=true])
Details
Perform lasso regression.
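Minimize the following objective function:
$$\frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \alpha\,\lVert w \rVert_1$$
where n is the number of observations, w is the vector of coefficients, and α is the value of alpha.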
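To make the objective concrete, here is a minimal Python sketch (not DolphinDB script) that evaluates it for given coefficients. It assumes that the intercept b enters the residual as y - Xw - b and is not penalized, as in scikit-learn's Lasso. `lasso_objective` is a hypothetical helper name.

```python
def lasso_objective(X, y, w, b, alpha):
    """Return (1/(2n)) * ||y - Xw - b||_2^2 + alpha * ||w||_1.

    X is a list of rows, w the coefficients, b the intercept.
    Assumption: the intercept is not part of the L1 penalty.
    """
    n = len(y)
    squared_error = 0.0
    for row, yi in zip(X, y):
        prediction = b + sum(wj * xj for wj, xj in zip(w, row))
        squared_error += (yi - prediction) ** 2
    l1_penalty = sum(abs(wj) for wj in w)
    return squared_error / (2 * n) + alpha * l1_penalty


# Two samples, one feature.
X = [[1.0], [2.0]]
y = [1.0, 2.0]
print(lasso_objective(X, y, w=[1.0], b=0.0, alpha=0.5))  # perfect fit, only the penalty remains: 0.5
print(lasso_objective(X, y, w=[0.0], b=0.0, alpha=0.5))  # (1 + 4) / (2 * 2) = 1.25
```

A larger alpha makes the penalty term dominate, which pushes more coefficients to exactly zero.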
Arguments
Y is a numeric vector indicating the dependent variable.
X is a numeric vector, tuple, matrix, or table indicating the independent variables.
- When X is a vector, its length must be equal to the length of Y.
- When X is a tuple, each of its elements must have the same length as Y.
- When X is a matrix or table, its number of rows must be equal to the length of Y.
mode is an integer indicating the contents of the output. It can be:
- 0 (default): a vector of the coefficient estimates.
- 1: a table with the coefficient estimates, standard errors, t-statistics, and p-values.
- 2: a dictionary with the following keys: ANOVA, RegressionStat, Coefficient, and Residual.
ANOVA (analysis of variance)
Source of Variation | DF (degrees of freedom) | SS (sum of squares) | MS (mean square) | F (F-statistic) | Significance |
---|---|---|---|---|---|
Regression | p | sum of squares regression, SSR | regression mean square, MSR=SSR/p | MSR/MSE | p-value |
Residual | n-p-1 | sum of squares error, SSE | mean square error, MSE=SSE/(n-p-1) | | |
Total | n-1 | sum of squares total, SST | | | |
where n is the number of observations and p is the number of independent variables.
RegressionStat (Regression statistics)
Item | Description |
---|---|
R2 | R-squared |
AdjustedR2 | R-squared adjusted for the degrees of freedom, which accounts for the number of terms in the model relative to the sample size. |
StdError | The residual standard error, corrected for the degrees of freedom. |
Observations | The sample size. |
Coefficient
Item | Description |
---|---|
factor | Independent variables |
beta | Estimated regression coefficients |
stdError | Standard error of the regression coefficients |
tstat | t statistic, indicating the significance of the regression coefficients |
pvalue | p-value of the t statistic |
Residual: the difference between each actual value and the corresponding predicted value (actual minus predicted).
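The derived quantities in the mode = 2 output follow from the table formulas. The Python sketch below (not DolphinDB script) recomputes the residuals, MSR, MSE, F, R2, AdjustedR2, and StdError from the data, coefficients, and sums of squares printed in the mode = 2 example in the Examples section. It assumes R2 = SSR/SST and AdjustedR2 = 1 - (1 - R2)(n - 1)/(n - p - 1), which reproduce the printed values. `residuals` and `regression_stats` are hypothetical helper names.

```python
import math

# Data and coefficients copied from the mode = 2 example.
x1 = [1, 3, 5, 7, 11, 16, 23]
x2 = [2, 8, 11, 34, 56, 54, 100]
y = [0.1, 4.2, 5.6, 8.8, 22.1, 35.6, 77.2]
beta = [-9.133706333069543, 2.535935196073186, 0.189298948643987]  # intercept first


def residuals(y, cols, beta):
    """Residual = actual - predicted; beta[0] is the intercept."""
    out = []
    for i, yi in enumerate(y):
        predicted = beta[0] + sum(b * col[i] for b, col in zip(beta[1:], cols))
        out.append(yi - predicted)
    return out


def regression_stats(ssr, sse, sst, n, p):
    """Derived entries of the ANOVA and RegressionStat tables."""
    msr = ssr / p
    mse = sse / (n - p - 1)
    r2 = ssr / sst  # assumption: matches the printed R2
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "R2": r2,
        "AdjustedR2": 1 - (1 - r2) * (n - 1) / (n - p - 1),
        "StdError": math.sqrt(mse),
    }


res = residuals(y, [x1, x2], beta)
sse = sum(r * r for r in res)  # sum of squared residuals
stats = regression_stats(4165.242566095043912, sse, 4471.637142857141952, n=7, p=2)
print(res[0], sse, stats)
```

Because the L1 penalty shrinks the coefficients, SSR + SSE does not equal SST in this example (4165.24 + 268.69 versus 4471.64), unlike in ordinary least squares.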
alpha is a floating-point number representing the constant that multiplies the L1-norm penalty term. The default value is 1.0.
intercept is a Boolean value indicating whether to include the intercept in the regression. The default value is true.
normalize is a Boolean value. If true, the regressors are normalized before regression by subtracting the mean and dividing by the L2-norm. If intercept is false, this parameter is ignored. The default value is false.
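A minimal Python sketch of the normalize step (not DolphinDB's code). It assumes that, like scikit-learn's former normalize option, each column is divided by the L2-norm of the already-centered column. `normalize_columns` is a hypothetical name. Leaving constant columns unscaled is our own choice to avoid dividing by zero.

```python
import math


def normalize_columns(cols):
    """Center each column, then scale it by the L2-norm of the centered column.

    Constant columns (norm 0) are only centered, to avoid dividing by zero.
    """
    result = []
    for col in cols:
        mean = sum(col) / len(col)
        centered = [v - mean for v in col]
        norm = math.sqrt(sum(v * v for v in centered))
        result.append([v / norm for v in centered] if norm > 0 else centered)
    return result


x1 = [1, 3, 5, 7, 11, 16, 23]
x2 = [2, 8, 11, 34, 56, 54, 100]
n1, n2 = normalize_columns([x1, x2])
print(sum(n1), sum(v * v for v in n1))  # mean 0 and unit L2-norm, up to rounding
```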
maxIter is a positive integer indicating the maximum number of iterations. The default value is 1000.
tolerance is a floating-point number. The iterations stop when the improvement in the objective function value is smaller than tolerance. The default value is 0.0001.
positive is a Boolean value indicating whether to force the coefficient estimates to be positive. The default value is false.
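The roles of alpha, intercept, maxIter, tolerance, and positive can be illustrated with cyclic coordinate descent, a common lasso solver. This Python sketch is not DolphinDB's implementation, which is not documented here. It centers the data so that the intercept is left unpenalized (an assumption) and updates each coefficient by soft-thresholding, clipping at zero when positive is true. Following the tolerance description above, it stops after max_iter sweeps or when the objective improves by less than tol. `lasso_cd`, `max_iter`, and `tol` are hypothetical names.

```python
import math


def lasso_cd(cols, y, alpha=1.0, intercept=True, max_iter=1000, tol=1e-4, positive=False):
    """Minimize (1/(2n))||y - b - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    cols is a list of feature columns. Returns (intercept, coefficients).
    """
    n, p = len(y), len(cols)
    if intercept:
        # Centering lets the unpenalized intercept be recovered at the end.
        xm = [sum(c) / n for c in cols]
        ym = sum(y) / n
    else:
        xm, ym = [0.0] * p, 0.0
    X = [[v - m for v in c] for c, m in zip(cols, xm)]
    r = [v - ym for v in y]  # residual for w = 0
    z = [sum(v * v for v in c) / n for c in X]
    w = [0.0] * p

    def objective():
        return sum(v * v for v in r) / (2 * n) + alpha * sum(abs(v) for v in w)

    prev = objective()
    for _ in range(max_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column: leave its coefficient at 0
            col = X[j]
            # rho = (1/n) * x_j . (residual with feature j's contribution added back)
            rho = sum(col[i] * r[i] for i in range(n)) / n + z[j] * w[j]
            if positive:
                new = max(rho - alpha, 0.0) / z[j]
            else:
                new = math.copysign(max(abs(rho) - alpha, 0.0), rho) / z[j]
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= delta * col[i]
                w[j] = new
        cur = objective()
        if prev - cur < tol:  # improvement smaller than tolerance: stop
            break
        prev = cur
    b = ym - sum(m * wj for m, wj in zip(xm, w)) if intercept else 0.0
    return b, w


x1 = [1, 3, 5, 7, 11, 16, 23]
x2 = [2, 8, 11, 34, 56, 54, 100]
y = [0.1, 4.2, 5.6, 8.8, 22.1, 35.6, 77.2]
print(lasso_cd([x1, x2], y, alpha=1.0))
```

The result for the example data need not match DolphinDB's output exactly, because the solver details and stopping rule may differ.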
swColName is a STRING specifying the name of the column to use as the sample weight. If it is not specified, every sample has a weight of 1.
checkInput is a Boolean value indicating whether to validate the input parameters.
- If checkInput = true (default), the input parameters are checked for invalid values, and an error is thrown if any NULL value exists.
- If checkInput = false, invalid values are not checked.
It is recommended to set checkInput = true. If it is false, you must ensure that the input parameters contain no invalid values and that no invalid values are generated during intermediate calculations; otherwise, the returned model may be inaccurate.
Examples
x1=1 3 5 7 11 16 23
x2=2 8 11 34 56 54 100
y=0.1 4.2 5.6 8.8 22.1 35.6 77.2;
print(lassoBasic(y, (x1,x2), mode = 0));
// output
[-9.133706333069543,2.535935196073186,0.189298948643987]
print(lassoBasic(y, (x1,x2), mode = 1));
// output
factor beta stdError tstat pvalue
--------- ------------------ ----------------- ------------------ -----------------
intercept -9.133706333069543 5.247492365971091 -1.740584968222107 0.156730846105191
x1 2.535935196073186 1.835793667840723 1.38138356205138 0.239309472176311
x2 0.189298948643987 0.410201227095842 0.461478260277749 0.66843504931137
print(lassoBasic(y, (x1,x2), mode = 2));
// output
Coefficient->
factor beta stdError tstat pvalue
--------- ------------------ ----------------- ------------------ -----------------
intercept -9.133706333069543 5.247492365971091 -1.740584968222107 0.156730846105191
x1 2.535935196073186 1.835793667840723 1.38138356205138 0.239309472176311
x2 0.189298948643987 0.410201227095842 0.461478260277749 0.66843504931137
RegressionStat->
item statistics
------------ -----------------
R2 0.931480447323074
AdjustedR2 0.897220670984611
StdError 8.195817208870076
Observations 7
ANOVA->
Breakdown DF SS MS F Significance
---------- -- -------------------- -------------------- ------------------ -----------------
Regression 2 4165.242566095043912 2082.621283047521956 31.004574440904473 0.003672076469395
Residual 4 268.685678884843582 67.171419721210895
Total 6 4471.637142857141952
Residual->
[6.319173239708383,4.21150915569809,-0.028258082380245,-6.254004293338318,-7.262321947798779,-6.063400030876729,9.077301958987561]
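The tstat, pvalue, and Significance values above are consistent with three relationships. The first is tstat = beta / stdError. The second is a two-sided Student's t test with n - p - 1 = 4 degrees of freedom. The third is an upper-tail F(p, n - p - 1) = F(2, 4) test. The Python sketch below (not DolphinDB script; DolphinDB's internal method is not documented here) recomputes these values from the printed beta, stdError, and F. It evaluates the regularized incomplete beta function with the continued-fraction (modified Lentz) method from Numerical Recipes. `betainc`, `t_pvalue`, and `f_pvalue` are hypothetical helper names.

```python
import math

_TINY = 1e-300


def _guard(v):
    # Keep the Lentz recurrences away from division by zero.
    return v if abs(v) >= _TINY else _TINY


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_pvalue(t, df):
    """Two-sided p-value of a t statistic with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def f_pvalue(f, d1, d2):
    """Upper-tail p-value of an F statistic with (d1, d2) degrees of freedom."""
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# Values copied from the examples: n = 7 observations, p = 2 independent variables.
beta = [-9.133706333069543, 2.535935196073186, 0.189298948643987]
std_err = [5.247492365971091, 1.835793667840723, 0.410201227095842]
df_resid = 7 - 2 - 1
tstats = [bj / sj for bj, sj in zip(beta, std_err)]
pvalues = [t_pvalue(t, df_resid) for t in tstats]
significance = f_pvalue(31.004574440904473, 2, df_resid)
print(tstats, pvalues, significance)
```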