lassoBasic
Syntax
lassoBasic(Y, X, [mode=0], [alpha=1.0], [intercept=true], [normalize=false],
[maxIter=1000], [tolerance=0.0001], [positive=false], [swColName],
[checkInput=true])
Details
Perform lasso regression.
Minimize the following objective function:
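In the form commonly used for lasso regression, the objective can be written as below. This is a sketch under assumptions: the 1/(2n) scaling of the squared error follows the convention of scikit-learn's Lasso and is not confirmed by this page; n denotes the number of observations, w the coefficient vector, and alpha the parameter described under Arguments.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
```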
Arguments
Y is a numeric vector indicating the dependent variable.
X is a numeric vector/tuple/matrix/table indicating the independent variables.
- When X is a vector or a tuple of vectors, the length of each vector must be equal to the length of Y.
- When X is a matrix/table, its number of rows must be equal to the length of Y.
mode is an integer indicating the content of the output. It can be:
- 0 (default): a vector of the coefficient estimates.
- 1: a table with coefficient estimates, standard errors, t-statistics, and p-values.
- 2: a dictionary with the following keys: ANOVA, RegressionStat, Coefficient, and Residual.
Table 1. ANOVA (one-way analysis of variance)
Source of Variance DF (degrees of freedom) SS (sum of squares)            MS (mean square)                   F (F-score) Significance
------------------ ----------------------- ------------------------------ ---------------------------------- ----------- ------------
Regression         p                       sum of squares regression, SSR regression mean square, MSR=SSR/p  MSR/MSE     p-value
Residual           n-p-1                   sum of squares error, SSE      mean square error, MSE=SSE/(n-p-1)
Total              n-1                     sum of squares total, SST
Here n is the number of observations and p is the number of independent variables.

Table 2. RegressionStat (regression statistics)
Item         Description
------------ -----------
R2           R-squared
AdjustedR2   The adjusted R-squared, corrected for the degrees of freedom by comparing the sample size to the number of terms in the regression model.
StdError     The residual standard error, corrected for the degrees of freedom.
Observations The sample size.

Table 3. Coefficient
Item     Description
-------- -----------
factor   Independent variables
beta     Estimated regression coefficients
stdError Standard errors of the regression coefficients
tstat    t statistic, indicating the significance of the regression coefficients
pvalue   p-value associated with the t statistic

Residual: a vector of residuals, i.e., the difference between each actual value and the corresponding predicted value.
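The relationships between the ANOVA and RegressionStat entries can be checked against the mode = 2 output in the Examples section. The following is a minimal Python sketch, not DolphinDB code. The input numbers are copied from that output; computing R2 as SSR/SST and StdError as the square root of MSE are inferences from those numbers rather than documented formulas.

```python
import math

# Values copied from the mode = 2 example output on this page.
n, p = 7, 2                      # observations, independent variables
ssr = 4165.242566095043912       # sum of squares regression
sse = 268.685678884843582        # sum of squares error
sst = 4471.637142857141952       # sum of squares total

msr = ssr / p                    # regression mean square
mse = sse / (n - p - 1)          # mean square error
f_score = msr / mse              # F column of the ANOVA table

# Assumed formulas (inferred from the example output, not documented here).
r2 = ssr / sst
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
std_error = math.sqrt(mse)

print(msr, mse, f_score)
print(r2, adjusted_r2, std_error)
```

Under these assumptions the computed values agree with the MS, F, R2, AdjustedR2, and StdError entries shown in the example.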
alpha is a floating-point number representing the constant that multiplies the L1-norm. The default value is 1.0.
intercept is a Boolean value indicating whether the regression includes the intercept. If it is true, the system automatically adds a column of 1s to X to generate the intercept. The default value is true.
normalize is a Boolean value. If true, the regressors are normalized before regression by subtracting the mean and dividing by the L2-norm. If intercept=false, this parameter is ignored. The default value is false.
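The normalization described for normalize can be sketched as follows. This is a minimal Python illustration, not DolphinDB code: the function name normalize_columns is hypothetical, and the sketch assumes the L2-norm is taken over each centered column, which this page does not state.

```python
import math


def normalize_columns(X):
    """Center each column of X, then scale it to unit L2-norm.

    Assumption: the norm is computed on the centered column.
    Returns the scaled rows plus the per-column means and norms.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    norms = [math.sqrt(sum(row[j] ** 2 for row in centered)) for j in range(p)]
    scaled = [
        [row[j] / norms[j] if norms[j] > 0 else 0.0 for j in range(p)]
        for row in centered
    ]
    return scaled, means, norms


# Regressors from the Examples section, arranged as rows of (x1, x2).
x1 = [1, 3, 5, 7, 11, 16, 23]
x2 = [2, 8, 11, 34, 56, 54, 100]
X = [list(pair) for pair in zip(x1, x2)]

scaled, means, norms = normalize_columns(X)
print(means)
print(norms)
```

After this transformation every column has zero mean and unit L2-norm. A coefficient fitted on a scaled column corresponds to that coefficient divided by the column's norm on the original scale.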
maxIter is a positive integer indicating the maximum number of iterations. The default value is 1000.
tolerance is a floating-point number. The iterations stop when the improvement in the objective function value is smaller than tolerance. The default value is 0.0001.
positive is a Boolean value indicating whether to force the coefficient estimates to be positive. The default value is false.
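The roles of alpha, maxIter, tolerance, and positive can be illustrated with a small coordinate-descent solver. This is a Python sketch under stated assumptions, not the solver that lassoBasic uses: the algorithm, the 1/(2n) scaling of the squared error, the unpenalized intercept, and all function names are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict(row, w, b):
    """Linear prediction b + row . w for one observation."""
    return b + sum(x * c for x, c in zip(row, w))


def objective(X, y, w, b, alpha):
    """Assumed objective: squared error / (2n) + alpha * L1-norm of w."""
    n = len(y)
    squared_error = sum((yi - predict(row, w, b)) ** 2 for row, yi in zip(X, y))
    return squared_error / (2 * n) + alpha * sum(abs(c) for c in w)


def lasso_cd(X, y, alpha=1.0, max_iter=1000, tolerance=0.0001, positive=False):
    """Cyclic coordinate descent with an unpenalized intercept."""
    n, p = len(y), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    previous = objective(X, y, w, b, alpha)
    for _ in range(max_iter):                      # at most max_iter sweeps
        for j in range(p):
            rho = 0.0                              # correlation with partial residual
            z = 0.0                                # squared norm of column j
            for row, yi in zip(X, y):
                partial = yi - predict(row, w, b) + row[j] * w[j]
                rho += row[j] * partial
                z += row[j] ** 2
            if z == 0.0:
                w[j] = 0.0
                continue
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
            if positive and w[j] < 0.0:
                w[j] = 0.0                         # clip negative estimates to zero
        # Re-fit the intercept as the mean of the residuals without it.
        b = sum(yi - predict(row, w, 0.0) for row, yi in zip(X, y)) / n
        current = objective(X, y, w, b, alpha)
        if previous - current < tolerance:         # improvement below tolerance
            break
        previous = current
    return b, w


X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]
b_fit, w_fit = lasso_cd(X, y, alpha=0.0, tolerance=1e-12)   # no penalty
b_zero, w_zero = lasso_cd(X, y, alpha=100.0)                # heavy penalty
print(b_fit, w_fit)
print(b_zero, w_zero)
```

With alpha set to 0 the sketch approaches the least-squares fit (slope close to 2, intercept close to 0), while a large alpha drives the coefficient to exactly 0 and leaves only the intercept, here the mean of y.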
swColName is a STRING indicating a column name of ds. The specified column is used as the sample weight. If it is not specified, the sample weight is treated as 1.
checkInput is a Boolean value indicating whether to validate the input.
- If checkInput=true (default), the function checks the parameters for invalid values and throws an error if a NULL value exists.
- If checkInput=false, invalid values are not checked.
Examples
x1=1 3 5 7 11 16 23
x2=2 8 11 34 56 54 100
y=0.1 4.2 5.6 8.8 22.1 35.6 77.2;
print(lassoBasic(y, (x1,x2), mode = 0));
// output
[-9.133706333069543,2.535935196073186,0.189298948643987]
print(lassoBasic(y, (x1,x2), mode = 1));
// output
factor    beta               stdError          tstat              pvalue
--------- ------------------ ----------------- ------------------ -----------------
intercept -9.133706333069543 5.247492365971091 -1.740584968222107 0.156730846105191
x1        2.535935196073186  1.835793667840723 1.38138356205138   0.239309472176311
x2        0.189298948643987  0.410201227095842 0.461478260277749  0.66843504931137
print(lassoBasic(y, (x1,x2), mode = 2));
// output
Coefficient->
factor    beta               stdError          tstat              pvalue
--------- ------------------ ----------------- ------------------ -----------------
intercept -9.133706333069543 5.247492365971091 -1.740584968222107 0.156730846105191
x1        2.535935196073186  1.835793667840723 1.38138356205138   0.239309472176311
x2        0.189298948643987  0.410201227095842 0.461478260277749  0.66843504931137
RegressionStat->
item         statistics
------------ -----------------
R2           0.931480447323074
AdjustedR2   0.897220670984611
StdError     8.195817208870076
Observations 7
ANOVA->
Breakdown  DF SS                   MS                   F                  Significance
---------- -- -------------------- -------------------- ------------------ -----------------
Regression 2  4165.242566095043912 2082.621283047521956 31.004574440904473 0.003672076469395
Residual   4  268.685678884843582  67.171419721210895
Total      6  4471.637142857141952
Residual->
[6.319173239708383,4.21150915569809,-0.028258082380245,-6.254004293338318,-7.262321947798779,-6.063400030876729,9.077301958987561]
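The Residual vector in the last output can be reproduced from the coefficient estimates. The following is a minimal Python check, not DolphinDB code; it assumes residuals are computed as actual minus predicted, which is an inference from the numbers above rather than a documented definition.

```python
# Data and coefficient estimates copied from the example above.
x1 = [1, 3, 5, 7, 11, 16, 23]
x2 = [2, 8, 11, 34, 56, 54, 100]
y = [0.1, 4.2, 5.6, 8.8, 22.1, 35.6, 77.2]
intercept, beta1, beta2 = -9.133706333069543, 2.535935196073186, 0.189298948643987

# Residual = actual value minus predicted value (assumed sign convention).
residuals = [
    yi - (intercept + beta1 * a + beta2 * b)
    for yi, a, b in zip(y, x1, x2)
]

# The sum of squared residuals corresponds to the Residual SS entry in ANOVA.
sse = sum(r * r for r in residuals)

print(residuals)
print(sse)
```

Under this sign convention the computed values agree with the Residual vector, and their sum of squares agrees with the SS entry of the Residual row in the ANOVA table.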