lassoCV

swordfish.function.lassoCV()

Fit a Lasso regression, using 5-fold cross-validation to choose among the candidate values in alphas, and return the model that corresponds to the best-performing value.
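The selection procedure described above can be sketched in plain Python. This is an illustration of the general technique, not the swordfish implementation: the objective scaling, the coordinate-descent solver, the way rows are assigned to folds, and the use of held-out squared error as the score are all assumptions, and every function name below is hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the L1 proximal step."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def predict(X, w, b):
    """Linear prediction b + w.x for every row of X."""
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def objective(X, y, w, b, alpha):
    """Assumed objective: RSS / (2n) + alpha * ||w||_1."""
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, predict(X, w, b)))
    return rss / (2 * len(X)) + alpha * sum(abs(wj) for wj in w)

def fit_lasso(X, y, alpha, max_iter=1000, tolerance=1e-4):
    """Coordinate descent; stops when the objective improves by < tolerance."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, sum(y) / n
    previous = objective(X, y, w, b, alpha)
    for _ in range(max_iter):
        for j in range(p):
            w[j] = 0.0  # the residual below therefore excludes feature j
            residual = [yi - pi for yi, pi in zip(y, predict(X, w, b))]
            rho = sum(row[j] * r for row, r in zip(X, residual)) / n
            scale = sum(row[j] ** 2 for row in X) / n
            w[j] = soft_threshold(rho, alpha) / scale if scale else 0.0
        # The intercept is the mean residual left by the coefficients alone.
        b = sum(yi - pi for yi, pi in zip(y, predict(X, w, 0.0))) / n
        current = objective(X, y, w, b, alpha)
        if previous - current < tolerance:
            break
        previous = current
    return w, b

def lasso_cv(X, y, alphas=(0.01, 0.1, 1.0), folds=5):
    """Pick the alpha with the lowest held-out squared error, then refit."""
    rows = list(range(len(X)))
    best_alpha, best_error = None, float("inf")
    for alpha in alphas:
        error = 0.0
        for k in range(folds):
            held_out = rows[k::folds]  # every folds-th row forms one fold
            train = [i for i in rows if i % folds != k]
            w, b = fit_lasso([X[i] for i in train], [y[i] for i in train], alpha)
            preds = predict([X[i] for i in held_out], w, b)
            error += sum((y[i] - p) ** 2 for i, p in zip(held_out, preds))
        if error < best_error:
            best_alpha, best_error = alpha, error
    w, b = fit_lasso(X, y, best_alpha)  # final model uses all rows
    return best_alpha, w, b

# Toy data: y depends on x1 only, so the x2 coefficient should shrink to ~0.
X = [[(i % 7 - 3) / 3, ((i * 3) % 11 - 5) / 5] for i in range(42)]
y = [3.0 * x1 + 1.0 for x1, _ in X]
best_alpha, coefficients, intercept = lasso_cv(X, y)
```

On this noise-free toy data the smallest candidate wins, because larger penalties shrink the x1 coefficient further from its true value and raise the held-out error.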

Parameters:
  • ds (Constant) – An in-memory table or a data source usually generated by the sqlDS function.

  • yColName (Constant) – A string indicating the column name of the dependent variable in ds.

  • xColNames (Constant) – A string scalar/vector indicating the column names of the independent variables in ds.

  • alphas (Constant, optional) – A floating-point scalar or vector of candidate values for the coefficient that multiplies the L1-norm penalty term. The default value is [0.01, 0.1, 1.0].

  • intercept (Constant) – A Boolean value indicating whether to include the intercept in the regression. The default value is true.

  • normalize (Constant) – A Boolean value. If true, the regressors are normalized before regression by subtracting the mean and dividing by the L2-norm. If intercept = false, this parameter is ignored. The default value is false.

  • maxIter (Constant) – A positive integer indicating the maximum number of iterations. The default value is 1000.

  • tolerance (Constant) – A floating-point number. Iteration stops when the improvement in the objective function value is smaller than tolerance. The default value is 0.0001.

  • positive (Constant) – A Boolean value indicating whether to force the coefficient estimates to be positive. The default value is false.

  • swColName (Constant) – A string indicating a column name of ds. The specified column is used as the sample weight. If it is not specified, every sample is given a weight of 1.

  • checkInput (Constant) –

    A Boolean value that determines whether to validate the parameters yColName, xColNames, and swColName.

    • If checkInput = true (default), the parameters are checked for invalid values, and an error is thrown if a null value is found.

    • If checkInput = false, no check for invalid values is performed.

    Setting checkInput = true is recommended. If it is set to false, the caller must ensure that the input parameters contain no invalid values and that none are generated during intermediate calculations; otherwise, the returned model may be inaccurate.
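As a rough illustration of two of the parameter descriptions above, the sketch below shows what a null check over the named columns and the normalize step might look like. It is a minimal sketch under stated assumptions: a plain Python dict of lists stands in for ds, None and NaN stand in for null values, the L2-norm is taken after centering, and both function names are hypothetical rather than part of swordfish.

```python
import math

def find_invalid_rows(columns, names):
    """Return indices of rows where any named column holds None or NaN."""
    n_rows = len(columns[names[0]])
    bad = []
    for i in range(n_rows):
        for name in names:
            value = columns[name][i]
            if value is None or (isinstance(value, float) and math.isnan(value)):
                bad.append(i)
                break  # one invalid value is enough to flag the row
    return bad

def normalize_column(values):
    """Subtract the mean, then divide by the L2-norm of the centered column."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered] if norm else centered

# Hypothetical stand-in for ds: rows 1 and 2 each contain an invalid value.
table = {"y": [1.0, 2.0, None, 4.0], "x1": [0.5, float("nan"), 1.5, 2.0]}
invalid = find_invalid_rows(table, ["y", "x1"])

# A normalized column has zero mean and unit L2-norm.
scaled = normalize_column([1.0, 2.0, 3.0])
```

With checkInput = false, a check along these lines would have to be done by the caller before the data is passed in.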

Returns:

A dictionary containing the following keys:

  • modelName: the model name, which is “LassoCV” for this method

  • coefficients: the regression coefficients

  • intercept: the intercept

  • dual_gap: the dual gap

  • tolerance: the tolerance for the optimization

  • iterations: the number of iterations

  • xColNames: the column names of the independent variables in the data source

  • predict: the function used for prediction

  • alpha: the penalty coefficient selected by cross-validation
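To show how the coefficients, intercept, and xColNames entries relate to a prediction, here is a minimal sketch that uses a hand-written dictionary as a stand-in for the returned model. The values are made up for illustration, predict_row is a hypothetical helper (not the function stored under the predict key), and the assumption that coefficients are ordered like xColNames is not confirmed by this page.

```python
# Hypothetical stand-in for the returned dictionary; all values are made up.
model = {
    "modelName": "LassoCV",
    "coefficients": [2.5, 0.0],
    "intercept": 1.0,
    "xColNames": ["x1", "x2"],
    "alpha": 0.01,
}

def predict_row(model, row):
    """Linear prediction: intercept + sum of coefficient * regressor value."""
    # Assumes coefficients are listed in the same order as xColNames.
    return model["intercept"] + sum(
        coef * row[name]
        for coef, name in zip(model["coefficients"], model["xColNames"])
    )

# 1.0 + 2.5 * 2.0 + 0.0 * 7.0
prediction = predict_row(model, {"x1": 2.0, "x2": 7.0})
```

A coefficient of exactly zero, as for x2 here, means the L1 penalty has removed that regressor from the model.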

Return type:

Constant