logisticRegression
Syntax
logisticRegression(ds, yColName, xColNames, [intercept=true], [initTheta],
[tolerance=1e-3], [maxIter=500], [regularizationCoeff=1.0])
Arguments
ds is the data source containing the training data. It can be generated with the function sqlDS.
yColName is a string indicating the name of the column that holds the class labels (the dependent variable).
xColNames is a string scalar/vector indicating the names of independent variables.
intercept is a Boolean scalar indicating whether the regression uses an intercept. The default value is true, which means that a column of 1s is added to the independent variables.
initTheta is a vector indicating the initial values of the parameters when the iterations begin. The default value is a vector of zeros of length xColNames.size()+intercept, i.e. one element per independent variable plus one more if intercept is true.
tolerance is a numeric scalar indicating the convergence tolerance. The iterations stop when the value of the log likelihood function changes by less than tolerance between 2 consecutive iterations. The default value is 0.001.
maxIter is a positive integer indicating the maximum number of iterations. The iterations stop once maxIter is reached, even if the tolerance criterion has not been met. The default value is 500.
regularizationCoeff is a positive number indicating the coefficient of the regularization term. The default value is 1.0.
intercept, initTheta, tolerance, maxIter, regularizationCoeff are optional.
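As a sketch of how the optional arguments line up with the syntax above, the call below passes every one of them positionally. The table t, its simulated contents and the specific argument values are assumptions chosen only for illustration, not recommended settings.

```dolphindb
// Hypothetical training data: binary label y and two features x0, x1.
n = 200
x0 = norm(0.0, 1.0, n)
x1 = norm(0.0, 1.0, n)
// Noisy label so that the two classes overlap.
y = iif(x0 + x1 + norm(0.0, 1.0, n) > 0, 1, 0)
t = table(y, x0, x1)

// Argument order follows the Syntax section:
// ds, yColName, xColNames, intercept, initTheta, tolerance, maxIter, regularizationCoeff
// initTheta has 3 elements: xColNames.size() (2) + intercept (1).
model = logisticRegression(sqlDS(<select * from t>), `y, `x0`x1, true, [0.0, 0.0, 0.0], 1e-4, 1000, 0.5)
```

Using zeros for initTheta matches the documented default, so this call differs from the default behavior only in tolerance, maxIter and regularizationCoeff.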
Details
Fit a logistic regression model. The result is a dictionary with the following keys: iterations, modelName, coefficients, tolerance, logLikelihood, xColNames and intercept. iterations is the number of iterations performed, modelName is "Logistic Regression", coefficients is a vector of the parameter estimates, and logLikelihood is the final value of the log likelihood function. tolerance, xColNames and intercept hold the values of the corresponding arguments.
The fitted model can be used as an input for function predict.
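Because the result is a dictionary, its entries can be read by key. The following is a minimal sketch that assumes a variable named model holds the return value of logisticRegression and that the default maxIter of 500 was used. Treating iterations >= maxIter as a sign that the run stopped at the cap is an inference from the argument descriptions, not documented behavior.

```dolphindb
// Assumes `model` was returned by logisticRegression with default maxIter.
coef = model[`coefficients]      // vector of parameter estimates
ll = model[`logLikelihood]       // final log likelihood value
iters = model[`iterations]       // number of iterations performed

print(coef)
print(ll)

// Heuristic check (an assumption): hitting the cap suggests the
// tolerance criterion was not met before the iterations stopped.
if (iters >= 500) {
    print("Stopped at maxIter; consider raising maxIter or loosening tolerance.")
}
```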
Examples
Fit a logistic regression model with simulated data:
t = table(100:0, `y`x0`x1, [INT,DOUBLE,DOUBLE])
y = take(0, 50)
x0 = norm(-1.0, 1.0, 50)
x1 = norm(-1.0, 1.0, 50)
insert into t values (y, x0, x1)
y = take(1, 50)
x0 = norm(1.0, 1.0, 50)
x1 = norm(1.0, 1.0, 50)
insert into t values (y, x0, x1)
model = logisticRegression(sqlDS(<select * from t>), `y, `x0`x1);
// output (values vary between runs because the data are randomly generated)
modelName->Logistic Regression
logLikelihood->-23.269132
intercept->true
coefficients->[1.377971,1.914001,-0.305114]
xColNames->[x0,x1]
iterations->7
tolerance->0.001
Use the fitted model for prediction:
predict(model, t);
Save the fitted model to disk:
saveModel(model, "C:/DolphinDB/data/logisticModel.txt");
Load a saved model:
loadModel("C:/DolphinDB/data/logisticModel.txt");
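To reuse a saved model, the value returned by loadModel can be assigned to a variable and passed to predict. This is a small sketch that assumes the file written by the saveModel call above exists and that the table t from the earlier example is still in scope. The variable name loaded is arbitrary.

```dolphindb
// Load the model saved earlier and apply it to the same table.
loaded = loadModel("C:/DolphinDB/data/logisticModel.txt")
predict(loaded, t)
```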