glm
Syntax
glm(ds, yColName, xColNames, [family], [link], [tolerance=1e-6], [maxIter=100])
Arguments
ds is the data source containing the training data. It can be generated with the function sqlDS.
yColName is a string indicating the dependent variable column.
xColNames is a STRING scalar/vector indicating the names of the independent variable columns.
family (optional) is a string indicating the type of distribution. It can be gaussian (default), poisson, gamma, inverseGaussian or binomial.
link (optional) is a string indicating the type of the link function.
Possible values of link and the dependent variable for each family:
family | link | default link | dependent variable |
---|---|---|---|
gaussian | identity, inverse, log | identity | DOUBLE type |
poisson | log, sqrt, identity | log | non-negative integer |
gamma | inverse, identity, log | inverse | y>=0 |
inverseGaussian | inverseOfSquare, inverse, identity, log | inverseOfSquare | y>=0 |
binomial | logit, probit | logit | y=0,1 |
tolerance (optional) is a numeric scalar. The iteration stops if the difference between the log-likelihood values of two adjacent iterations is smaller than tolerance. The default value is 0.000001.
maxIter (optional) is a positive integer indicating the maximum number of iterations. The default value is 100.
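As an illustration of the optional arguments, the following sketch fits a logistic regression on simulated data and passes family, link, tolerance and maxIter explicitly. The data-generating coefficients (-1 and 2), the sample size, the variable names and the chosen tolerance and maxIter values are arbitrary assumptions made for this example.

```
n = 500
x = rand(1.0, n)                          // uniform values in [0, 1)
p = 1.0 / (1.0 + exp(-(-1.0 + 2.0 * x)))  // success probability under the assumed coefficients
y = iif(rand(1.0, n) < p, 1, 0)           // 0/1 response, as the binomial family expects
t = table(x, y)
// xColNames is a scalar here because there is a single independent variable
model = glm(sqlDS(<select * from t>), `y, `x, `binomial, `logit, 1e-8, 50)
model[`coefficients]
```

A tighter tolerance may require more iterations, so it can be worth comparing the iterations entry of the result against maxIter after fitting.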
Details
Fit a generalized linear model. The result is a dictionary with the following keys: coefficients, link, tolerance, family, xColNames, modelName, residualDeviance, iterations and dispersion.
- coefficients is a table with the estimate, standard error, t value and p value of each coefficient;
- modelName is "Generalized Linear Model";
- iterations is the number of iterations;
- dispersion is the dispersion coefficient of the model.
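Because the result is a dictionary, individual entries can be retrieved by key. The sketch below fits a Gaussian model on simulated data and extracts a few entries. The column names of the coefficients table are taken from the sample output in the Examples section; the 0.05 threshold and the variable name coef are illustrative only.

```
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
y = 6 + x1 - 2 * x2 + norm(0, 10, 100)
t = table(x1, x2, y)
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity)

coef = model[`coefficients]               // table with columns beta, stdError, tstat, pvalue
select * from coef where pvalue < 0.05    // rows significant at the (illustrative) 5% level
model[`iterations]                        // number of iterations performed
model[`dispersion]                        // dispersion coefficient of the model
```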
Examples
Fit a generalized linear model with simulated data:
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
b0 = 6
b1 = 1
b2 = -2
err = norm(0, 10, 100)
y = b0 + b1 * x1 + b2 * x2 + err
t = table(x1, x2, y)
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity);
model;
// output
coefficients->
beta stdError tstat pvalue
-------- -------- ---------- --------
1.027483 0.032631 31.487543 0
-1.99913 0.03517 -56.842186 0
5.260677 2.513633 2.092858 0.038972
link->identity
tolerance->1.0E-6
family->gaussian
xColNames->["x1","x2"]
modelName->Generalized Linear Model
residualDeviance->8873.158697
iterations->5
dispersion->91.475863
Use the fitted model for prediction:
predict(model, t);
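The fitted model can also be applied to rows that were not used for fitting. The sketch below holds out the last 20 rows of a simulated table, fits on the first 80 and predicts on the held-out part. The 80/20 split and the names train, test and pred are hypothetical choices for this example.

```
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
y = 6 + x1 - 2 * x2 + norm(0, 10, 100)
t = table(x1, x2, y)
train = t[0:80]      // first 80 rows, used for fitting
test = t[80:100]     // last 20 rows, held out
model = glm(sqlDS(<select * from train>), `y, `x1`x2, `gaussian, `identity)
pred = predict(model, test)
pred
```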
Save the fitted model to disk:
saveModel(model, "C:/DolphinDB/Data/GLMModel.txt");
Load a saved model:
loadModel("C:/DolphinDB/Data/GLMModel.txt");