glm
Syntax
glm(ds, yColName, xColNames, [family], [link], [tolerance=1e-6], [maxIter=100])
Arguments
ds is the data source to be trained. It can be generated with function sqlDS.
yColName is a string indicating the dependent variable column.
xColNames is a string scalar/vector indicating the names of the independent variable columns.
family is a string scalar indicating the type of distribution. It can be gaussian, poisson, gamma, inverseGaussian or binomial. The default value is gaussian.
link is a string scalar indicating the type of the link function. The default value for each family is shown in the table below.
tolerance is a numeric scalar. Iteration stops when the difference between the log-likelihood values of two consecutive iterations is smaller than tolerance. The default value is 0.000001.
maxIter is a positive integer indicating the maximum number of iterations. The default value is 100.
Possible values of link and the dependent variable for each family:
family | link | default link | dependent variable |
---|---|---|---|
gaussian | identity, inverse, log | identity | floating |
poisson | log, sqrt, identity | log | non-negative integer |
gamma | inverse, identity, log | inverse | y>=0 |
inverseGaussian | inverseOfSquare, inverse, identity, log | inverseOfSquare | y>=0 |
binomial | logit, probit | logit | y=0,1 |
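For reference, assuming the link names carry their conventional meaning in generalized linear models (this page does not spell out the formulas), each link g relates the mean μ of the dependent variable to the linear predictor η = b0 + b1*x1 + ... + bk*xk through g(μ) = η:

```latex
\begin{aligned}
\text{identity:}        \quad & g(\mu) = \mu \\
\text{log:}             \quad & g(\mu) = \ln \mu \\
\text{inverse:}         \quad & g(\mu) = 1/\mu \\
\text{inverseOfSquare:} \quad & g(\mu) = 1/\mu^{2} \\
\text{sqrt:}            \quad & g(\mu) = \sqrt{\mu} \\
\text{logit:}           \quad & g(\mu) = \ln\frac{\mu}{1-\mu} \\
\text{probit:}          \quad & g(\mu) = \Phi^{-1}(\mu)
\end{aligned}
```

Here Φ denotes the standard normal cumulative distribution function. Constant factors or signs may differ between implementations.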
Details
Fit a generalized linear model. The result is a dictionary with the following keys: coefficients, link, tolerance, family, xColNames, modelName, residualDeviance, iterations and dispersion. coefficients is a table with the estimate (beta), standard error (stdError), t value (tstat) and p value (pvalue) of each coefficient; modelName is “Generalized Linear Model”; iterations is the number of iterations; dispersion is the dispersion coefficient of the model.
Examples
Fit a generalized linear model with simulated data:
$ x1 = rand(100.0, 100)
$ x2 = rand(100.0, 100)
$ b0 = 6
$ b1 = 1
$ b2 = -2
$ err = norm(0, 10, 100)
$ y = b0 + b1 * x1 + b2 * x2 + err
$ t = table(x1, x2, y)
$ model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity);
$ model;
coefficients->
beta stdError tstat pvalue
-------- -------- ---------- --------
1.027483 0.032631 31.487543 0
-1.99913 0.03517 -56.842186 0
5.260677 2.513633 2.092858 0.038972
link->identity
tolerance->1.0E-6
family->gaussian
xColNames->["x1","x2"]
modelName->Generalized Linear Model
residualDeviance->8873.158697
iterations->5
dispersion->91.475863
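Individual entries of the result can be read by key. A short sketch, assuming model is the dictionary returned by the call above and that it supports the usual DolphinDB dictionary indexing; the variable name coef is only illustrative:

```dolphindb
// Assumes `model` is the dictionary returned by glm in the example above.
coef = model[`coefficients]        // table with columns beta, stdError, tstat, pvalue
select beta, pvalue from coef      // keep only the estimates and their p values
model[`iterations]                 // number of iterations reported by the fit
model[`dispersion]                 // dispersion coefficient of the model
```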
Use the fitted model in forecasting:
$ predict(model, t);
Save the fitted model to disk:
$ saveModel(model, "C:/DolphinDB/Data/GLMModel.txt");
Load a saved model:
$ loadModel("C:/DolphinDB/Data/GLMModel.txt");
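The other families follow the same call pattern. The following is a minimal, untested sketch of a binomial fit with the logit link on simulated 0/1 data, with tolerance and maxIter passed positionally as in the syntax above. The names n, eta, p, t2 and logitModel, the true coefficients (0.5, 2, -3), and the chosen tolerance and iteration limit are all assumptions made for illustration:

```dolphindb
// Simulate two predictors and a 0/1 response (illustrative values only).
n = 1000
x1 = rand(1.0, n)
x2 = rand(1.0, n)
eta = 0.5 + 2.0 * x1 - 3.0 * x2          // linear predictor
p = 1.0 / (1.0 + exp(-eta))              // inverse of the logit link
y = iif(rand(1.0, n) < p, 1, 0)          // y = 1 with probability p, else 0
t2 = table(x1, x2, y)

// Fit with a tighter tolerance (1e-8) and a higher iteration limit (200).
logitModel = glm(sqlDS(<select * from t2>), `y, `x1`x2, `binomial, `logit, 1e-8, 200)
logitModel[`coefficients]
```

Because the data are random, the estimates vary from run to run; with enough rows they should be close to the coefficients used in the simulation.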