# glm {#glm}


## Syntax {#syntax}

`glm(ds, yColName, xColNames, [family], [link], [tolerance=1e-6], [maxIter=100])`

## Arguments {#arguments}

**ds** is the data source used for training. It can be generated with the function [sqlDS](../s/sqlDS.md).

**yColName** is a string indicating the name of the dependent variable column.

**xColNames** is a STRING scalar/vector indicating the names of the independent variable columns.

**family** \(optional\) is a string indicating the type of distribution. It can be gaussian \(default\), poisson, gamma, inverseGaussian or binomial.

**link** \(optional\) is a string indicating the type of the link function.

Possible values of *link* and the dependent variable for each *family*:

|family|link|default link|dependent variable|
|------|----|------------|------------------|
|gaussian|identity, inverse, log|identity|DOUBLE type|
|poisson|log, sqrt, identity|log|non-negative integer|
|gamma|inverse, identity, log|inverse|y&gt;=0|
|inverseGaussian|inverseOfSquare, inverse, identity, log|inverseOfSquare|y&gt;=0|
|binomial|logit, probit|logit|y=0,1|
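
For reference, the link names in the table correspond to the following textbook link functions, where $\mu = E[y]$ is the mean of the dependent variable and $\eta = \beta_0 + \sum_j \beta_j x_j$ is the linear predictor. These are the standard statistical definitions; that the implementation uses exactly these forms (for example, without extra constant factors) is an assumption.

$$
\begin{aligned}
\text{identity:} \quad & \eta = \mu \\
\text{inverse:} \quad & \eta = \frac{1}{\mu} \\
\text{log:} \quad & \eta = \ln \mu \\
\text{sqrt:} \quad & \eta = \sqrt{\mu} \\
\text{inverseOfSquare:} \quad & \eta = \frac{1}{\mu^{2}} \\
\text{logit:} \quad & \eta = \ln \frac{\mu}{1 - \mu} \\
\text{probit:} \quad & \eta = \Phi^{-1}(\mu)
\end{aligned}
$$

Here $\Phi^{-1}$ denotes the inverse of the standard normal cumulative distribution function.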

**tolerance** \(optional\) is a numeric scalar. Iteration stops when the difference between the log-likelihood values of two consecutive iterations is smaller than *tolerance*. The default value is 0.000001.

**maxIter** \(optional\) is a positive integer indicating the maximum number of iterations. The default value is 100.
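
The following is a minimal sketch of tightening the stopping rule. It assumes that the optional arguments can be passed positionally in the order given in the Syntax section, so *family* and *link* are spelled out to place *tolerance* and *maxIter* in the sixth and seventh positions. The values 1e-10 and 500 are arbitrary illustrations, not recommendations.

```
// Simulated data, mirroring the setup used in the Examples section
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
y = 6 + x1 - 2 * x2 + norm(0, 10, 100)
t = table(x1, x2, y)

// Stricter tolerance and a higher iteration cap than the defaults (1e-6, 100)
strict = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity, 1e-10, 500)

// Number of iterations actually used before stopping
strict[`iterations]
```

If `iterations` equals *maxIter*, the fit may have stopped at the cap rather than by meeting the tolerance, in which case raising *maxIter* is worth trying.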

## Details {#details}

Fits a generalized linear model and returns a dictionary with the following keys: coefficients, link, tolerance, family, xColNames, modelName, residualDeviance, iterations and dispersion.

-   coefficients is a table with the estimate, standard error, t value and p value for each coefficient;
-   modelName is "Generalized Linear Model";
-   iterations is the number of iterations;
-   dispersion is the dispersion coefficient of the model.
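
As a rough sketch of working with the returned dictionary, individual entries can be read by key. The column names `beta`, `stdError`, `tstat` and `pvalue` are taken from the sample output in the Examples section; the 0.05 threshold is only an illustrative choice.

```
// Simulated data, mirroring the setup used in the Examples section
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
y = 6 + x1 - 2 * x2 + norm(0, 10, 100)
t = table(x1, x2, y)
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity)

// The coefficient table is an ordinary table and can be queried with SQL
coef = model[`coefficients]
select beta, pvalue from coef where pvalue < 0.05

// Scalar diagnostics are stored under their own keys
model[`residualDeviance]
model[`dispersion]
```

In the sample output, the coefficient rows appear to follow the order of *xColNames* with the intercept last. This is read off that output rather than stated explicitly, so it is worth checking against known coefficients before relying on row positions.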

## Examples {#examples}

Fit a generalized linear model with simulated data:

```
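x1 = rand(100.0, 100)          // 100 uniform random values between 0 and 100
x2 = rand(100.0, 100)
b0 = 6                         // true intercept
b1 = 1                         // true coefficient of x1
b2 = -2                        // true coefficient of x2
err = norm(0, 10, 100)         // Gaussian noise: mean 0, standard deviation 10
y = b0 + b1 * x1 + b2 * x2 + err
t = table(x1, x2, y)
// Gaussian family with identity link, i.e. ordinary linear regression
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity);
model;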

/* output:
coefficients->

beta     stdError tstat      pvalue
-------- -------- ---------- --------
1.027483 0.032631 31.487543  0
-1.99913 0.03517  -56.842186 0
5.260677 2.513633 2.092858   0.038972

link->identity
tolerance->1.0E-6
family->gaussian
xColNames->["x1","x2"]
modelName->Generalized Linear Model
residualDeviance->8873.158697
iterations->5
dispersion->91.475863
*/
```

Use the fitted model for prediction:

```
predict(model, t);
```
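
One way to sanity-check the predictions is to compare them with the observed values. The sketch below assumes that `predict` returns one fitted value per row of the input table, aligned with the row order; it measures in-sample error only, so it says nothing about performance on new data.

```
// Simulated data and model, as in the fitting example
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
y = 6 + x1 - 2 * x2 + norm(0, 10, 100)
t = table(x1, x2, y)
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity)

// Assumption: yhat is a vector with one prediction per row of t
yhat = predict(model, t)

// In-sample root mean squared error
sqrt(avg(square(t.y - yhat)))
```

With noise of standard deviation 10 in the simulation, an error of roughly that magnitude would be expected.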

Save the fitted model to disk:

```
saveModel(model, "C:/DolphinDB/Data/GLMModel.txt");
```

Load a saved model:

```
loadModel("C:/DolphinDB/Data/GLMModel.txt");
```
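
A save and load round trip can be checked by comparing predictions from the original and the restored model. This is a sketch only: it reuses the path from the examples above, which must point to an existing, writable directory on the server, and it assumes that `predict` accepts the object returned by `loadModel`.

```
// Simulated data and model, as in the fitting example
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
y = 6 + x1 - 2 * x2 + norm(0, 10, 100)
t = table(x1, x2, y)
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity)

// Write the model to disk, then read it back
saveModel(model, "C:/DolphinDB/Data/GLMModel.txt")
restored = loadModel("C:/DolphinDB/Data/GLMModel.txt")

// Both models should give the same predictions on the same data
eqObj(predict(model, t), predict(restored, t))
```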

