# randomForestRegressor {#randomforestregressor}

## Syntax {#syntax}

`randomForestRegressor(ds, yColName, xColNames, [maxFeatures=0], [numTrees=10], [numBins=32], [maxDepth=32], [minImpurityDecrease=0.0], [numJobs=-1], [randomSeed])`

## Arguments {#arguments}

**ds** specifies the data sources used for training. It can be generated with the function [sqlDS](../s/sqlDS.md).

**yColName** is a string indicating the name of the dependent variable column.

**xColNames** is a string scalar/vector indicating the names of the feature columns.

**maxFeatures** \(optional\) is an integer or a floating-point number indicating the number of features to consider when looking for the best split. The default value is 0.

-   If *maxFeatures* is a positive integer, *maxFeatures* features are considered at each split.

-   If *maxFeatures* is 0, sqrt\(the number of feature columns\) features are considered at each split.

-   If *maxFeatures* is a floating-point number between 0 and 1, int\(*maxFeatures* \* the number of feature columns\) features are considered at each split.
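
The three cases can be sketched as follows. This is a minimal illustration, not a recommended configuration: the table `t4` and its column names are assumptions made up for this sketch, and four feature columns are used so that sqrt\(4\) and int\(0.5 \* 4\) both come out to exactly 2.

```
// illustrative data: 4 feature columns and a target (all names are assumptions)
n = 1000
x1 = rand(1.0, n)
x2 = rand(1.0, n)
x3 = rand(1.0, n)
x4 = rand(1.0, n)
y = x1 + 2 * x2 - x3 + norm(0, 0.1, n)
t4 = table(x1, x2, x3, x4, y)
ds = sqlDS(<select * from t4>)

// positive integer: 3 features are considered at each split
m1 = randomForestRegressor(ds, `y, `x1`x2`x3`x4, 3)
// 0 (the default): sqrt(4) = 2 features are considered at each split
m2 = randomForestRegressor(ds, `y, `x1`x2`x3`x4, 0)
// floating-point number between 0 and 1: int(0.5 * 4) = 2 features are considered at each split
m3 = randomForestRegressor(ds, `y, `x1`x2`x3`x4, 0.5)
```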


**numTrees** \(optional\) is a positive integer indicating the number of trees in the random forest. The default value is 10.

**numBins** \(optional\) is a positive integer indicating the number of bins used when discretizing continuous features. The default value is 32. Increasing *numBins* lets the algorithm consider more split candidates and make finer-grained split decisions, at the cost of more computation and communication time.

**maxDepth** \(optional\) is a positive integer indicating the maximum depth of a tree. The default value is 32.

**minImpurityDecrease** \(optional\) is the threshold for splitting a node: a node is split if the split induces a decrease of impurity greater than or equal to this value. The default value is 0.0.

**numJobs** \(optional\) is an integer controlling parallelism. If it is positive, it is the maximum number of concurrently running jobs. If it is -1, all CPU threads are used. If it is another negative integer, \(the number of all CPU threads + *numJobs* + 1\) threads are used. The default value is -1.

**randomSeed** \(optional\) is the seed used by the random number generator.
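
As a sketch of how the optional arguments might be passed by name, the call below sets *numTrees*, *numJobs* and *randomSeed* and leaves the rest at their defaults. The table `t`, its column names, the 8-thread machine in the comment and the seed value 42 are all assumptions chosen for illustration.

```
// illustrative data (the names t, x1, x2, y are assumptions for this sketch)
x1 = rand(100.0, 500)
x2 = rand(100.0, 500)
y = 6 + x1 - 2 * x2 + norm(0, 10, 500)
t = table(x1, x2, y)

// 20 trees instead of the default 10
// numJobs=-2: on a machine with 8 CPU threads, 8 + (-2) + 1 = 7 threads are used
// randomSeed=42 is an arbitrary seed value
model = randomForestRegressor(sqlDS(<select * from t>), `y, `x1`x2, numTrees=20, numJobs=-2, randomSeed=42)
```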

## Details {#details}

Fit a random forest regression model. The result is a dictionary with the following keys: minImpurityDecrease, maxDepth, numBins, numTress, maxFeatures, model, modelName and xColNames. The value of model is a tuple holding the trained trees; the value of modelName is "Random Forest Regressor".

The fitted model can be passed to the function [predict](../p/predict.md).
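
The returned dictionary can be inspected like any other dictionary. The snippet below is a minimal sketch on made-up data; the table `t` and its column names are assumptions for illustration.

```
// illustrative data and a model fitted with default settings (names are assumptions)
x1 = rand(100.0, 200)
x2 = rand(100.0, 200)
y = 6 + x1 - 2 * x2 + norm(0, 10, 200)
t = table(x1, x2, y)
model = randomForestRegressor(sqlDS(<select * from t>), `y, `x1`x2)

// list the keys of the result dictionary
model.keys()
// look up individual entries by key
model[`modelName]
model[`xColNames]
```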

## Examples {#examples}

Fit a random forest regression model with simulated data:

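```
// simulate 100 observations of two features, uniform between 0 and 100
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
// coefficients of the linear data-generating process
b0 = 6
b1 = 1
b2 = -2
// Gaussian noise with mean 0 and standard deviation 10
err = norm(0, 10, 100)
y = b0 + b1 * x1 + b2 * x2 + err
t = table(x1, x2, y)
// fit the model with default settings, then predict on the training data
model = randomForestRegressor(sqlDS(<select * from t>), `y, `x1`x2)
yhat = predict(model, t)

// scatter plot of y against yhat
plot(y, yhat, , SCATTER)
```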

Save the trained model to disk:

```
saveModel(model, "C:/DolphinDB/Data/regressionModel.txt");
```

Load a saved model:

```
model=loadModel("C:/DolphinDB/Data/regressionModel.txt");
```

