randomForestRegressor

swordfish.function.randomForestRegressor()

Fit a random forest regression model. The result is a dictionary with the following keys: minImpurityDecrease, maxDepth, numBins, numTrees, maxFeatures, model, modelName and xColNames. model is a tuple containing the trained trees; modelName is “Random Forest Regressor”.

The fitted model can be used as an input to the function predict.

Parameters:
  • ds (Constant) – The data sources used for training. They can be generated with the function sqlDS.

  • yColName (Constant) – A string indicating the dependent variable column.

  • xColNames (Constant) – A string scalar or vector indicating the names of the feature columns.

  • maxFeatures (Constant, optional) –

    An integer or a floating-point number indicating the number of features to consider when looking for the best split. The default value is 0.

    • If maxFeatures is a positive integer, maxFeatures features are considered at each split.

    • If maxFeatures is 0, sqrt(the number of feature columns) features are considered at each split.

    • If maxFeatures is a floating-point number between 0 and 1, int(maxFeatures * the number of feature columns) features are considered at each split.

  • numTrees (Constant, optional) – A positive integer indicating the number of trees in the random forest. The default value is 10.

  • numBins (Constant, optional) – A positive integer indicating the number of bins used when discretizing continuous features. The default value is 32. Increasing numBins allows the algorithm to consider more split candidates and make finer-grained split decisions. However, it also increases computation and communication time.

  • maxDepth (Constant, optional) – A positive integer indicating the maximum depth of a tree. The default value is 32.

  • minImpurityDecrease (Constant, optional) – A node will be split if this split induces a decrease of impurity greater than or equal to this value. The default value is 0.

  • numJobs (Constant, optional) – An integer indicating the maximum number of concurrently running jobs if set to a positive number. If set to -1, all CPU threads are used. If set to another negative integer, (the number of all CPU threads + numJobs + 1) threads are used.

  • randomSeed (Constant, optional) – The seed used by the random number generator.
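
The rules for maxFeatures and numJobs described above can be sketched in plain Python. This is only an illustration of the documented arithmetic, not the library's implementation: the helper names resolve_max_features and resolve_num_jobs are hypothetical, and the handling of cases the text leaves open (how the square root is rounded, what numJobs = 0 means) is an assumption marked in the comments.

```python
import math
import os


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: features considered at each split."""
    if max_features == 0:
        # Default rule: sqrt(number of feature columns).
        # Truncating to an integer, with a floor of 1, is an assumption.
        return max(1, int(math.sqrt(n_features)))
    if isinstance(max_features, float) and 0 < max_features < 1:
        # Fractional rule: int(maxFeatures * number of feature columns).
        return int(max_features * n_features)
    if isinstance(max_features, int) and max_features > 0:
        # Positive integer: use the value as given.
        return max_features
    raise ValueError("maxFeatures must be 0, a positive integer, or a float in (0, 1)")


def resolve_num_jobs(num_jobs, cpu_threads=None):
    """Hypothetical helper: number of threads implied by numJobs."""
    if cpu_threads is None:
        # Assumption: the local CPU thread count stands in for the server's.
        cpu_threads = os.cpu_count() or 1
    if num_jobs > 0:
        # Positive value: upper bound on concurrently running jobs.
        return num_jobs
    if num_jobs < 0:
        # -1 gives all threads; other negatives give threads + numJobs + 1.
        return cpu_threads + num_jobs + 1
    # The text does not define numJobs = 0; rejecting it is an assumption.
    raise ValueError("numJobs = 0 is not covered by the documented rules")
```

For example, with 16 feature columns resolve_max_features(0, 16) gives 4 and resolve_max_features(0.5, 16) gives 8; on a machine with 8 CPU threads, resolve_num_jobs(-1, 8) gives 8 and resolve_num_jobs(-2, 8) gives 7.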