randomForestClassifier
- swordfish.function.randomForestClassifier()
Fit a random forest classification model. The result is a dictionary with the following keys: numClasses, minImpurityDecrease, maxDepth, numBins, numTrees, maxFeatures, model, modelName and xColNames. model is a tuple holding the trained trees; modelName is “Random Forest Classifier”.
The fitted model can be passed as an input to the predict function.
- Parameters:
ds (Constant) – The data sources used for training. They can be generated with the sqlDS function.
yColName (Constant) – A string indicating the category column.
xColNames (Constant) – A string scalar/vector indicating the names of the feature columns.
numClasses (Constant) – A positive integer indicating the number of categories in the category column. The values in the category column must be integers in [0, numClasses).
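If the raw category labels are not already integers in [0, numClasses), they need to be encoded before training. The following is a minimal plain-Python sketch of one way to do that; `encode_labels` is a hypothetical helper for illustration and is not part of swordfish.

```python
def encode_labels(labels):
    """Map arbitrary hashable labels to integers 0 .. numClasses-1.

    Hypothetical helper, not a swordfish function. Classes are numbered
    in sorted order so the mapping is reproducible.
    """
    classes = sorted(set(labels))
    index = {label: code for code, label in enumerate(classes)}
    codes = [index[label] for label in labels]
    return codes, classes


codes, classes = encode_labels(["cat", "dog", "cat", "bird"])
num_classes = len(classes)  # value to pass as numClasses

# Every encoded value lies in [0, numClasses), as the parameter requires.
assert all(0 <= c < num_classes for c in codes)
```

Keeping `classes` around makes it possible to translate predicted integer categories back to the original labels.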
maxFeatures (Constant, optional) –
An integer or a floating-point number indicating the number of features to consider when looking for the best split. The default value is 0.
If maxFeatures is a positive integer, maxFeatures features are considered at each split.
If maxFeatures is 0, sqrt(the number of feature columns) features are considered at each split.
If maxFeatures is a floating-point number between 0 and 1, int(maxFeatures * the number of feature columns) features are considered at each split.
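The three maxFeatures rules can be sketched in plain Python as follows. This is an illustration of the rules as described, not the library's implementation: `resolve_max_features` is a hypothetical name, and truncating the square root to an integer is an assumption, since the rounding is not specified here.

```python
import math


def resolve_max_features(max_features, num_feature_cols):
    """Return the number of features considered at each split.

    Hypothetical helper mirroring the documented maxFeatures rules.
    """
    if isinstance(max_features, int) and not isinstance(max_features, bool):
        if max_features == 0:
            # Default: sqrt(number of feature columns).
            # Assumption: the result is truncated to an integer.
            return int(math.sqrt(num_feature_cols))
        if max_features > 0:
            return max_features
    elif isinstance(max_features, float) and 0 < max_features < 1:
        # A fraction of the feature columns, truncated by int().
        return int(max_features * num_feature_cols)
    raise ValueError("maxFeatures must be 0, a positive integer, or a float in (0, 1)")


print(resolve_max_features(0, 16))     # 4
print(resolve_max_features(3, 10))     # 3
print(resolve_max_features(0.5, 10))   # 5
```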
numTrees (Constant, optional) – A positive integer indicating the number of trees in the random forest. The default value is 10.
numBins (Constant, optional) – A positive integer indicating the number of bins used when discretizing continuous features. The default value is 32. Increasing numBins allows the algorithm to consider more split candidates and make fine-grained split decisions. However, it also increases computation and communication time.
maxDepth (Constant, optional) – A positive integer indicating the maximum depth of a tree. The default value is 32.
minImpurityDecrease (Constant, optional) – A node will be split if this split induces a decrease of the Gini impurity greater than or equal to this value. The default value is 0.
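To make the minImpurityDecrease threshold concrete, the sketch below computes the Gini impurity of a node and the decrease produced by a candidate split. It is a textbook formulation given for illustration only; the function names are hypothetical, and the library may weight or scale the decrease differently (for example, by the fraction of samples reaching the node).

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


parent = [0, 0, 1, 1]
left, right = [0, 0], [1, 1]   # a split that separates the classes perfectly

decrease = impurity_decrease(parent, left, right)
print(decrease)                # 0.5

# Under this formulation, the node is split when the decrease reaches the threshold.
min_impurity_decrease = 0.1    # illustrative value
print(decrease >= min_impurity_decrease)   # True
```

With the default threshold of 0, any split that does not increase the impurity satisfies the condition.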
numJobs (Constant, optional) – An integer indicating the maximum number of concurrently running jobs if set to a positive number. If set to -1, all CPU threads are used. If set to another negative integer, (the number of all CPU threads + numJobs + 1) threads are used.
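The numJobs rule can be sketched as a small function. `resolve_num_jobs` is a hypothetical helper that only restates the arithmetic above; the behavior for numJobs = 0, or for negative values that yield a non-positive result, is not described here, so rejecting 0 is an assumption of this sketch.

```python
def resolve_num_jobs(num_jobs, cpu_threads):
    """Return the job/thread limit implied by numJobs.

    Hypothetical helper, not part of swordfish.
    """
    if num_jobs > 0:
        # Positive: the maximum number of concurrently running jobs.
        return num_jobs
    if num_jobs == -1:
        # -1: use all CPU threads.
        return cpu_threads
    if num_jobs < -1:
        # Other negatives: cpu_threads + numJobs + 1.
        return cpu_threads + num_jobs + 1
    # Assumption: 0 is not a documented value, so reject it.
    raise ValueError("numJobs = 0 is not described")


print(resolve_num_jobs(4, 8))    # 4
print(resolve_num_jobs(-1, 8))   # 8
print(resolve_num_jobs(-2, 8))   # 7
```

Note that the formula for other negative values also gives all CPU threads when applied to -1, so the two negative cases are consistent.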
randomSeed (Constant, optional) – The seed used by the random number generator.