xgboost
XGBoost (eXtreme Gradient Boosting) is an open-source machine learning library used for building GBDT (Gradient Boosting Decision Trees) models. DolphinDB XGBoost plugin offers methods for model training and prediction with given DolphinDB tables. You can also use the methods to save or load the trained models.
Currently, the plugin supports only XGBoost 1.2 and 2.0. Because the two versions use different default parameter settings, the same script may produce different results under each version.
Installation (with installPlugin)
Required server version: DolphinDB 2.00.10 or higher
Supported OS:
- XGBoost 1.2: Linux x86-64, Windows x86-64 JIT.
- XGBoost 2.0: Linux x86-64 ABI=1.
Installation Steps:
(1) Use listRemotePlugins to check plugin information in the plugin repository.
Note: For plugins not included in the provided list, you can install through precompiled binaries or compile from source. These files can be accessed from our GitHub repository by switching to the appropriate version branch.
login("admin", "123456")
listRemotePlugins(, "http://plugins.dolphindb.com/plugins/")
(2) Invoke installPlugin to install the plugin.
installPlugin("xgboost")
(3) Use loadPlugin to load the plugin before using the plugin methods.
loadPlugin("xgboost")
Method References
train
Syntax
train(Y, X, [params], [numBoostRound=10], [xgbModel])
Details
The method trains a model on the given table or matrix and returns the trained model, which can be used for further training or prediction.
Parameters
- Y: A vector indicating the dependent variables.
- X: A matrix or table indicating the independent variables.
- params (optional): A dictionary representing the parameters used for XGBoost training. For more information, refer to XGBoost Docs.
- numBoostRound (optional): A positive integer indicating the number of boosting iterations. The default value is 10.
- xgbModel (optional): An XGBoost model from which to continue training. You can obtain a model with train, or load an existing model with loadModel.
predict (XGBoost 1.2)
Syntax
predict(model, X, [outputMargin=false], [ntreeLimit=0], [predLeaf=false], [predContribs=false], [training=false])
Details
The method makes predictions for the given matrix or table using the specified model.
Parameters
- model: An XGBoost model used for prediction. You can obtain a model with train or loadModel.
- X: A matrix or table for prediction.
- outputMargin (optional): A Boolean value indicating whether to output the raw untransformed margin value.
- ntreeLimit (optional): A non-negative integer limiting the number of trees used in prediction. The default value is 0, indicating all trees are used.
- predLeaf (optional): A Boolean value. When it is true, the output will be a matrix of size (nsample, ntrees) with each record indicating the predicted leaf index of each sample in each tree.
- predContribs (optional): A Boolean value. When it is true, the output will be a matrix of size (nsample, nfeats + 1) with each record indicating the feature contributions (SHAP values) for that prediction. The sum of all feature contributions is equal to the raw untransformed margin value of the prediction.
- training (optional): A Boolean value indicating whether the prediction value is used for training.
For more information, refer to XGBoost Docs.
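A minimal sketch of how these flags might be combined, assuming the XGBoost 1.2 build of the plugin is already loaded. The data and parameter values are illustrative only, and the optional arguments are passed positionally in the order given in the syntax above.

```dolphindb
// Illustrative training data (not from a real dataset)
x = table(1..5 as c1, 1..5 * 2 as c2, 1..5 * 3 as c3)
y = 1 2 9 28 65
params = {objective: "reg:squarederror", max_depth: 3, eta: 0.1}
model = xgboost::train(y, x, params, 20)

// Default prediction: all optional flags left at their defaults
pred = xgboost::predict(model, x)

// outputMargin = true: raw untransformed margin values
margin = xgboost::predict(model, x, true)

// predLeaf = true: leaf index of each sample in each tree
leaves = xgboost::predict(model, x, false, 0, true)

// predContribs = true: feature contributions plus one extra column
contribs = xgboost::predict(model, x, false, 0, false, true)

// Per the predContribs description, each row should sum to the margin
rowSum(contribs)
```

Comparing rowSum(contribs) with margin is a quick way to sanity-check the predContribs output.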
predict (XGBoost 2.0)
Syntax
predict(model, X, [type=0], [iterationPair], [strictShape=false], [training=false])
Details
The method makes predictions for the given matrix or table using the specified model.
Parameters
- model: An XGBoost model used for prediction. You can obtain a model with train or loadModel.
- X: A matrix or table for prediction.
- type (optional): An integer ranging from 0 to 6, indicating the prediction type:
  - 0 (default): Normal prediction
  - 1: Output margin
  - 2: Predict contribution
  - 3: Predict approximated contribution
  - 4: Predict feature interaction
  - 5: Predict approximated feature interaction
  - 6: Predict leaf
- iterationPair (optional): A pair of integers indicating the range of boosting iterations during prediction.
- strictShape (optional): A Boolean value indicating whether the output should strictly follow a specific shape.
- training (optional): A Boolean value indicating whether the prediction value is used for training.
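A minimal sketch of the type and iterationPair arguments, assuming the XGBoost 2.0 build of the plugin is already loaded. The data and parameter values are illustrative. The reading of iterationPair as a half-open range (begin inclusive, end exclusive), following XGBoost's own iteration range convention, is an assumption and is not confirmed by this document.

```dolphindb
// Illustrative training data (not from a real dataset)
x = table(1..5 as c1, 1..5 * 2 as c2, 1..5 * 3 as c3)
y = 1 2 9 28 65
params = {objective: "reg:squarederror", max_depth: 3, eta: 0.1}
model = xgboost::train(y, x, params, 20)

// type = 0 (default): normal prediction
pred = xgboost::predict(model, x)

// type = 1: output margin
margin = xgboost::predict(model, x, 1)

// type = 6: predicted leaf index per tree
leaves = xgboost::predict(model, x, 6)

// Normal prediction restricted to a range of boosting iterations.
// Assumption: 0:10 selects the first 10 of the 20 rounds.
early = xgboost::predict(model, x, 0, 0:10)
```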
saveModel
Syntax
saveModel(model, fname)
Details
The method saves the trained model to disk.
Parameters
- model: An XGBoost model to be saved.
- fname: A string indicating the path of the file to which the model is saved.
loadModel
Syntax
loadModel(fname)
Details
The method loads the model from disk.
Parameters
- fname: A string indicating the path of the saved model file to load.
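One way to check that a save and load round trip preserves a model is to compare predictions before and after. The sketch below assumes the plugin is already loaded; the data is illustrative and the path in modelPath is hypothetical, so replace it with a writable location on your server.

```dolphindb
// Illustrative training data (not from a real dataset)
x = table(1..5 as c1, 1..5 * 2 as c2, 1..5 * 3 as c3)
y = 1 2 9 28 65
params = {objective: "reg:squarederror", max_depth: 3, eta: 0.1}
model = xgboost::train(y, x, params, 20)

// Hypothetical file path, chosen for illustration only
modelPath = "/tmp/xgb_demo.model"
xgboost::saveModel(model, modelPath)
restored = xgboost::loadModel(modelPath)

// Predictions from the original and the restored model should agree
predOriginal = xgboost::predict(model, x)
predRestored = xgboost::predict(restored, x)
eqObj(predOriginal, predRestored)
```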
Usage Examples
Note: When using the plugin on Windows, you must specify an absolute path when loading it. In the path, use “\\” or “/” as the separator instead of a single “\”.
loadPlugin("path_to/PluginXgboost.txt")
// Create a table for training
t = table(1..5 as c1, 1..5 * 2 as c2, 1..5 * 3 as c3)
label = 1 2 9 28 65
// Set params
params = {objective: "reg:linear", max_depth: 5, eta: 0.1, min_child_weight: 1, subsample: 0.5, colsample_bytree: 1, num_parallel_tree: 1}
// Train the model
model = xgboost::train(label, t, params, 100)
// Predict with the model
xgboost::predict(model, t)
// Save the model (WORK_DIR must be defined beforehand as a string holding an existing directory path)
xgboost::saveModel(model, WORK_DIR + "/xgboost001.model")
// Load the model
model = xgboost::loadModel(WORK_DIR + "/xgboost001.model")
// Continue training on the model
model = xgboost::train(label, t, params, 100, model)