xgboost

XGBoost (eXtreme Gradient Boosting) is an open-source machine learning library for building GBDT (Gradient Boosting Decision Tree) models. The DolphinDB XGBoost plugin provides methods for training models and making predictions on DolphinDB tables, as well as for saving and loading trained models.

Currently, the plugin supports only XGBoost 1.2 and 2.0. Because the two versions use different default parameter settings, results computed with one version may differ from those computed with the other.

Installation (with installPlugin)

Required server version: DolphinDB 2.00.10 or higher

Supported OS:

  • XGBoost 1.2: Linux x86-64, Windows x86-64 JIT.
  • XGBoost 2.0: Linux x86-64 ABI=1.

Installation Steps:

(1) Use listRemotePlugins to check plugin information in the plugin repository.

Note: For plugins not included in the provided list, you can install through precompiled binaries or compile from source. These files can be accessed from our GitHub repository by switching to the appropriate version branch.

login("admin", "123456")
listRemotePlugins(, "http://plugins.dolphindb.com/plugins/")

(2) Use installPlugin to install the plugin.

installPlugin("xgboost")

(3) Use loadPlugin to load the plugin before using the plugin methods.

loadPlugin("xgboost")

Method References

train

Syntax

train(Y, X, [params], [numBoostRound=10], [xgbModel])

Details

The method trains a model on the given table or matrix and returns the trained model, which can be used for further training or for prediction.

Parameters

  • Y: A vector indicating the dependent variable (the labels).
  • X: A matrix or table indicating the independent variables.
  • params (optional): A dictionary representing the parameters used for XGBoost training. For more information, refer to XGBoost Docs.
  • numBoostRound (optional): A positive integer indicating the number of boosting iterations. The default value is 10.
  • xgbModel (optional): An XGBoost model (allows training continuation). You can obtain a model with train, or load an existing model with loadModel.
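
The parameters above can be combined as in the following minimal sketch. It assumes the plugin has been installed and can be loaded by name; the toy data and the parameter values are illustrative only, and the variable names (X, Y, m0, m1, m2) are hypothetical.

```dolphindb
loadPlugin("xgboost")

// Toy data: three feature columns and a numeric label vector (illustrative values)
X = table(1..5 as c1, (1..5) * 2 as c2, (1..5) * 3 as c3)
Y = 1 2 9 28 65

// Only Y and X are required; params and numBoostRound fall back to their defaults
m0 = xgboost::train(Y, X)

// Pass params and numBoostRound explicitly
params = {objective: "reg:squarederror", max_depth: 3, eta: 0.1}
m1 = xgboost::train(Y, X, params, 20)

// Continue training from m1 for 20 more rounds by passing it as xgbModel
m2 = xgboost::train(Y, X, params, 20, m1)
```

Passing an existing model as the last argument continues boosting from that model instead of starting from scratch.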

predict (XGBoost 1.2)

Syntax

predict(model, X, [outputMargin=false], [ntreeLimit=0], [predLeaf=false], [predContribs=false], [training=false])

Details

The method makes predictions on the given table or matrix using a trained model.

Parameters

  • model: An XGBoost model used for prediction. You can obtain a model with train or loadModel.
  • X: A matrix or table for prediction.
  • outputMargin (optional): A Boolean value indicating whether to output the raw untransformed margin value.
  • ntreeLimit (optional): A non-negative integer indicating the number of trees used in prediction. The default value is 0, indicating that all trees are used.
  • predLeaf (optional): A Boolean value. When it is true, the output will be a matrix of (nsample, ntrees) with each record indicating the predicted leaf index of each sample in each tree.
  • predContribs (optional): A Boolean value. When it is true, the output will be a matrix of size (nsample, nfeats + 1) with each record indicating the feature contributions (SHAP values) for that prediction. The sum of all feature contributions is equal to the raw untransformed margin value of the prediction.
  • training (optional): A Boolean value indicating whether the prediction value is used for training.

For more information, refer to XGBoost Docs.
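
As a rough illustration of how the optional arguments map to positions in the call, the sketch below assumes the XGBoost 1.2 build of the plugin is loaded. Arguments are passed positionally in the order given in the syntax; the data, parameter values, and variable names are hypothetical.

```dolphindb
loadPlugin("xgboost")

// Toy training data (illustrative values)
X = table(1..5 as c1, (1..5) * 2 as c2, (1..5) * 3 as c3)
Y = 1 2 9 28 65
params = {objective: "reg:squarederror", max_depth: 3, eta: 0.1}
model = xgboost::train(Y, X, params, 20)

// Default prediction: transformed output, all trees
pred = xgboost::predict(model, X)

// outputMargin=true: raw untransformed margin values
margin = xgboost::predict(model, X, true)

// ntreeLimit=10: restrict prediction to 10 trees
predLimited = xgboost::predict(model, X, false, 10)

// predLeaf=true: matrix of (nsample, ntrees) leaf indices
leaves = xgboost::predict(model, X, false, 0, true)

// predContribs=true: matrix of (nsample, nfeats + 1) feature contributions
contribs = xgboost::predict(model, X, false, 0, false, true)
```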

predict (XGBoost 2.0)

Syntax

predict(model, X, [type=0], [iterationPair], [strictShape=false], [training=false])

Details

The method makes predictions on the given table or matrix using a trained model.

Parameters

  • model: An XGBoost model used for prediction. You can obtain a model with train or loadModel.
  • X: A matrix or table for prediction.
  • type (optional): An integer ranging from 0 to 6, indicating the prediction type:
    • 0 (default): Normal prediction
    • 1: Output margin
    • 2: Predict contribution
    • 3: Predict approximated contribution
    • 4: Predict feature interaction
    • 5: Predict approximated feature interaction
    • 6: Predict leaf
  • iterationPair (optional): A pair of integers indicating the range of boosting iterations used for prediction.
  • strictShape (optional): A Boolean value indicating whether the output should strictly follow a specific shape.
  • training (optional): A Boolean value indicating whether the prediction value is used for training.
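
The sketch below shows how the type codes listed above might be used, assuming the XGBoost 2.0 build of the plugin is loaded. The data and variable names are hypothetical. Writing iterationPair as a DolphinDB pair such as 0:10 is an assumption, and the exact boundary semantics of the range are not specified here.

```dolphindb
loadPlugin("xgboost")

// Toy training data (illustrative values)
X = table(1..5 as c1, (1..5) * 2 as c2, (1..5) * 3 as c3)
Y = 1 2 9 28 65
params = {objective: "reg:squarederror", max_depth: 3, eta: 0.1}
model = xgboost::train(Y, X, params, 20)

// type=0 (default): normal prediction
pred = xgboost::predict(model, X)

// type=1: output margin
margin = xgboost::predict(model, X, 1)

// type=2: feature contributions
contribs = xgboost::predict(model, X, 2)

// type=6: leaf indices
leaves = xgboost::predict(model, X, 6)

// Assumption: restrict prediction to a range of boosting iterations given as a pair
predPartial = xgboost::predict(model, X, 0, 0:10)
```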

saveModel

Syntax

saveModel(model, fname)

Details

The method saves the trained model to disk.

Parameters

  • model: An XGBoost model to be saved.
  • fname: A string indicating where the model is saved.

loadModel

Syntax

loadModel(fname)

Details

The method loads the model from disk.

Parameters

  • fname: A string indicating where the model is saved.

Usage Examples

Note: When using the plugin on Windows, you must specify an absolute path when loading it. In the path, use “\\” or “/” as the separator instead of “\”.

loadPlugin("path_to/PluginXgboost.txt")

// Create a table for training
t = table(1..5 as c1, 1..5 * 2 as c2, 1..5 * 3 as c3)
label = 1 2 9 28 65

// Set params
params = {objective: "reg:linear", max_depth: 5, eta: 0.1, min_child_weight: 1, subsample: 0.5, colsample_bytree: 1, num_parallel_tree: 1}

// Train the model
model = xgboost::train(label, t, params, 100)

// Predict with the model
xgboost::predict(model, t)

// Save the model (WORK_DIR must already be defined as the path of an existing directory)
xgboost::saveModel(model, WORK_DIR + "/xgboost001.model")

// Load the model
model = xgboost::loadModel(WORK_DIR + "/xgboost001.model")

// Continue training on the model
model = xgboost::train(label, t, params, 100, model)