High Availability

DolphinDB provides a multi-layered high availability mechanism that keeps the system operational under a wide range of failure scenarios. It offers high availability for metadata and data, and DolphinDB APIs support high availability for API clients. The mechanism covers data storage, cluster management (metadata), stream processing (stream tables), and client connections. If any node becomes unavailable, the database continues to operate without interrupting ongoing operations.

High Availability of Data

To ensure data security and high availability, DolphinDB supports storing multiple replicas of data chunks on different servers and adopts a two-phase commit (2PC) protocol to maintain strong consistency among the replicas. If the data on one machine is corrupted, the database can still be accessed through the replicas on other machines. If replicas become inconsistent due to data corruption or other issues, the system uses a recovery mechanism to restore replica consistency.

The number of replicas can be specified with the parameter dfsReplicationFactor in controller.cfg. Setting the number of replicas to 2 is recommended.
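For illustration, the following controller.cfg excerpt sets two replicas per chunk. The dfsReplicaReliabilityLevel entry is an assumption added for this sketch; setting it to 1 requires that the replicas of a chunk be placed on different servers.

    dfsReplicationFactor=2
    dfsReplicaReliabilityLevel=1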

High Availability of Clusters (Controller HA)

Metadata is stored on controllers. We can deploy multiple controllers in a cluster to ensure that the metadata service is uninterrupted. All controllers in a cluster form a Raft group: one controller is the leader and the rest are followers. Data nodes interact only with the leader. The leader writes logs locally and sends synchronization requests to all followers. Once a majority of the controllers have synchronized the logs, the leader instructs the followers to change the status of the related transactions to committed, ensuring strong consistency of metadata across controllers. If the leader becomes unavailable, a new leader is immediately elected to continue providing the metadata service.

The Raft group can tolerate the failure of fewer than half of its controllers. For example, a cluster with 3 controllers can tolerate 1 unavailable controller, and a cluster with 5 controllers can tolerate 2 unavailable controllers. To enable high availability of metadata, the cluster must have at least 3 controllers, and the number of replicas must be set to more than 1 with the configuration parameter dfsReplicationFactor in controller.cfg.
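As a hedged sketch of such a setup, a three-controller Raft group might be declared as follows. The dfsHAMode setting, host addresses, ports, and node aliases are illustrative assumptions, not values prescribed above.

    controller.cfg (on each controller):

        mode=controller
        dfsHAMode=Raft
        dfsReplicationFactor=2

    cluster.nodes (excerpt):

        localSite,mode
        192.168.1.11:8800:controller1,controller
        192.168.1.12:8800:controller2,controller
        192.168.1.13:8800:controller3,controller
        192.168.1.11:8801:agent1,agent
        192.168.1.11:8802:datanode1,datanode

With this configuration, the three controllers elect a leader among themselves, and the cluster can tolerate the loss of any one controller.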

High Availability of Stream Tables (Raft Learner)

For stream processing scenarios, DolphinDB provides high availability of stream tables based on Raft Learner (this requires pre-configuration of multi-cluster management).

A Raft Learner is a special member of the Raft consensus protocol that replicates logs from the Raft leader but does not participate in voting or leader election. Leveraging this mechanism, DolphinDB can achieve real-time replication of stream table data across clusters, enabling:

  • Cross-cluster disaster recovery: Remote Learner nodes maintain a full copy of the data and can be used for recovery if the primary cluster becomes unavailable.
  • Read-write isolation: Learner nodes can serve read-only stream computations or downstream consumers, reducing load on the primary cluster.
  • High performance: Write operations remain confined to the primary cluster, avoiding cross-region network latency.

Traditionally, a Raft group is limited to a single cluster, and all Voter nodes (Leader and Follower) must communicate frequently. To support cross-cluster high availability, DolphinDB extends the Raft protocol as follows:

  • Only Learner nodes are allowed to be deployed across clusters.
  • Voter nodes (Leader and Follower) must remain within the same cluster and cannot span clusters.

This design extends the scope of high availability while preserving data consistency and performance, allowing DolphinDB to provide cross-cluster data protection and disaster recovery in stream processing scenarios.
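As a sketch of the within-cluster foundation that the Learner mechanism builds on (the cross-cluster configuration itself is not detailed here), HA stream tables rely on streaming Raft groups declared in the data-node configuration. The group ID and node aliases below are assumptions for illustration:

    streamingHAMode=raft
    streamingRaftGroups=2:datanode1:datanode2:datanode3

Once such a group is configured, an HA stream table can be created on the group (for example, with the haStreamTable function), and Learner nodes replicate its log without participating in voting or leader election.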

High Availability for API Clients

DolphinDB APIs incorporate auto-reconnect and failover mechanisms to maintain high availability when interacting with the DolphinDB server. When the data node or compute node that an API client is connected to becomes unavailable, the API first attempts to reconnect to that node. If the attempt fails, the API automatically connects to another available data node or compute node.
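For example, with the DolphinDB Python API, high availability can be enabled when opening a session. The addresses below are placeholders, and parameter names may differ slightly across API versions, so treat this as a sketch rather than a definitive reference:

    import dolphindb as ddb

    # Candidate data/compute nodes the client may fail over to
    # (placeholder addresses; replace with the sites of your cluster).
    sites = ["192.168.1.11:8848", "192.168.1.12:8848", "192.168.1.13:8848"]

    s = ddb.session()
    # With highAvailability=True, the API reconnects to the current node on
    # failure and, if that fails, switches to another site in the list.
    s.connect("192.168.1.11", 8848, "admin", "123456",
              highAvailability=True, highAvailabilitySites=sites)

    print(s.run("1 + 1"))  # queries continue to work after a failover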