Real-Time Cloud-Edge Data Synchronization with DolphinDB

The widespread deployment of edge sensors has made real-time data synchronization between edge devices and the cloud a critical component for efficient system operation and intelligent decision-making. Real-time synchronization enables edge devices to transmit large volumes of data to the cloud for storage, analysis, and processing. This capability is vital across various domains, including industrial IoT, intelligent transportation, smart cities, healthcare, and smart homes. By enabling rapid response and real-time feedback, real-time data synchronization improves system responsiveness and enhances both user experience and service quality.

In cloud-edge synchronization scenarios, DolphinDB—a high-performance distributed time-series database and real-time computing platform—offers unique advantages.

  • DolphinDB delivers exceptional real-time data processing capabilities, efficiently handling large-scale data streams from edge devices and synchronizing the results to the cloud with minimal latency. It supports data compression during transmission to improve synchronization efficiency. This is particularly valuable in latency-sensitive applications such as predictive maintenance in industrial systems, real-time traffic control, or smart home automation.
  • Its distributed architecture and elastic scalability enable deployment across a wide range of environments—from lightweight edge devices to large-scale cloud clusters—and support hybrid deployments across both cloud and edge, ensuring fast and reliable data synchronization.
  • DolphinDB also supports local data storage at the edge, enhancing data security and availability. Its lightweight and flexible development model simplifies deployment and maintenance across cloud and edge environments.

DolphinDB enables end-to-end data management from edge to cloud, combining efficient synchronization with real-time analytics. This capability facilitates digital transformation and accelerates innovation across IoT and distributed application scenarios.

1. DolphinDB Real-Time Synchronization Architecture

DolphinDB implements a modular, extensible stream processing architecture designed for real-time computation and analytics on continuous data streams. Unlike traditional batch processing, DolphinDB performs incremental processing as data arrives, following the time-series nature of the stream. This significantly reduces processing latency, making it ideal for real-time synchronization.

The architecture introduces the concept of stream tables, which are integrated with a classic publish/subscribe (Pub/Sub) model. Upon insertion into a stream table, data is immediately published to a message queue, and publisher threads dispatch it to subscribers. This mechanism ensures real-time data delivery across distributed nodes.
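A minimal sketch of this pattern (the table and handler names here are illustrative, not part of the later test scripts): a stream table is shared, a local handler is subscribed to it, and any write to the table is pushed to the handler.

// create and share a stream table so its writes are published to subscribers
share streamTable(10000:0, `ts`val, [TIMESTAMP, DOUBLE]) as sensorStream

// handler invoked incrementally for each batch pushed by the publisher
def countHandler(msg){
    print("received " + string(msg.size()) + " rows")
}

// local subscription: each write to sensorStream is delivered to countHandler as a table
subscribeTable(,`sensorStream,`demo,0,countHandler,msgAsTable=true)

// inserting data publishes it immediately to the subscriber
sensorStream.append!(table(now() as ts, 36.5 as val))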

The synchronization solution described in this document is built on these stream processing capabilities. In DolphinDB, subscriptions can be local or remote. Local subscriptions process and consolidate data at the edge, which can then trigger synchronization to the cloud using the remoteRun function.

The remoteRun function in DolphinDB is designed for executing functions or transferring data between distributed DolphinDB nodes. In cloud-edge scenarios, edge nodes use remoteRun to invoke computation remotely on the cloud server, passing serialized function definitions and parameters.
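A minimal sketch of this pattern, with placeholder host, port, and function names: the edge node opens a connection with xdb, uses remoteRun once to define a function on the cloud node, and again to invoke it with local arguments.

// open a connection from the edge node to the cloud node (placeholder address)
conn = xdb("cloud-host", 8848, "admin", "123456")

// define a function on the cloud node by sending its definition as a script
remoteRun(conn, "def countRows(t){ return t.size() }")

// invoke the remote function by name, passing a local table as its argument
localData = table(1..100 as id, rand(100.0, 100) as val)
rowsOnCloud = remoteRun(conn, "countRows", localData)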

To optimize network usage, remoteRun supports data compression (a brief usage sketch follows the list below). Two algorithms are currently available:

  • lz4: Faster decompression, lower compression ratio.

  • zstd: Higher compression ratio, slower decompression.
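For example (an illustrative fragment, not part of the test scripts), a table can be compressed with either algorithm before transmission and restored with decompress on the receiving side:

t = table(1..1000000 as id, rand(100.0, 1000000) as temp)

// compress in memory before sending; the method is either "lz4" or "zstd"
lz4Blob  = compress(t, "lz4")
zstdBlob = compress(t, "zstd")

// the receiving side restores the original table
restored = decompress(zstdBlob)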

By leveraging stream tables for real-time data collection and distribution, and combining them with local subscription and remoteRun calls, the solution ensures low-latency, resource-efficient data synchronization. All compression, transmission, and function orchestration are handled on the edge side, allowing the cloud to maintain a lightweight processing load.

This end-to-end synchronization mechanism enables high-throughput, low-latency data integration across edge and cloud, making DolphinDB a robust solution for real-time, intelligent IoT applications.

2. Synchronization Implementation Example

This example demonstrates a cloud-edge synchronization scenario for inspection data collected by on-site robots. To comprehensively evaluate the functionality and performance of the synchronization mechanism, various data volumes and synchronization scenarios were simulated using DolphinDB scripts. The tests cover both real-time data synchronization and end-of-month batch synchronization, with the volume of each test ranging from 4,800 to 14.4 million records.

The simulated dataset represents 30 days of activity from 100 robots, totaling 14.4 million records (approximately 3.3 GB). The tests assessed transmission efficiency across different data sizes, with all records generated and synchronized in real time. Stream tables with built-in message queuing mechanisms were used for data ingestion and distribution. Robot inspection data was written to the stream table and then pushed to subscribers in real time.

To optimize storage and query performance, the receiving end used partitioned distributed tables. Following DolphinDB’s performance best practices and considering future scalability, data was partitioned by date. For details on partitioning strategies, refer to Partitioned Database Design and Operations.

For this dataset, the detect_time column was used as the partitioning key, dividing data into 30 date-based partitions. The device_id and detect_time were set as the sort columns to improve query efficiency. The modeling strategy is as follows:

if (existsDatabase("dfs://Robot")) { dropDatabase("dfs://Robot") }

CREATE DATABASE "dfs://Robot" PARTITIONED BY VALUE(2020.06.01..2026.06.30), ENGINE="TSDB"

CREATE TABLE "dfs://Robot"."data" (
    device_id SYMBOL,
    error BOOL,
    detect_time TIMESTAMP,
    temp DOUBLE,
    humidity DOUBLE,
    ......
)
PARTITIONED BY detect_time,
sortColumns=[`device_id, `detect_time]

For descriptions of the simulated data fields, see Appendix 1.
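As a rough illustration (only a handful of the 29 fields are shown, and the types and generation logic are simplified relative to the actual test script), a batch of simulated inspection records can be produced like this:

// simplified sketch: generate n simulated inspection records for a few of the fields
n = 4800
deviceId = symbol(string(take(1..100, n)))                  // 100 robots, cycled
errorFlag = rand(2, n) == 0                                 // random error status
detectTime = 2024.06.01T00:00:00.000 + rand(86400000, n)    // random times within one day
temp = rand(40.0, n)
humidity = rand(100.0, n)
simData = table(deviceId as device_id, errorFlag as error, detectTime as detect_time, temp as temp, humidity as humidity)
// in the actual test, records like these are appended to the edge-side stream table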

3. Performance Evaluation

3.1 Test Environment

The test was conducted using two physical servers:

  • One acted as the cloud node, responsible for receiving synchronized data.
  • The other acted as the edge node, responsible for generating and sending data.

The hardware specifications and DolphinDB configurations are detailed in Tables 3-1 and 3-2.

| Host IP   | Role                   |
| --------- | ---------------------- |
| 10.0.0.80 | Simulated cloud device |
| 10.0.0.81 | Simulated edge device  |

Table 3-1. Device Roles

| Simulated Role | Quantity | Physical Setup    | Configuration |
| -------------- | -------- | ----------------- | ------------- |
| Cloud node     | 1        | 1 physical server | OS: CentOS 7.6; CPU: Intel® Xeon® Silver 4214 @ 2.20 GHz (48 logical cores with Hyper-Threading); Memory: 188 GB; DolphinDB: 3.00.0.1 |
| Edge node      | 1        | 1 physical server | Disk: 2 TB SSD × 2 (500 MB/s), 2 TB HDD × 1 (250 MB/s); Network: 100 Mbps; DolphinDB: 3.00.0.1 |

Table 3-2. Hardware and Software Configuration

The cloud node is configured as follows:

localSite=localhost:8848:local8848
mode=single
maxMemSize=32
maxConnections=512
workerNum=4
localExecutors=3
maxBatchJobWorker=4
dataSync=1
OLAPCacheEngineSize=2
TSDBCacheEngineSize=1
newValuePartitionPolicy=add
maxPubConnections=64
subExecutors=4
perfMonitoring=true
lanCluster=0
subThrottle=1
persistenceWorkerNum=1
maxPubQueueDepthPerSite=100000000

To simulate the resource-constrained nature of edge devices, the edge node in this test is configured with 4 CPU cores and 8 GB of memory. The edge node configuration is detailed below:

localSite=localhost:8848:local8848
mode=single
maxMemSize=8
maxConnections=512
workerNum=4
localExecutors=3
maxBatchJobWorker=4
dataSync=1
OLAPCacheEngineSize=2
TSDBCacheEngineSize=2
newValuePartitionPolicy=add
maxPubConnections=64
subExecutors=4
perfMonitoring=true
lanCluster=0
subThrottle=1
persistenceWorkerNum=1

3.2 Real-Time Synchronization Performance Testing

This section evaluates DolphinDB’s synchronization performance across varying data volumes, as well as its CPU, memory, and network bandwidth usage. The test covers synchronization of datasets ranging from 4,800 to 14,400,000 records, representing most real-world scenarios. Additionally, the resource consumption and transmission efficiency under different compression algorithms (zstd and lz4) when using the remoteRun function are analyzed to identify the optimal scheme for different conditions.

| Data Volume (rows) | Compression Method | Elapsed Time (ms) | Average Sync Throughput (records/s) | CPU Usage | Memory Usage (GB) | Network Throughput (MB/s) |
| ------------------ | ------------------ | ----------------- | ----------------------------------- | --------- | ----------------- | ------------------------- |
| 4,800      | lz4          | 60      | 80,000  | 103% | 0.6 | 0.04 |
| 4,800      | zstd         | 62      | 77,419  | 105% | 0.6 | 0.04 |
| 4,800      | uncompressed | 74      | 64,864  | 105% | 0.6 | 0.06 |
| 240,000    | lz4          | 1,883   | 127,456 | 120% | 0.7 | 1.22 |
| 240,000    | zstd         | 1,720   | 139,535 | 120% | 0.7 | 0.94 |
| 240,000    | uncompressed | 2,853   | 84,122  | 120% | 0.7 | 2.12 |
| 1,920,000  | lz4          | 14,384  | 133,482 | 140% | 1.1 | 9.5  |
| 1,920,000  | zstd         | 13,208  | 145,366 | 140% | 1.1 | 7.93 |
| 1,920,000  | uncompressed | 22,362  | 85,859  | 145% | 1.1 | 11.5 |
| 14,400,000 | lz4          | 108,529 | 132,683 | 140% | 1.7 | 10.1 |
| 14,400,000 | zstd         | 98,080  | 146,819 | 140% | 1.8 | 8.85 |
| 14,400,000 | uncompressed | 167,579 | 85,930  | 145% | 1.3 | 11.4 |

Table 3-3. Summary of Test Results

Note: All resource usage figures in the table represent peak values.

DolphinDB demonstrates efficient synchronization even at large data volumes while maintaining low CPU and memory consumption, effectively conserving cloud resources.

The results clearly show that compressed transmission outperforms uncompressed in both efficiency and synchronization speed. Comparing lz4 and zstd compression reveals that zstd achieves a higher compression ratio but requires longer decompression time. For smaller datasets, the decompression overhead causes zstd’s transmission efficiency to lag behind lz4. However, as data volume increases, zstd becomes increasingly faster than lz4. At very small volumes (4,800 records), all transmission methods exhibit similar throughput and resource consumption.

For both uncompressed and lz4-compressed transmission, peak network throughput approaches the bandwidth limit of the test environment (100 Mbps, roughly 12.5 MB/s), so further improvements in synchronization speed would require more system resources and higher network bandwidth. Uncompressed transmission sends the full dataset over the network, and the extra transfer time outweighs the combined cost of compression and decompression, making it slower overall.

For the zstd method, synchronization speed is primarily limited by the subscription and dispatch rate rather than network bandwidth. Improving performance in this mode therefore requires scaling up the DolphinDB deployment itself, for example by allocating more subscription and worker threads on the edge node, rather than adding bandwidth.

Detailed implementation and testing procedures are described in the appendix.

4. Conclusion

The real-time synchronization approach presented in this document leverages DolphinDB’s compression-based transmission to achieve efficient data transfer while minimizing memory and network usage. It significantly reduces the cloud's computational and bandwidth load, offering a cloud-lightweight and efficient synchronization solution.

However, this approach imposes certain operational and implementation requirements. Real-time data must be captured, packaged, and transmitted to the cloud via the remoteRun function on the edge side. While this edge-side preprocessing offloads work from the cloud, it shifts the operational burden to edge-side personnel, who must be familiar with the cloud-side data processing logic and proficient with DolphinDB functions.

Overall, this synchronization strategy is best suited for scenarios where edge devices have sufficient idle resources, while cloud resources and network bandwidth are constrained.

Appendix

Appendix 1: Simulated Data Field Description

| Field Name   | Data Type | Description                         |
| ------------ | --------- | ----------------------------------- |
| device_id    | LONG      | Robot device ID                     |
| error        | BOOL      | Error status                        |
| detect_time  | TIMESTAMP | Data reporting time                 |
| temp         | DOUBLE    | Temperature                         |
| humidity     | DOUBLE    | Humidity                            |
| temp_max     | DOUBLE    | Maximum temperature recorded        |
| humidity_max | DOUBLE    | Maximum humidity recorded           |
| trace        | INT       | Preset operating path of the device |
| position     | SYMBOL    | Current position of the device      |
| point_one    | BOOL      | Whether point 1 is reached          |
| point_two    | BOOL      | Whether point 2 is reached          |
| point_three  | BOOL      | Whether point 3 is reached          |
| point_four   | BOOL      | Whether point 4 is reached          |
| point_five   | BOOL      | Whether point 5 is reached          |
| point_six    | BOOL      | Whether point 6 is reached          |
| point_seven  | BOOL      | Whether point 7 is reached          |
| point_eight  | BOOL      | Whether point 8 is reached          |
| point_nine   | BOOL      | Whether point 9 is reached          |
| point_ten    | BOOL      | Whether point 10 is reached         |
| responsible  | SYMBOL    | Person in charge                    |
| create_time  | TIMESTAMP | Record creation time                |
| rob_type     | INT       | Robot type                          |
| rob_sn       | SYMBOL    | Robot serial number                 |
| rob_create   | TIMESTAMP | Robot creation time                 |
| rob_acc1     | DOUBLE    | Robot accuracy metric 1             |
| rob_acc2     | DOUBLE    | Robot accuracy metric 2             |
| rob_acc3     | DOUBLE    | Robot accuracy metric 3             |
| rob_acc4     | DOUBLE    | Robot accuracy metric 4             |
| rob_acc5     | DOUBLE    | Robot accuracy metric 5             |

Appendix 2: Implementation of Synchronization Methods

Compression

The implementation begins by creating a database on the receiving end using the same schema described earlier. On the sending side, a stream table is created and subscribed to locally. A handler function is defined to be triggered whenever new data is written into the stream table. The function compresses the data using the compress function and transmits it to the receiving end using the remoteRun function. On the receiving side, the data is decompressed using decompress and then written into the target database.

Since stream table subscriptions send data in segments as new data arrives, the transmission time is also measured in segments. To capture this, the receiving end maintains a table to log the elapsed time for each segment received during subscription.

The core implementation code for this process is shown below.

Cloud Side (Receiver):

login("admin","123456")
share streamTable(1:0,`sourceDevice`startTime`endTime`cost`rowCount,[INT,TIMESTAMP,TIMESTAMP,DOUBLE,INT]) as timeCost

colNames = `device_id`error`detect_time`temp`humidity`temp_max`humidity_max`trace`position`Point_one`point_two`point_three`point_four`point_five`point_six`point_seven`point_eight`point_nine`point_ten`responsible`create_time`rob_type`rob_sn`rob_create`rob_acc1`rob_acc2`rob_acc3`rob_acc4`rob_acc5
colTypes = [SYMBOL,BOOL,TIMESTAMP,DOUBLE,DOUBLE,DOUBLE,DOUBLE,INT,SYMBOL,BOOL,BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, BOOL, SYMBOL, TIMESTAMP, INT, SYMBOL, TIMESTAMP, DOUBLE, DOUBLE, DOUBLE, DOUBLE, DOUBLE]

if(existsDatabase("dfs://Robot")){dropDatabase("dfs://Robot");}
data= table(1:0, colNames, colTypes)
db = database(directory = "dfs://Robot",partitionType = VALUE, partitionScheme = [today()],engine=`TSDB)
createPartitionedTable(dbHandle=db, table=data, tableName = `data, partitionColumns = `detect_time,sortColumns = `device_id`detect_time)

Edge Side (Sender):

def pushData(ip, port, compressMethod, msg){
    // connect to the cloud node
    pubNodeHandler1 = xdb(ip, port, "admin", "123456")
    StartTime = now()
    table1 = select * from msg
    // compress the batch before transmission ("lz4" or "zstd")
    y = compress(table1, compressMethod)
    // define the insert function on the cloud node: decompress the batch,
    // append it to the distributed table, and log the elapsed time of this segment
    remoteRun(pubNodeHandler1, 'def insertTableToDatabase(y, StartTime){
        pt = loadTable("dfs://Robot", `data)
        pt.append!(decompress(y))
        endTime = now()
        duration = endTime - StartTime
        tmp = table([0] as c1, [StartTime] as c2, [endTime] as c3, [duration] as c4, [y.size()] as c5)
        timeCost.append!(tmp)
    }')
    // invoke the remote function with the compressed batch
    remoteRun(pubNodeHandler1, 'insertTableToDatabase', y, StartTime)
}

ip = "xx.xx.xx.xx"
port = xxxx
compressMethod = "lz4"
// subscribe to the local stream table; each batch triggers pushData
subscribeTable(,`data,`pushData,0,pushData{ip,port,compressMethod},msgAsTable=true,batchSize=500000,throttle=1)
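Note that the sender-side code assumes a shared stream table named data already exists on the edge node. A minimal sketch of creating it, reusing the same 29-column colNames and colTypes defined in the cloud-side script, might look like this:

// edge-side stream table consumed by the local subscription above;
// colNames and colTypes are the same 29-column schema as on the cloud side
share streamTable(1000000:0, colNames, colTypes) as data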

To evaluate the performance of different synchronization schemes under varying data volumes, we progressively increased the amount of generated data and measured the corresponding resource usage. A summary of the results is shown below.

| Data Volume (rows) | Elapsed Time (ms) | Avg. Sync Throughput (records/s) | CPU Usage | Memory Usage (GB) | Network Throughput (MB/s) |
| ------------------ | ----------------- | -------------------------------- | --------- | ----------------- | ------------------------- |
| 4,800      | 60      | 80,000  | 103% | 0.6 | 0.04 |
| 240,000    | 1,883   | 127,456 | 120% | 0.7 | 1.22 |
| 1,920,000  | 14,384  | 133,482 | 140% | 1.1 | 9.5  |
| 14,400,000 | 108,529 | 132,683 | 140% | 1.7 | 10.1 |
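One possible way (not part of the original scripts) to derive the elapsed time and average throughput of a run is to aggregate the timeCost log on the cloud node, assuming the rowCount column records the number of rows in each received segment:

// aggregate the per-segment log into run-level elapsed time and throughput
select sum(rowCount) as totalRows,
       max(endTime) - min(startTime) as elapsedMs,
       sum(rowCount) * 1000.0 \ (max(endTime) - min(startTime)) as recordsPerSecond
from timeCost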

As the volume of transmitted data increases, network bandwidth usage rises significantly, while memory and CPU usage show a moderate increase. Network throughput approaches the maximum bandwidth limit of the test environment. To further improve synchronization speed, more system resources and higher network bandwidth are required.

The code above uses lz4 compression, and the results in the table above were obtained with that setting.

To switch to zstd compression, simply set compressMethod = "zstd" before subscribing.

No Compression

This configuration serves as a baseline for comparison against compressed transmission. Replace the sender-side code with the following to disable compression.

def pushDataNoCompress(ip, port, msg){
    // connect to the cloud node
    pubNodeHandler1 = xdb(ip, port, "admin", "123456")
    StartTime = now()
    // define the insert function on the cloud node: append the batch directly,
    // with no decompression, and log the elapsed time of this segment
    remoteRun(pubNodeHandler1, 'def insertTableToDatabase(y, StartTime){
        pt = loadTable("dfs://Robot", `data)
        pt.append!(y)
        endTime = now()
        duration = endTime - StartTime
        tmp = table([0] as c1, [StartTime] as c2, [endTime] as c3, [duration] as c4, [y.size()] as c5)
        timeCost.append!(tmp)
    }')
    // invoke the remote function with the uncompressed batch
    remoteRun(pubNodeHandler1, 'insertTableToDatabase', msg, StartTime)
}

ip = "xx.xx.xx.xx"
port = xxxx
// subscribe to the local stream table; each batch triggers pushDataNoCompress
subscribeTable(,`data,`pushDataNoCompress,0,pushDataNoCompress{ip,port},msgAsTable=true,batchSize=500000,throttle=1)