AWSS3

Amazon S3 is a cloud storage service for the storage and retrieval of large amounts of data. With the DolphinDB AWSS3 plugin, users can interact with the Amazon S3 service to back up their data to the cloud or download data from the cloud.

The plugin relies on the following third-party libraries:

  • libaws-cpp-sdk-core.so
  • libaws-cpp-sdk-s3.so
  • libcurl.so

Installation (with installPlugin)

Required server version: DolphinDB 2.00.10 or higher

Supported OS: Windows x64 and Linux x64.

Installation Steps:

(1) Use listRemotePlugins to check plugin information in the plugin repository.

Note: For plugins not included in the provided list, you can install through precompiled binaries or compile from source. These files can be accessed from our GitHub repository by switching to the appropriate version branch.

login("admin", "123456")
listRemotePlugins("awss3")

(2) Invoke installPlugin for plugin installation.

installPlugin("awss3")

(3) Use loadPlugin to load the plugin before using the plugin methods.

loadPlugin("awss3")

Method References

listS3Object

Syntax

listS3Object(s3account, bucket, prefix, [marker], [delimiter], [nextMarker], [MaxKeys])

Details

Return a DolphinDB table listing the attributes of all objects under the given bucket.

The attributes listed are as follows:

  • index: The index number of the object.
  • bucket name: The name of the bucket.
  • key name: The name of the object.
  • last modified: The last modified time with the format ISO_8601.
  • length: The size in bytes of the object.
  • ETag: The entity tag of the object.
  • owner: The owner of the object.

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info including "id" (access key id), "key" (secret access key), "region" (your AWS S3 region), and "endpoint" (URL for accessing AWS resources).

    • Connecting to a public cloud requires "id", "key", and "region":

      account = dict(STRING, STRING);
      account['id'] = your_access_key_id;
      account['key'] = your_secret_access_key;
      account['region'] = your_region;
    • Connecting to a private cloud requires "id", "key", "endpoint", and "isHttp":

      account = dict(STRING, ANY);
      account['id'] = "minioadmin";
      account['key'] = "minioadmin";
      account['endpoint'] = "127.0.0.1:9000";       //Note that "endpoint" cannot contain "http://" or "https://"
      account['isHttp'] = true;
    • Note: If validation fails or SSL errors occur, it is recommended to specify the certificate:

      account['caPath']=your_ca_file_path;     //e.g. '/etc/ssl/certs'
      account['caFile']=your_ca_file;          //e.g. 'ca-certificates.crt'
      account['verifySSL']=verify_or_not;      //e.g. false
  • bucket: A STRING scalar indicating the name of the bucket to access.

  • prefix: A STRING scalar indicating the prefix of the access path, which can be an empty string.

  • marker (optional): A STRING scalar indicating the key to start listing from.

  • delimiter (optional): A STRING scalar indicating the character used to group keys.

  • nextMarker (optional, output): A STRING scalar indicating the marker to get the next set of objects when the number of returned keys exceeds MaxKeys.

  • MaxKeys (optional): A LONG scalar indicating the maximum number of returned keys. The default value is 1000.
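
As an illustration of the parameters above, the following sketch lists the objects under a prefix and inspects the result. The bucket name, the prefix, and the credential strings are placeholders, and the sketch assumes the plugin has already been loaded with loadPlugin.

```dolphindb
// Placeholder credentials: replace with your own values.
account = dict(["id", "key", "region"], ["<access_key_id>", "<secret_access_key>", "<region>"])

// List objects whose keys start with "data/" (bucket and prefix are hypothetical).
objs = awss3::listS3Object(account, "dolphindb-test-bucket", "data/")

// Number of objects returned (bounded by MaxKeys, 1000 by default).
print(rows(objs))

// Inspect the column names and types of the returned table.
print(schema(objs).colDefs)
```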

getS3Object

Syntax

getS3Object(s3account, bucket, key, [outputFileName])

Details

Download the specified S3 object to a local file. Return the name of the local file.

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to access.
  • key: A STRING scalar indicating the name of the object to get.
  • outputFileName (optional): A STRING scalar indicating the file name of the output object. If it is not specified, the value of key is used.

readS3Object

Syntax

readS3Object(s3account, bucket, key, offset, length)

Details

Read part of the specified S3 object, starting at offset and spanning length bytes. Return a CHAR vector.

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to access.
  • key: A STRING scalar indicating the name of the object to get.
  • offset: A LONG scalar indicating the starting position (in bytes) of the object to get.
  • length: A LONG scalar indicating the length (in bytes) of the object to get.
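
A minimal sketch of a partial read, for example to peek at the start of a large file without downloading it. The bucket and key names are hypothetical, and the credentials are placeholders.

```dolphindb
// Placeholder credentials: replace with your own values.
account = dict(["id", "key", "region"], ["<access_key_id>", "<secret_access_key>", "<region>"])

// Read 1024 bytes starting at offset 0 of a hypothetical object.
// offset and length are passed as LONG scalars, as the parameter list requires.
buf = awss3::readS3Object(account, "dolphindb-test-bucket", "data/trades.csv", long(0), long(1024))

// buf is a CHAR vector; print how many bytes were returned.
print(size(buf))
```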

deleteS3Object

Syntax

deleteS3Object(s3account, bucket, key)

Details

Delete a specified S3 object (warning: the deletion cannot be undone).

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to access.
  • key: A STRING scalar indicating the name of the object to delete.

uploadS3Object

Syntax

uploadS3Object(s3account, bucket, key, inputFileName)

Details

Upload an object to S3.

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to access.
  • key: A STRING scalar indicating the name under which the uploaded object is stored in the bucket.
  • inputFileName: A STRING scalar indicating the name and path of the object to upload.
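
The upload and download functions can be combined into a simple backup and restore round trip. The sketch below assumes a local file at a hypothetical path; the bucket name, key, and paths are all placeholders.

```dolphindb
// Placeholder credentials: replace with your own values.
account = dict(["id", "key", "region"], ["<access_key_id>", "<secret_access_key>", "<region>"])

// Upload a local file (hypothetical path) and store it under the key "backup/t2.zip".
awss3::uploadS3Object(account, "dolphindb-test-bucket", "backup/t2.zip", "/home/user/t2.zip")

// Download the same object to a different local file; the function returns the local file name.
localFile = awss3::getS3Object(account, "dolphindb-test-bucket", "backup/t2.zip", "/home/user/t2_restored.zip")
print(localFile)
```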

listS3Bucket

Syntax

listS3Bucket(s3account)

Details

Return a table that lists all buckets and their creation dates under the given s3account. The format of the date is ISO_8601.

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.

deleteS3Bucket

Syntax

deleteS3Bucket(s3account, bucket)

Details

Delete a given bucket (warning: the deletion cannot be undone).

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to delete.

createS3Bucket

Syntax

createS3Bucket(s3account, bucket)

Details

Create a bucket.

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to create.
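
The bucket functions can be used together as in the sketch below. The bucket name is hypothetical (on AWS, bucket names must be globally unique), and the credentials are placeholders. On AWS S3 a bucket generally has to be empty before it can be deleted, so the last call is expected to succeed here only because nothing was uploaded.

```dolphindb
// Placeholder credentials: replace with your own values.
account = dict(["id", "key", "region"], ["<access_key_id>", "<secret_access_key>", "<region>"])

// Hypothetical bucket name.
bucketName = "my-unique-test-bucket"

// Create the bucket, then list all buckets with their creation dates.
awss3::createS3Bucket(account, bucketName)
print(awss3::listS3Bucket(account))

// Delete the bucket again. Warning: this cannot be undone.
awss3::deleteS3Bucket(account, bucketName)
```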

loadS3Object

Syntax

loadS3Object(s3account, bucket, key, threadCount, dbHandle, tableName, partitionColumns, [delimiter], [schema], [skipRows], [transform], [sortColumns], [atomic], [arrayDelimiter])

Details

Load a batch of S3 objects to a table. Return a table with 3 columns: object (STRING), errorCode (INT), and errorInfo (STRING).

The error codes are explained as follows:

  • 0: No error.
  • 1: Unknown issue.
  • 2: Failed to parse the file and write it to the table.
  • 3: Failed to download the file.
  • 4: Failed to unzip the file.
  • 5: Failed to find the unzipped file.
  • 6: An exception is raised with the specific error message.
  • 7: An unknown exception is raised with no specific error message.

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to access.
  • key: A STRING scalar indicating the name of the object to load.
  • threadCount: A positive integer indicating the number of threads that can be used to load the objects.
  • dbHandle: The handle of the database where the imported data will be saved. It can be either a DFS database or an in-memory database.
  • tableName: A STRING scalar indicating the name of the table with the imported data.
  • partitionColumns: A STRING scalar/vector indicating the partitioning column(s). For a sequential partition, leave it unspecified; for a composite partition, partitionColumns is a STRING vector.
  • delimiter (optional): A STRING scalar indicating the table column separator. The default value is ",".
  • schema (optional): A table specifying the schema of the object. It can have the following 4 columns, among which "name" and "type" are required.

    Column   Description
    name     A STRING scalar indicating the column name.
    type     A STRING scalar indicating the data type of each column. Currently, BLOB, COMPLEX, POINT, and DURATION are not supported.
    format   A STRING scalar indicating the format of temporal columns.
    col      An INT scalar indicating the indices of the columns to be loaded. The values must be in ascending order.

  • skipRows (optional): An integer between 0 and 1024, indicating the number of rows to skip from the beginning of the file. The default value is 0.
  • transform (optional): A unary function that accepts a table as the parameter. The function is performed on the data in the file by the AWSS3 plugin and the results are stored in the database.
  • sortColumns (optional): A STRING scalar/vector indicating the columns based on which the table is sorted. The data of the same sort column will be stored in order within the partition.
  • atomic (optional): A BOOLEAN scalar indicating whether to guarantee atomicity when loading a file with the cache engine enabled. If it is set to true, the entire loading process of a file is a transaction; set to false to split the loading process into multiple transactions.

Note: It is required to set atomic = false if the file to be loaded exceeds the cache engine capacity. Otherwise, a transaction may get stuck: it can neither be committed nor rolled back.

  • arrayDelimiter (optional): A STRING scalar indicating the delimiter for columns holding the array vectors in the file. Since the array vectors cannot be recognized automatically, you must use the schema parameter to update the data type of the type column with the corresponding array vector data type before import. The default value is ",".
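
The sketch below shows one way to pass an explicit schema and then check the returned table for failures. The column names and types in the schema (ID, sym, price) are hypothetical and must match the actual file; the bucket, object, and database names are placeholders.

```dolphindb
// Placeholder credentials: replace with your own values.
account = dict(["id", "key", "region"], ["<access_key_id>", "<secret_access_key>", "<region>"])

// Hypothetical schema: only the required "name" and "type" columns are given.
schemaTb = table(`ID`sym`price as name, `INT`SYMBOL`DOUBLE as type)

// RANGE-partitioned DFS database with partitions [0, 51) and [51, 101).
db = database(directory="dfs://rangedb", partitionType=RANGE, partitionScheme=0 51 101)

// Load the object with 4 threads, passing delimiter and schema as the first two optional arguments.
res = awss3::loadS3Object(account, "dolphindb-test-bucket", "t2.zip", 4, db, `pt, `ID, ",", schemaTb)

// Keep only the rows that report a failure (errorCode 0 means no error).
failed = select * from res where errorCode != 0
print(failed)
```

Checking errorCode after each batch makes it possible to retry only the objects that failed rather than reloading everything.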

headS3Object

Syntax

headS3Object(s3account, bucket, key)

Details

Get the metadata of a specified object. Return a dictionary containing "bucket name", "key name", "length", "last modified", "ETag", and "content type".

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to access.
  • key: A STRING scalar indicating the name of the object to get.
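
A minimal sketch of reading individual metadata fields, for example to check an object's size before downloading it. The dictionary keys are the ones listed in the description above; the bucket and object names are placeholders.

```dolphindb
// Placeholder credentials: replace with your own values.
account = dict(["id", "key", "region"], ["<access_key_id>", "<secret_access_key>", "<region>"])

// Fetch the metadata of a hypothetical object without downloading its content.
meta = awss3::headS3Object(account, "dolphindb-test-bucket", "t2.zip")

// Look up individual fields by the keys documented above.
print(meta["length"])
print(meta["last modified"])
```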

copyS3Object

Syntax

copyS3Object(s3account, bucket, srcPath, destPath)

Details

Copy an existing S3 object to another destination in the same bucket.

Parameters

  • s3account: A dictionary with STRING type keys, which stores account info.
  • bucket: A STRING scalar indicating the name of the bucket to access.
  • srcPath: A STRING vector indicating the path of the source file.
  • destPath: A STRING vector indicating the path of the target file.

Usage Examples

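// Build the account dictionary (replace the placeholders with your own credentials)
account = dict(STRING, STRING);
account['id'] = your_access_key_id;
account['key'] = your_secret_access_key;
account['region'] = your_region;
// Create a RANGE-partitioned DFS database with partitions [0, 51) and [51, 101)
db = database(directory="dfs://rangedb", partitionType=RANGE, partitionScheme=0 51 101)
// Load 't2.zip' into table pt, partitioned on column ID, using 4 threads
awss3::loadS3Object(account, 'dolphindb-test-bucket', 't2.zip', 4, db, `pt, `ID);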