repartitionDS
Syntax
repartitionDS(query, [column], [partitionType], [partitionScheme], [local=true])
Arguments
query is metacode of SQL statements or a tuple of metacode of SQL statements.
column (optional) is a string indicating a column name in query. repartitionDS delimits the data sources based on column.
partitionType (optional) is the partition type. It can be VALUE or RANGE.
partitionScheme (optional) is a vector indicating the partitioning scheme. For details please refer to DistributedComputing.
local (optional) is a Boolean value indicating whether to fetch the data sources to the local node for computing. The default value is true. When it is set to false and repartitionDS is called on a compute node within a compute group, the data sources are fetched to all compute nodes within that group; otherwise, the data sources are fetched to all data nodes and to all compute nodes that are not included in any compute group.
Details
Repartition a table with specified partitioning type and scheme, and return a tuple of data sources.
If query is metacode of SQL statements, the parameter column must be specified. For a partitioned table with a COMPO domain, partitionType and partitionScheme can be unspecified. In this case, the data sources will be determined based on the original partitionType and partitionScheme of column.
If query is a tuple of metacode of SQL statements, the parameters column, partitionType and partitionScheme must be left unspecified. The function returns a tuple with the same length as query. Each element of the result is a data source corresponding to one piece of metacode in query.
Examples
n=1000000
ID=rand(100, n)
dates=2017.08.07..2017.08.11
date=rand(dates, n)
x=rand(10.0, n)
t=table(ID, date, x)
dbDate = database(, VALUE, 2017.08.07..2017.08.11)
dbID = database(, RANGE, 0 50 100)
db = database("dfs://compoDB", COMPO, [dbDate, dbID])
pt = db.createPartitionedTable(t, `pt, `date`ID)
pt.append!(t);
Example 1. query is metacode of SQL statements. partitionType and partitionScheme are specified.
repartitionDS(<select * from pt>,`date,RANGE,2017.08.07 2017.08.09 2017.08.11);
// output
[DataSource< select [4] * from pt where date >= 2017.08.07,date < 2017.08.09 >,DataSource< select [4] * from pt where date >= 2017.08.09,date < 2017.08.11 >]
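The output above suggests that a RANGE scheme is read as a list of boundaries, with each pair of consecutive boundaries becoming one half-open interval (lower bound included, upper bound excluded). The Python sketch below only models that reading to make the mapping explicit. It is not DolphinDB code, and the helper name range_predicates is hypothetical.

```python
def range_predicates(column, boundaries):
    """Pair consecutive boundaries into half-open intervals [lo, hi).

    Hypothetical helper: it mimics the filter conditions shown in the
    Example 1 output and is not part of DolphinDB.
    """
    if len(boundaries) < 2:
        raise ValueError("a RANGE scheme needs at least two boundaries")
    return [
        f"{column} >= {lo},{column} < {hi}"
        for lo, hi in zip(boundaries, boundaries[1:])
    ]


# The scheme from Example 1: three boundaries give two data sources.
scheme = ["2017.08.07", "2017.08.09", "2017.08.11"]
predicates = range_predicates("date", scheme)
for p in predicates:
    print(p)
# prints:
# date >= 2017.08.07,date < 2017.08.09
# date >= 2017.08.09,date < 2017.08.11
```

Under this reading, n boundaries yield n-1 data sources, and rows whose date equals the last boundary (2017.08.11 here) match neither filter. If those rows are needed, the scheme presumably has to extend past the largest value in the column.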
Example 2. query is metacode of SQL statements. partitionType and partitionScheme are unspecified.
repartitionDS(<select * from pt>,`ID);
// output
[DataSource< select [4] * from pt [partition = */0_50] >,DataSource< select [4] * from pt [partition = */50_100] >]
Example 3. query is a tuple of metacode of SQL statements.
repartitionDS([<select * from pt where id between 0:50>,<select * from pt where id between 51:100>]);
// output
[DataSource< select [4] * from pt where id between 0 : 50 >,DataSource< select [4] * from pt where id between 51 : 100 >]