sample
Syntax
sample(partitionCol, size)
Arguments
partitionCol is a partitioning column.
size is a positive floating-point number or a positive integer.
Details
Takes a random sample of the partitions in a partitioned table. It must be used in a where clause.
Suppose the partitioned table has N partitions. If 0<size<1, int(N*size) partitions are sampled. If size is a positive integer, size partitions are sampled.
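The rule above can be sketched outside DolphinDB. The following Python snippet is only an illustration of the arithmetic, not DolphinDB's implementation. The function name partitions_to_sample is hypothetical, and raising an error for any other value of size is an assumption, since this page does not describe that case.

```python
def partitions_to_sample(n_partitions, size):
    """Return how many partitions the rule above would select.

    Hypothetical helper for illustration only; not part of DolphinDB.
    """
    # A positive integer is taken as the number of partitions itself.
    if isinstance(size, int) and size > 0:
        return size
    # A fraction strictly between 0 and 1 selects int(N*size) partitions,
    # i.e. the product truncated toward zero.
    if isinstance(size, float) and 0 < size < 1:
        return int(n_partitions * size)
    # Assumption: any other value is rejected; the page does not cover it.
    raise ValueError("size must be a positive integer or satisfy 0 < size < 1")


print(partitions_to_sample(5, 0.4))  # fraction of 5 partitions
print(partitions_to_sample(5, 2))    # explicit partition count
```

With N=5, both size=0.4 and size=2 give 2 partitions, which is why the two queries in the Examples section are equivalent in the number of partitions they sample.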
Examples
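n=1000000
// n random integers in [0, 50) and n random doubles in [0, 1.0)
ID=rand(50, n)
x=rand(1.0, n)
t=table(ID, x)
// RANGE partitions on ID: [0,10), [10,20), [20,30), [30,40), [40,50)
db=database("dfs://rangedb1", RANGE, 0 10 20 30 40 50)
pt = db.createPartitionedTable(t, `pt, `ID)
pt.append!(t)
pt=loadTable(db,`pt);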
Table pt has 5 partitions. To take a random sample of 2 partitions, we can use either of the following queries, since int(5*0.4)=2:
x = select * from pt where sample(ID, 0.4);
x = select * from pt where sample(ID, 2);