digitize
Syntax
digitize(x, bins, [right=false])
Arguments
x is a scalar or vector of floating-point, integral, or DECIMAL type, indicating the value(s) to be binned.
bins is a monotonically increasing or decreasing vector of floating-point, integral, or DECIMAL type, indicating the bins.
right (optional) is a Boolean value indicating whether each interval includes its right or its left bin edge. The default is right=false, meaning that each interval includes its left edge.
Details
Return the indices of the bins to which each value in x belongs. The return value has the same data form as x.
right | order of bins | returned index i satisfies |
---|---|---|
false | increasing | bins[i-1] <= x < bins[i] |
true | increasing | bins[i-1] < x <= bins[i] |
false | decreasing | bins[i-1] > x >= bins[i] |
true | decreasing | bins[i-1] >= x > bins[i] |
If a value in x falls outside the range covered by bins, 0 is returned when it lies beyond the first element of bins, and size(bins) is returned when it lies beyond the last element.
This function provides the same functionality as numpy.digitize.
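The rules in the table above can be restated as counting bin edges. The following is a minimal Python sketch of that reading, not DolphinDB code and not the library's implementation: the name digitize_sketch is hypothetical, and it assumes bins is a non-empty, monotonic list of numbers.

```python
from bisect import bisect_left, bisect_right

def digitize_sketch(x, bins, right=False):
    """Illustrative restatement of the table rules (hypothetical helper)."""
    increasing = bins[0] <= bins[-1]
    # Work on an ascending copy so bisect can be used for both orders.
    asc = list(bins) if increasing else list(reversed(bins))
    n = len(asc)

    def one(v):
        # Number of edges < v (right=True) or <= v (right=False).
        below = bisect_left(asc, v) if right else bisect_right(asc, v)
        if increasing:
            # right=False: bins[i-1] <= v <  bins[i]
            # right=True:  bins[i-1] <  v <= bins[i]
            return below
        # Decreasing bins: the index is the number of edges above v.
        # right=False: bins[i-1] >  v >= bins[i]
        # right=True:  bins[i-1] >= v >  bins[i]
        return n - below

    # Scalar in, scalar out; list or tuple in, list out.
    if isinstance(x, (list, tuple)):
        return [one(v) for v in x]
    return one(x)

print(digitize_sketch(3, [1, 3, 3, 5, 5]))              # 3
print(digitize_sketch(5, [5, 5, 3, 3, 1], right=True))  # 2
```

Values outside the range of bins need no special handling in this sketch: the count of edges is 0 on one side and the length of bins on the other.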
Examples
When x is a scalar:
bins = [1,3,3,5,5]
// returns index i that satisfies bins[i-1] <= 3 < bins[i]
digitize(3, bins=bins, right=false)
// output: 3
// returns index i that satisfies bins[i-1] <= 5 < bins[i]. Since no bins[i] > 5 exists, size(bins) is returned.
digitize(5, bins=bins, right=false)
// output: 5
// returns index i that satisfies bins[i-1] < 5 <= bins[i]
digitize(5, bins=bins, right=true)
// output: 3
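// bins is now decreasing: [5,5,3,3,1]
bins = reverse(bins)
// returns index i that satisfies bins[i-1] > 5 >= bins[i]. Since no bins[i-1] > 5 exists, 0 is returned.
digitize(5, bins=bins, right=false)
// output: 0
// returns index i that satisfies bins[i-1] >= 5 > bins[i]
digitize(5, bins=bins, right=true)
// output: 2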
When x is a vector:
x = [-1,0,1,2,3,4,5,6]
bins = [1,3,5]
digitize(x=x, bins=bins, right=false)
//output: [0,0,1,1,2,2,3,3]
digitize(x=x, bins=bins, right=true)
//output: [0,0,0,1,1,2,2,3]
// bins is now decreasing: [5,3,1]
bins = reverse(bins)
digitize(x=x, bins=bins, right=false)
//output: [3,3,2,2,1,1,0,0]
digitize(x=x, bins=bins, right=true)
//output: [3,3,3,2,2,1,1,0]
The following example demonstrates the difference between digitize and bucket. The bucket function throws an error if the width of dataRange (here [12, 53), which spans 41 integers) cannot be divided evenly by bucketNum (here 2). The digitize function is more flexible, because the bin edges can be chosen freely.
bucket(9 23 54 36 46 12, 12:53, 2)
// throws an error: dataRange must be the multiplier of bucketNum.
digitize(9 23 54 36 46 12 , 12 40 53)
// output: [0,1,3,1,2,1]
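To make the indices in the last call easier to read, the sketch below reproduces them in Python with the standard bisect module and maps each index to the interval it stands for. This is an illustration only, not DolphinDB code: the labels list is hypothetical, and the edge-counting shortcut applies only to increasing bins with right=false.

```python
from bisect import bisect_right

values = [9, 23, 54, 36, 46, 12]
edges = [12, 40, 53]  # two bins of unequal width: [12, 40) and [40, 53)

# With increasing edges and right=false, the index is the count of edges <= value.
indices = [bisect_right(edges, v) for v in values]
print(indices)  # [0, 1, 3, 1, 2, 1]

# Hypothetical labels for each possible index, for illustration only.
labels = ["below 12", "[12, 40)", "[40, 53)", "53 or above"]
for v, i in zip(values, indices):
    print(v, i, labels[i])
```

Index 0 and index size(bins) (3 here) mark values that fall outside the edges, so unevenly spaced edges and out-of-range values are both handled without an error.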