digitize

Syntax

digitize(x, bins, [right=false])

Arguments

x is a scalar or vector of floating-point, integral, or DECIMAL type, indicating the value(s) to be binned.

bins is a monotonically increasing or decreasing vector of floating-point, integral, or DECIMAL type, indicating the bin edges.

right (optional) is a Boolean value indicating whether each interval includes its right edge (true) or its left edge (false). The default is false, meaning each interval includes its left edge.

Details

Returns the index of the bin to which each value in x belongs. The result has the same data form as x: a scalar if x is a scalar, a vector if x is a vector.

right    order of bins    returned index i satisfies
false    increasing       bins[i-1] <= x < bins[i]
true     increasing       bins[i-1] < x <= bins[i]
false    decreasing       bins[i-1] > x >= bins[i]
true     decreasing       bins[i-1] >= x > bins[i]

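If a value in x lies outside the range covered by bins, 0 is returned when it falls before the first edge, and size(bins) is returned when it falls beyond the last edge. "Before" and "beyond" follow the order of bins, so for decreasing bins a value greater than bins[0] returns 0.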
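A direct, if slow, way to read the table is to test each candidate index in turn. The Python sketch below (digitize_reference is a hypothetical name; Python is used only for illustration) checks the four conditions literally and treats the missing neighbour at each end as an infinite bound, which is what yields 0 and size(bins) for out-of-range values.

```python
import math


def digitize_reference(x, bins, right=False):
    """Return the index i whose condition from the table holds for scalar x.

    Illustrative linear scan. Assumes bins is non-empty and monotonic
    (repeated edges allowed); direction is taken from the first and last edge.
    """
    increasing = bins[0] <= bins[-1]
    n = len(bins)
    for i in range(n + 1):
        # At i == 0 there is no bins[i-1], and at i == n there is no bins[i];
        # infinite stand-ins make those two bins catch out-of-range values.
        if increasing:
            lo = bins[i - 1] if i > 0 else -math.inf
            hi = bins[i] if i < n else math.inf
            ok = lo < x <= hi if right else lo <= x < hi
        else:
            lo = bins[i - 1] if i > 0 else math.inf
            hi = bins[i] if i < n else -math.inf
            ok = lo >= x > hi if right else lo > x >= hi
        if ok:
            return i
    # Not reached for finite x and monotonic bins (NaN fails every comparison).
    raise ValueError("no bin matched")


bins = [1, 3, 3, 5, 5]
print(digitize_reference(3, bins))                    # 3
print(digitize_reference(5, bins))                    # 5
print(digitize_reference(5, bins, right=True))        # 3
print(digitize_reference(5, bins[::-1]))              # 0
print(digitize_reference(5, bins[::-1], right=True))  # 2
```

The scan costs O(size(bins)) per value, so it is only meant for checking the rules against the examples, not for performance.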

This function provides the same functionality as numpy.digitize.
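NumPy documents numpy.digitize as being implemented in terms of numpy.searchsorted, reversing bins when they decrease. The sketch below mirrors that idea with Python's standard bisect module so it runs without NumPy. It illustrates the semantics only and is not DolphinDB's implementation; the Python function name digitize is just borrowed for readability.

```python
from bisect import bisect_left, bisect_right


def digitize(x, bins, right=False):
    """Stdlib analogue of numpy.digitize for a scalar or a list/tuple x.

    Assumes bins is a non-empty monotonic list (repeated edges allowed).
    """
    # right=False closes the left edge, so a value equal to an edge moves past
    # it (bisect_right); right=True closes the right edge (bisect_left).
    search = bisect_left if right else bisect_right
    if bins[0] <= bins[-1]:
        def index(v):
            return search(bins, v)
    else:
        # Decreasing bins: search an ascending copy, then mirror the index.
        ascending = bins[::-1]
        n = len(bins)

        def index(v):
            return n - search(ascending, v)
    # Scalar in, scalar out; sequence in, list out.
    if isinstance(x, (list, tuple)):
        return [index(v) for v in x]
    return index(x)


x = [-1, 0, 1, 2, 3, 4, 5, 6]
print(digitize(x, [1, 3, 5]))              # [0, 0, 1, 1, 2, 2, 3, 3]
print(digitize(x, [1, 3, 5], right=True))  # [0, 0, 0, 1, 1, 2, 2, 3]
print(digitize(x, [5, 3, 1]))              # [3, 3, 2, 2, 1, 1, 0, 0]
print(digitize(x, [5, 3, 1], right=True))  # [3, 3, 3, 2, 2, 1, 1, 0]
```

Binary search makes each lookup O(log size(bins)), which is why the monotonicity requirement on bins matters.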

Examples

When x is a scalar:

bins = [1,3,3,5,5]

// returns index i that satisfies bins[i-1] <= 3 < bins[i]
digitize(3, bins=bins, right=false)
// output: 3

// returns index i that satisfies bins[i-1] <= 5 < bins[i]. No element of bins is greater than 5, so size(bins) is returned.
digitize(5, bins=bins, right=false)
//output: 5

// returns index i that satisfies bins[i-1] < 5 <= bins[i].
digitize(5, bins=bins, right=true)
//output: 3

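bins = reverse(bins)
// bins is now [5,5,3,3,1]; returns index i that satisfies bins[i-1] > 5 >= bins[i].
// Since 5 >= bins[0], 5 precedes the first edge and 0 is returned.
digitize(5, bins=bins, right=false)
//output: 0

// returns index i that satisfies bins[i-1] >= 5 > bins[i], i.e. bins[1] >= 5 > bins[2].
digitize(5, bins=bins, right=true)
//output: 2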
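The scalar examples use bins with repeated edges. When two consecutive edges are equal, the corresponding interval (for example bins[1] <= x < bins[2], which becomes 3 <= x < 3) is empty, so that index can never be returned. The Python sketch below uses stdlib bisect as a stand-in for digitize on increasing bins; the helper name reachable is hypothetical. It probes every distinct edge, a point between each pair of neighbouring edges, and one point beyond each end, which covers every position a value can take relative to the edges.

```python
from bisect import bisect_left, bisect_right


def reachable(bins, right):
    """Collect every index digitize can return for increasing bins."""
    search = bisect_left if right else bisect_right
    edges = sorted(set(bins))
    # One probe per edge, per gap between distinct edges, and past each end.
    probes = [edges[0] - 1, edges[-1] + 1] + edges
    probes += [(a + b) / 2 for a, b in zip(edges, edges[1:])]
    return sorted({search(bins, v) for v in probes})


bins = [1, 3, 3, 5, 5]  # repeated edges 3 and 5
print(reachable(bins, right=False))  # [0, 1, 3, 5]
print(reachable(bins, right=True))   # [0, 1, 3, 5]
```

Indices 2 and 4 never appear because the intervals they denote have equal endpoints.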

When x is a vector:

x = [-1,0,1,2,3,4,5,6]
bins = [1,3,5]
digitize(x=x, bins=bins, right=false)
//output: [0,0,1,1,2,2,3,3]

digitize(x=x, bins=bins, right=true)
//output: [0,0,0,1,1,2,2,3]

bins = reverse(bins)
digitize(x=x, bins=bins, right=false)
//output: [3,3,2,2,1,1,0,0]

digitize(x=x, bins=bins, right=true)
//output: [3,3,3,2,2,1,1,0]

The following example demonstrates the difference between digitize and bucket.

For bucket, the width of dataRange (53 - 12 = 41 for the range [12, 53) used here) must be a multiple of bucketNum (2 here); otherwise an error is thrown. digitize has no such restriction: bins can be any monotonic vector of edges, so intervals of unequal width are allowed.

bucket(9 23 54 36 46 12, 12:53, 2)
// throws an error: dataRange must be the mutltiplier of bucketNum.

digitize(9 23 54 36 46 12, 12 40 53)
// output: [0,1,3,1,2,1]
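To make the contrast concrete, the Python sketch below uses two hypothetical helpers (equal_width_edges and digitize_right_open, neither of which is a DolphinDB function). equal_width_edges assumes the restriction on bucket is that the width of dataRange must divide evenly by bucketNum, while digitize_right_open applies the right=false rule to any increasing edges, including edges of unequal width such as 12, 40, 53 (widths 28 and 13).

```python
from bisect import bisect_right


def equal_width_edges(lo, hi, n):
    """Edges of n equal-width buckets over [lo, hi).

    Hypothetical helper; assumes (hi - lo) must be a multiple of n, the
    width-divisibility rule assumed here for bucket's dataRange.
    """
    if (hi - lo) % n != 0:
        raise ValueError(f"width {hi - lo} is not a multiple of {n}")
    step = (hi - lo) // n
    return [lo + k * step for k in range(n + 1)]


def digitize_right_open(values, edges):
    """right=false rule on increasing edges: edges[i-1] <= v < edges[i]."""
    return [bisect_right(edges, v) for v in values]


data = [9, 23, 54, 36, 46, 12]

try:
    equal_width_edges(12, 53, 2)  # 41 is odd, so this raises
except ValueError as err:
    print(err)  # width 41 is not a multiple of 2

# Unequal widths are fine when the edges are listed explicitly.
print(digitize_right_open(data, [12, 40, 53]))  # [0, 1, 3, 1, 2, 1]

# A divisible range still yields equal-width edges that digitize can consume.
print(equal_width_edges(12, 52, 2))  # [12, 32, 52]
```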