qcut

First introduced in version: 3.00.5

Syntax

qcut(X, q, [labels], [dropDuplicates=false])

Details

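Determines the quantile bin of each element in a numeric vector, based on the element's rank within that vector. For example, given 1,000 values and q = 10, qcut divides the values into 10 quantile bins and returns, for each element, the index (or the label, if labels is specified) of the bin it belongs to.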

The qcut functions in DolphinDB and pandas both perform quantile discretization, but they differ in what they operate on and what they return:

  • DolphinDB operates directly on table columns and returns an integer index vector by default, which makes it suitable for efficient binning of massive datasets.
  • pandas is an in-memory analysis tool. Its qcut returns a Categorical object that preserves interval information, and it supports retbins to extract bin boundaries and precision to control numeric precision, which makes it more suitable for interactive analysis.
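As a rough illustration of quantile binning, the Python sketch below computes linearly interpolated quantile boundaries and assigns each value to a right-closed bin. The interpolation rule and the helper names quantile_edges and assign_bins are assumptions made for illustration, not DolphinDB's implementation. Under those assumptions the sketch yields the same indices as the first two calls in the Examples section.

```python
from bisect import bisect_left

def quantile_edges(values, probs):
    """Linearly interpolated quantiles of `values` at each probability in `probs`."""
    s = sorted(values)
    n = len(s)
    edges = []
    for p in probs:
        pos = p * (n - 1)          # fractional position in the sorted data
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        edges.append(s[lo] + (s[hi] - s[lo]) * (pos - lo))
    return edges

def assign_bins(values, edges):
    """Index of the right-closed bin (edges[i], edges[i+1]] holding each value.

    The minimum value is placed in bin 0.
    """
    inner = edges[1:-1]            # only interior boundaries separate bins
    return [bisect_left(inner, v) for v in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# q given as a bin count: 4 equally spaced probabilities
quartiles = quantile_edges(data, [i / 4 for i in range(5)])
print(assign_bins(data, quartiles))   # [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]

# q given as explicit breakpoints
custom = quantile_edges(data, [0, 0.3, 0.7, 1.0])
print(assign_bins(data, custom))      # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
```

With q = 4 the interior boundaries are 3.25, 5.5, and 7.75, so the values 1 to 3 land in bin 0 and the values 8 to 10 land in bin 3.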

Parameters

X: A numeric vector.

q: An INT scalar or a FLOATING vector.

  • An INT scalar specifies the number of quantile bins (e.g., 10 for deciles, 4 for quartiles).
  • A FLOATING vector specifies the quantile breakpoints. It must contain at least two elements, with values in the range [0, 1].

labels (optional): A vector of labels for each quantile bin.

  • It defaults to NULL, which means the function returns an integer vector representing the bin index for each element.
  • If q is a scalar, the length of labels must equal q.
  • If q is a vector, the length of labels must be len(q) - 1.
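The length rule for labels can be sketched in Python as follows. The helper apply_labels is hypothetical and only mirrors the two conditions above: one label per bin, where the number of bins is q for a scalar and len(q) - 1 for a vector of breakpoints.

```python
def apply_labels(bin_indices, labels, q):
    """Map integer bin indices to labels after checking the label count.

    Hypothetical helper: `q` is either an int (number of bins) or a
    sequence of quantile breakpoints.
    """
    n_bins = q if isinstance(q, int) else len(q) - 1
    if len(labels) != n_bins:
        raise ValueError(f"expected {n_bins} labels, got {len(labels)}")
    return [labels[i] for i in bin_indices]

indices = [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
print(apply_labels(indices, ["Q1", "Q2", "Q3", "Q4"], 4))
# ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3', 'Q4', 'Q4', 'Q4']
```

Passing three labels with breakpoints [0, 0.3, 0.7, 1.0] is valid in this sketch, while passing four labels with the same breakpoints raises an error.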

dropDuplicates (optional): A Boolean value specifying whether to drop duplicate bin boundaries.

  • It defaults to false, in which case an error is raised if duplicate boundaries exist.
  • If it is set to true, duplicate boundaries are removed.
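To see how duplicate boundaries can arise, the Python sketch below computes quartile boundaries for data with many repeated values and then collapses the repeats. Linear interpolation and the helper names quantile_edges and drop_duplicate_edges are assumptions made for illustration. The sketch covers only the boundaries, not how DolphinDB numbers the remaining bins.

```python
def quantile_edges(values, probs):
    """Linearly interpolated quantiles of `values` at each probability in `probs`."""
    s = sorted(values)
    n = len(s)
    edges = []
    for p in probs:
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        edges.append(s[lo] + (s[hi] - s[lo]) * (pos - lo))
    return edges

def drop_duplicate_edges(edges):
    """Remove repeated boundaries from an ascending list of edges."""
    unique = []
    for e in edges:
        if not unique or e != unique[-1]:
            unique.append(e)
    return unique

data = [1, 1, 1, 1, 2, 3]
edges = quantile_edges(data, [0, 0.25, 0.5, 0.75, 1.0])
print(edges)                  # [1.0, 1.0, 1.0, 1.75, 3.0]

unique_edges = drop_duplicate_edges(edges)
print(unique_edges)           # [1.0, 1.75, 3.0]
print(len(unique_edges) - 1)  # 2 bins remain instead of 4
```

Because the value 1 fills the lower part of the data, the 0%, 25%, and 50% boundaries coincide, so only two distinct bins can be formed.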

Returns

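A vector of the same length as X indicating the quantile bin to which each element belongs: integer bin indices by default, or the corresponding labels if labels is specified.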

Examples

// Divide the data into 4 quantile bins
qcut([1,2,3,4,5,6,7,8,9,10], 4)
// Output: [0 0 0 1 1 2 2 3 3 3]

// Divide using custom quantile breakpoints: 0–30%, 30–70%, 70–100%
qcut([1,2,3,4,5,6,7,8,9,10], [0, 0.3, 0.7, 1.0])
// Output: [0 0 0 1 1 1 1 2 2 2]

// Divide the data into 4 quantile bins and use custom labels
qcut([1,2,3,4,5,6,7,8,9,10], 4, ["Q1", "Q2", "Q3", "Q4"])
// Output: [Q1 Q1 Q1 Q2 Q2 Q3 Q3 Q4 Q4 Q4]

/* Due to a large number of duplicate values in the data,
   the quantile boundaries are not unique.
   After enabling dropDuplicates, duplicate boundaries are automatically removed,
   resulting in fewer than 4 quantile bins.
*/
qcut(X=[1, 1, 1, 1, 2, 3], q=4, dropDuplicates=true)
// Output: [0 0 0 0 2 2]