winsorize
Syntax
winsorize(X, limit, [inclusive=true], [nanPolicy='upper'])
Alias: winsorize!
Arguments
X is a vector.
limit is a scalar or a vector with 2 elements giving the fractions of data to cut on each side of X, as floats between 0 and 1, relative to the number of unmasked data. A scalar applies the same fraction to both sides of X. If X has n elements (including NULLs), the (n * limit[0]) smallest elements and the (n * limit[1]) largest elements are masked, and the total number of unmasked data after trimming is n * (1-sum(limit)). Set either element of limit to 0 to leave that side unmasked.
inclusive (optional) is a Boolean scalar or a vector of 2 Boolean elements indicating whether the number of data masked on each side is truncated (true) or rounded (false). The default is true.
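The arithmetic behind limit and inclusive can be sketched in a few lines. The snippet below is an illustration in Python, not DolphinDB's implementation; the helper name cut_counts is hypothetical, and it assumes "truncated" means dropping the fractional part and "rounded" means rounding to the nearest integer.

```python
def cut_counts(n, limit, inclusive=True):
    """Return (low, high): how many elements are cut on each side.

    Hypothetical helper for illustration only.
    """
    # A scalar applies to both sides.
    lo_frac, hi_frac = limit if isinstance(limit, (tuple, list)) else (limit, limit)
    lo_inc, hi_inc = inclusive if isinstance(inclusive, (tuple, list)) else (inclusive, inclusive)

    def count(frac, inc):
        # inclusive=true truncates n * frac; inclusive=false rounds it.
        return int(n * frac) if inc else int(round(n * frac))

    return count(lo_frac, lo_inc), count(hi_frac, hi_inc)


print(cut_counts(10, 0.1))                             # (1, 1)
print(cut_counts(10, (0.12, 0.17)))                    # (1, 1)
print(cut_counts(10, (0.12, 0.17), inclusive=False))   # (1, 2)
```

With n = 10 and limit = 0.12 0.17, truncation gives one element per side, while rounding gives one on the low side and two on the high side.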
nanPolicy (optional) is a string indicating how to handle NULL values. The following options are available (default is 'upper'):
- 'upper': allows NULL values and treats them as the largest values of X.
- 'lower': allows NULL values and treats them as the smallest values of X.
- 'raise': throws an error.
- 'omit': performs the calculations without masking NULL values.
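As a rough illustration of how 'upper' and 'lower' change which positions fall into the tails, the Python sketch below ranks NULL (represented here by None) above or below every other value. The function name tail_indices is hypothetical, the cut counts are passed in directly, and 'omit' is left out because its exact behavior is not spelled out above.

```python
def tail_indices(x, low, high, nan_policy="upper"):
    """Return (indices cut at the low end, indices cut at the high end).

    Hypothetical helper for illustration only; None stands in for NULL.
    """
    if nan_policy not in ("upper", "lower", "raise"):
        raise ValueError("policy not covered by this sketch: " + nan_policy)
    if nan_policy == "raise" and any(v is None for v in x):
        raise ValueError("input contains NULL")
    # Rank NULL above ('upper') or below ('lower') every real value.
    null_rank = float("-inf") if nan_policy == "lower" else float("inf")
    order = sorted(range(len(x)), key=lambda i: null_rank if x[i] is None else x[i])
    return order[:low], order[len(x) - high:]


x = list(range(1, 20)) + [None]          # 1..19 followed by one NULL
print(tail_indices(x, 2, 2, "upper"))    # ([0, 1], [18, 19])
print(tail_indices(x, 2, 2, "lower"))    # ([19, 0], [17, 18])
```

Under 'upper' the NULL at index 19 lands in the high tail; under 'lower' it lands in the low tail, so a different real value is pulled into each tail.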
Details
Return a winsorized version of X: the values cut on each side, as determined by limit, are replaced with the nearest remaining value.
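A minimal Python sketch of the replacement step, assuming the input has no NULLs and the number of elements to cut on each side is already known. The name winsorize_counts is hypothetical; this shows the general technique, not DolphinDB's internals.

```python
def winsorize_counts(x, low, high):
    """Replace the `low` smallest and `high` largest values of x
    with the nearest value that is kept.

    Hypothetical helper for illustration; requires low + high < len(x).
    """
    ordered = sorted(x)
    floor_val = ordered[low]                  # smallest value that survives
    ceil_val = ordered[len(x) - high - 1]     # largest value that survives
    # Clamp every element into [floor_val, ceil_val], keeping the original order.
    return [min(max(v, floor_val), ceil_val) for v in x]


x = list(range(1, 11))                # 1..10
print(winsorize_counts(x, 1, 1))      # [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
print(winsorize_counts(x, 1, 2))      # [2, 2, 3, 4, 5, 6, 7, 8, 8, 8]
```

The result keeps the length and order of the input; only the extreme values change.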
Examples
x=1..10;
winsorize(x, 0.1);
// output
[2,2,3,4,5,6,7,8,9,9]
winsorize(x, 0.12 0.17);
// output
[2,2,3,4,5,6,7,8,9,9]
winsorize(x, 0.12 0.17, inclusive=false);
// output
[2,2,3,4,5,6,7,8,8,8]
x=1..20;
x[19:]=NULL;
x;
// output
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,]
winsorize(x, 0.1);
// output
[3,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,18,18]
winsorize(x, 0.1, nanPolicy='upper');
// output
[3,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,18,18]
winsorize(x, 0.1, nanPolicy='lower');
// output
[2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,17,17,2]