# summary {#summary}

## Syntax {#section_tqq_vv2_czb .section}

`summary(X,[interpolation],[characteristic],[percentile],[precision],[partitionSampling])`

## Details {#section_otd_wv2_czb .section}

`summary` generates summary statistics for the input data. It returns an in-memory table containing the minimum, maximum, count, mean, standard deviation, and specified percentiles in ascending order.

-   If *X* is a table, `summary` only computes statistics for the numeric columns.

-   If *X* is a data source, it can only contain numeric columns, otherwise an error will occur during computation.


## Arguments {#section_ulw_xv2_czb .section}

**X** can be an in-memory table, DFS table or data source generated from [sqlDS](sqlDS.md). Note that data sources with SQL metacode containing table joins are currently not supported.

**interpolation** \(optional\) is a string indicating the interpolation method for percentiles. It can be "linear" \(default\), "nearest", "lower", "higher" and "midpoint".

**characteristic** \(optional\) is a string scalar or vector indicating the characteristics to compute. It can be "avg" and/or "std". Default is both characteristics.

**percentile** \(optional\) is a DOUBLE vector of percentiles to compute. Each vector element falls between 0 ang 100. The default is \[25,50,75\], which returns the 25th, 50th, and 75th percentiles.

**precision** \(optional\) is a DOUBLE scalar greater than 0. The default value is 1e-3, which means the iteration for computing statistics will stop when the difference between the current and previous result is less than or equal to 1e-3. It is recommended to set precision between \[1e-3, 1e-9\] - small enough for adequate precision but not too small to impact performance through excessive iterations.

**partitionSampling** \(optional\) can be a positive integer specifying the number of partitions to sample, or a float between \(0, 1\] specifying the sampling ratio. If not specified, statistics are computed on all partitions. When specifying *partitionSampling*, note:

-   For a partitioned table:

    -   at least one partition will always be sampled. If the sampling ratio \* total partitions &lt; 1, one partition is sampled.

    -   The sampling ratio is rounded down if the sampling ratio \* total partitions is not an integer. E.g. if ratio=0.26 and total partitions is 10, 2 partitions are sampled.

    -   If *partitionSampling* \(integer\) &gt; total partitions, all partitions are used.

-   *partitionSampling* has no effect for non-partitioned tables.


## Examples {#section_ptc_cw2_czb .section}

``` {#codeblock_zc5_cw2_czb}
n=2022
data=1..n
value=take(1..3,n)
name=take(`APPLE`IBM`INTEL,n)
t=table(data,value,name);
summary(t, precision=0.001); 
// name is not a numeric column and therefore will not be output

```

|name|min|max|nonNullCount|count|avg|std|percentile|
|----|---|---|------------|-----|---|---|----------|
|data|1|2,022|2,022|2,022|1,011.5|583.8454|\[506.24,1011.50,1516.75\]|
|value|1|3|2,022|2,022|2|0.8167|\[1.00,1.99,2.99\]|

``` {#codeblock_bd5_cw2_czb}
n = 5000
data1 = take(1..5000000, n)
data2 = rand(10000000, n)
data3 = take("A" + string(0..10), n)

t = table(data1, data2, data3)
dbname = "dfs://summary"
if(existsDatabase(dbname)) {
     dropDatabase(dbname)
 }
db = database(dbname, HASH, [INT, 10])
pt = createPartitionedTable(db, t, `pt, `data1)
pt.append!(t)

ds = sqlDS(<select data1,data2 from loadTable(db, `pt)>)
query_percentile = [25,50,75,90]

ds_re1 = summary(ds);
//returns the 25th, 50th, 75th and 90th percentiles
ds_re2 = summary(ds, percentile=query_percentile, precision=0.0001);
// the partition sampling ratio is 0.6. As there are 10 partitions in total,  6 partitions will be sampled for statistics computation
ds_re3 = summary(loadTable(db, `pt), percentile=query_percentile, precision=0.0001, partitionSampling=0.6);
```

**Parent topic:**[Functions](../../Functions/category.md)

