UddSketch overview

Percentile approximation with guaranteed relative error using the UddSketch algorithm

Estimate the value at a given percentile, or the percentile rank of a given value, using the UddSketch algorithm. This estimation is more memory- and CPU-efficient than an exact calculation using PostgreSQL‘s percentile_cont and percentile_disc functions.

uddsketch is one of two advanced percentile approximation aggregates provided in TimescaleDB Toolkit. It produces stable estimates within a guaranteed relative error.

The other advanced percentile approximation aggregate is tdigest, which is more accurate at extreme quantiles, but is somewhat dependent on input order.

If you aren’t sure which aggregate to use, try the default percentile estimation method, percentile_agg. It uses the uddsketch algorithm with some sensible defaults.

For more information about percentile approximation algorithms, see the algorithms overview.

Two-step aggregation

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

More efficient because multiple accessors can reuse the same aggregate
Easier to reason about performance, because aggregation is separate from final computation
Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.

Samples

Aggregate and roll up percentile data using percentile_agg

Create an hourly continuous aggregate that contains a percentile aggregate:

CREATE MATERIALIZED VIEW foo_hourly
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 h'::interval, ts) AS bucket,
    percentile_agg(value) AS pct_agg
FROM foo
GROUP BY 1;

Use accessors to query directly from the continuous aggregate for hourly data. You can also roll the hourly data up into daily buckets, then calculate approximate percentiles:

SELECT
    time_bucket('1 day'::interval, bucket) AS bucket,
    approx_percentile(0.95, rollup(pct_agg)) AS p95,
    approx_percentile(0.99, rollup(pct_agg)) AS p99
FROM foo_hourly
GROUP BY 1;

Aggregate and roll up percentile data using uddsketch

Create an hourly continuous aggregate that contains a percentile aggregate:

CREATE MATERIALIZED VIEW foo_hourly
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 h'::interval, ts) AS bucket,
    uddsketch(100, 0.01, value) AS uddsketch
FROM foo
GROUP BY 1;

Use accessors to query directly from the continuous aggregate for hourly data. You can also roll the hourly data up into daily buckets, then calculate approximate percentiles:

SELECT
    time_bucket('1 day'::interval, bucket) AS bucket,
    approx_percentile(0.95, rollup(uddsketch)) AS p95,
    approx_percentile(0.99, rollup(uddsketch)) AS p99
FROM foo_hourly
GROUP BY 1;

Available functions

Aggregate

uddsketch(): aggregate data in a uddsketch for percentile calculation

Alternate aggregate

percentile_agg(): aggregate data using sensible defaults for percentile calculation

Accessors

approx_percentile(): estimate the value at a given percentile from a uddsketch
approx_percentile_array(): estimate values at multiple percentiles from a uddsketch
approx_percentile_rank(): estimate the percentile rank of a given value from a uddsketch
error(): get the maximum relative error of a uddsketch
mean(): calculate the exact mean from values in a uddsketch
num_vals(): get the number of values in a uddsketch

Rollup

rollup(): combine multiple uddsketch aggregates