Percentile approximation overview

Estimate percentile values and percentile ranks using memory-efficient approximation algorithms

These functions are more CPU- and memory-efficient than exact calculations using PostgreSQL‘s percentile_cont and percentile_disc functions, making them ideal for large datasets and continuous aggregates.

TimescaleDB Toolkit provides two advanced percentile approximation algorithms:

UddSketch: Produces stable estimates within a guaranteed relative error
t-digest: More accurate at extreme quantiles, though somewhat dependent on input order

Two-step aggregation

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

More efficient because multiple accessors can reuse the same aggregate
Easier to reason about performance, because aggregation is separate from final computation
Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.

Samples

Using percentile_agg (recommended for most use cases)

Create an hourly continuous aggregate and calculate daily percentiles:

CREATE MATERIALIZED VIEW foo_hourly
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 h'::interval, ts) AS bucket,
    percentile_agg(value) AS pct_agg
FROM foo
GROUP BY 1;

-- Query daily percentiles
SELECT
    time_bucket('1 day'::interval, bucket) AS bucket,
    approx_percentile(0.95, rollup(pct_agg)) AS p95,
    approx_percentile(0.99, rollup(pct_agg)) AS p99
FROM foo_hourly
GROUP BY 1;

Using uddsketch for custom error control

Aggregate percentile data with specific error bounds:

SELECT
    time_bucket('1 day'::interval, ts) AS day,
    uddsketch(200, 0.001, value) AS sketch
FROM measurements
GROUP BY day;

Using tdigest for extreme quantiles

Calculate percentiles at extreme ends of the distribution:

CREATE MATERIALIZED VIEW response_times_hourly
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 h'::interval, ts) AS bucket,
    tdigest(100, response_time) AS digest
FROM requests
GROUP BY 1;

-- Query for extreme percentiles
SELECT
    bucket,
    approx_percentile(0.999, digest) AS p999,
    approx_percentile(0.9999, digest) AS p9999
FROM response_times_hourly;

Available functions

UddSketch (recommended)

uddsketch(): estimate percentiles using the UddSketch algorithm with guaranteed relative error

t-digest (for extreme quantiles)

tdigest(): estimate percentiles using the t-digest algorithm, optimized for extreme quantiles