Skip to content

Percentile approximation overview

Estimate percentile values and percentile ranks using memory-efficient approximation algorithms

These functions are more CPU- and memory-efficient than exact calculations using PostgreSQL‘s percentile_cont and percentile_disc functions, making them ideal for large datasets and continuous aggregates.

TimescaleDB Toolkit provides two advanced percentile approximation algorithms:

  • UddSketch: Produces stable estimates within a guaranteed relative error
  • t-digest: More accurate at extreme quantiles, though somewhat dependent on input order

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

  1. More efficient because multiple accessors can reuse the same aggregate
  2. Easier to reason about performance, because aggregation is separate from final computation
  3. Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
  4. Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.

Section titled “Using percentile_agg (recommended for most use cases)”

Create an hourly continuous aggregate and calculate daily percentiles:

CREATE MATERIALIZED VIEW foo_hourly
WITH (timescaledb.continuous)
AS SELECT
time_bucket('1 h'::interval, ts) AS bucket,
percentile_agg(value) AS pct_agg
FROM foo
GROUP BY 1;
-- Query daily percentiles
SELECT
time_bucket('1 day'::interval, bucket) AS bucket,
approx_percentile(0.95, rollup(pct_agg)) AS p95,
approx_percentile(0.99, rollup(pct_agg)) AS p99
FROM foo_hourly
GROUP BY 1;

Aggregate percentile data with specific error bounds:

SELECT
time_bucket('1 day'::interval, ts) AS day,
uddsketch(200, 0.001, value) AS sketch
FROM measurements
GROUP BY day;

Calculate percentiles at extreme ends of the distribution:

CREATE MATERIALIZED VIEW response_times_hourly
WITH (timescaledb.continuous)
AS SELECT
time_bucket('1 h'::interval, ts) AS bucket,
tdigest(100, response_time) AS digest
FROM requests
GROUP BY 1;
-- Query for extreme percentiles
SELECT
bucket,
approx_percentile(0.999, digest) AS p999,
approx_percentile(0.9999, digest) AS p9999
FROM response_times_hourly;
  • uddsketch(): estimate percentiles using the UddSketch algorithm with guaranteed relative error
  • tdigest(): estimate percentiles using the t-digest algorithm, optimized for extreme quantiles