Percentile approximation overview
Estimate percentile values and percentile ranks using memory-efficient approximation algorithms
These functions are more CPU- and memory-efficient than exact calculations using PostgreSQL‘s percentile_cont and
percentile_disc functions, making them ideal for large datasets and continuous aggregates.
TimescaleDB Toolkit provides two advanced percentile approximation algorithms:
- UddSketch: Produces stable estimates within a guaranteed relative error
- t-digest: More accurate at extreme quantiles, though somewhat dependent on input order
Two-step aggregation
Section titled “Two-step aggregation”This group of functions uses the two-step aggregation pattern.
Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.
Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.
The two-step aggregation pattern has several advantages:
- More efficient because multiple accessors can reuse the same aggregate
- Easier to reason about performance, because aggregation is separate from final computation
- Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
- Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
To learn more, see the blog post on two-step aggregates.
Samples
Section titled “Samples”Using percentile_agg (recommended for most use cases)
Section titled “Using percentile_agg (recommended for most use cases)”Create an hourly continuous aggregate and calculate daily percentiles:
CREATE MATERIALIZED VIEW foo_hourlyWITH (timescaledb.continuous)AS SELECT time_bucket('1 h'::interval, ts) AS bucket, percentile_agg(value) AS pct_aggFROM fooGROUP BY 1;
-- Query daily percentilesSELECT time_bucket('1 day'::interval, bucket) AS bucket, approx_percentile(0.95, rollup(pct_agg)) AS p95, approx_percentile(0.99, rollup(pct_agg)) AS p99FROM foo_hourlyGROUP BY 1;Using uddsketch for custom error control
Section titled “Using uddsketch for custom error control”Aggregate percentile data with specific error bounds:
SELECT time_bucket('1 day'::interval, ts) AS day, uddsketch(200, 0.001, value) AS sketchFROM measurementsGROUP BY day;Using tdigest for extreme quantiles
Section titled “Using tdigest for extreme quantiles”Calculate percentiles at extreme ends of the distribution:
CREATE MATERIALIZED VIEW response_times_hourlyWITH (timescaledb.continuous)AS SELECT time_bucket('1 h'::interval, ts) AS bucket, tdigest(100, response_time) AS digestFROM requestsGROUP BY 1;
-- Query for extreme percentilesSELECT bucket, approx_percentile(0.999, digest) AS p999, approx_percentile(0.9999, digest) AS p9999FROM response_times_hourly;Available functions
Section titled “Available functions”UddSketch (recommended)
Section titled “UddSketch (recommended)”uddsketch(): estimate percentiles using the UddSketch algorithm with guaranteed relative error
t-digest (for extreme quantiles)
Section titled “t-digest (for extreme quantiles)”tdigest(): estimate percentiles using the t-digest algorithm, optimized for extreme quantiles