Frequency analysis overview

Functions for analyzing the frequency of values in time-series data

Analyze the frequency of values in time-series data using memory-efficient probabilistic data structures. These functions help you identify the most common elements and estimate occurrence counts without storing every individual value.

TimescaleDB Toolkit provides two approaches to frequency analysis:

freq_agg: Get the most common elements and their relative frequency using the SpaceSaving algorithm
count_min_sketch: Estimate the absolute number of times a specific value appears using the count-min sketch data structure

Two-step aggregation

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

More efficient, because multiple accessors can reuse the same aggregate
Easier to reason about performance, because aggregation is separate from final computation
Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
Supports retrospective analysis even after the underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.
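As a minimal sketch of the pattern, the following builds one intermediate aggregate and then reads it with two different accessors. It reuses the freq_agg, topn, and into_values calls exactly as they appear in the samples on this page. The value_agg table and its agg column are hypothetical names, and the sketch assumes the aggregate type can be stored in a table column, as continuous aggregates do.

```sql
-- Step 1: build the intermediate aggregate once and keep it.
-- value_agg is a hypothetical name; the generated data mirrors the samples.
CREATE TEMPORARY TABLE value_agg AS
SELECT toolkit_experimental.freq_agg(0.05, value) AS agg
FROM (
    SELECT floor(sqrt(random() * 400))::INTEGER AS value
    FROM generate_series(1, 100000)
) AS data;

-- Step 2a: one accessor returns the 5 most common values.
SELECT topn(agg, 5)
FROM value_agg;

-- Step 2b: a second accessor reads frequency bounds from the same
-- stored aggregate, without scanning the raw data again.
SELECT value, min_freq, max_freq
FROM value_agg, into_values(value_agg.agg);
```

Both queries in step 2 read only the stored aggregate, which is why several accessors can share the cost of a single aggregation pass.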

Samples

Find the most common values

Get the 5 most common values from a dataset. The first argument to freq_agg is the minimum frequency cutoff, here 5%:

CREATE TABLE value_test(value INTEGER);
INSERT INTO value_test SELECT floor(sqrt(random() * 400)) FROM generate_series(1,100000);

SELECT topn(
    toolkit_experimental.freq_agg(0.05, value),
    5)
FROM value_test;

Get frequency information for common values

Return the values that represent more than 5% of the input, along with the lower and upper bounds on each value's estimated frequency:

SELECT value, min_freq, max_freq
FROM into_values(
    (SELECT toolkit_experimental.freq_agg(0.05, value) FROM value_test));
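To see how tight the bounds are, you can compare them with exact frequencies on a dataset small enough to count directly. The sketch below uses the value_test table from the first sample and the same freq_agg and into_values calls as above. The exact and approx CTE names are illustrative, and the expectation that exact_freq falls between min_freq and max_freq is an assumption about how the bounds are meant to be read.

```sql
-- Exact relative frequency of every value, computed the expensive way.
WITH exact AS (
    SELECT value,
           count(*)::DOUBLE PRECISION
               / (SELECT count(*) FROM value_test) AS exact_freq
    FROM value_test
    GROUP BY value
),
-- Estimated bounds, using the same call as the sample above.
approx AS (
    SELECT value, min_freq, max_freq
    FROM into_values(
        (SELECT toolkit_experimental.freq_agg(0.05, value) FROM value_test))
)
-- Only values reported by the aggregate appear in the result.
SELECT value, min_freq, exact_freq, max_freq
FROM approx
JOIN exact USING (value)
ORDER BY exact_freq DESC;
```

The exact query has to group every row, while the aggregate keeps only a bounded number of counters. That is the trade-off the probabilistic approach makes.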

Estimate absolute counts

Use a count-min sketch to estimate how many times a specific value appears. This example assumes a user_events table with a user_id column:

WITH sketch AS (
    SELECT toolkit_experimental.count_min_sketch(user_id::text, 0.01, 0.01) AS cms
    FROM user_events
)
SELECT toolkit_experimental.approx_count('user123', cms) AS estimated_count
FROM sketch;
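To sanity-check an estimate on data small enough to count directly, you can run the exact count alongside it. A textbook count-min sketch can overestimate because of hash collisions but does not underestimate, so estimated_count would be expected to be at least exact_count. Whether Toolkit's implementation gives exactly this guarantee is an assumption here.

```sql
-- Exact count for the same value, for comparison with estimated_count.
-- user_events and user_id are the assumed table and column from the sample.
SELECT count(*) AS exact_count
FROM user_events
WHERE user_id::text = 'user123';
```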

Available functions

Frequency aggregation

freq_agg(): track the most common values using a minimum frequency cutoff

Count-min sketch

count_min_sketch(): estimate absolute counts using the count-min sketch data structure