Create a continuous aggregate
Create a continuous aggregate and make sure you always have the latest aggregated data for your analytical queries
A continuous aggregate stores precomputed rollups so analytical queries avoid rescanning all raw rows. You create the materialized view first, then wire up refresh (policy or manual). You can build on a hypertable or stack another continuous aggregate, and you can define more than one continuous aggregate per source.
Continuous aggregates require a time_bucket on the time partitioning column of
the hypertable.
By default, views are automatically refreshed. You can adjust this by setting the WITH NO DATA option. Additionally, the view can not be a security barrier view.
Continuous aggregates use hypertables in the background, which means that they also use chunk time intervals. By default, the continuous aggregate‘s chunk time interval is 10 times what the original hypertable‘s chunk time interval is. For example, if the original hypertable‘s chunk time interval is 7 days, the continuous aggregates that are on top of it have a 70 day chunk time interval.
Create a continuous aggregate
Section titled “Create a continuous aggregate”In this example, we are using a hypertable called conditions, and creating a
continuous aggregate view for daily weather data. The GROUP BY clause must
include a time_bucket expression which uses time dimension column of the
hypertable. Additionally, all functions and their arguments included in
SELECT, GROUP BY, and HAVING clauses must be
immutable.
- Create the materialized viewCREATE MATERIALIZED VIEW conditions_summary_dailyWITH (timescaledb.continuous) ASSELECT device,time_bucket(INTERVAL '1 day', time) AS bucket,AVG(temperature),MAX(temperature),MIN(temperature)FROM conditionsGROUP BY device, bucket;
To create a continuous aggregate within a transaction block, use the WITH NO DATA option.
To improve continuous aggregate performance, set
timescaledb.invalidate_using = 'wal'Since 2.22.0 . - Create a refresh policySELECT add_continuous_aggregate_policy('conditions_summary_daily',start_offset => INTERVAL '1 month',end_offset => INTERVAL '1 day',schedule_interval => INTERVAL '1 hour');
You can use most PostgreSQL aggregate functions in continuous aggregations. To see what PostgreSQL features are supported, check the function support table.
Choosing an appropriate bucket interval
Section titled “Choosing an appropriate bucket interval”Continuous aggregates require a time_bucket on the time partitioning column of
the hypertable. The time bucket allows you to define a time interval, instead of
having to use specific timestamps. For example, you can define a time bucket as
five minutes, or one day.
You can’t use time_bucket_gapfill directly in a
continuous aggregate. This is because you need access to previous data to
determine the gapfill content, which isn’t yet available when you create the
continuous aggregate. You can work around this by creating the continuous aggregate
using time_bucket, then querying the continuous aggregate
using time_bucket_gapfill.
Using the WITH NO DATA option
Section titled “Using the WITH NO DATA option”By default, when you create a view for the first time, it is populated with
data. This is so that the aggregates can be computed across the entire
hypertable. If you don’t want this to happen, for example if the table is very
large, or if new data is being continuously added, you can control the order in
which the data is refreshed. You can do this by adding a manual refresh with
your continuous aggregate policy using the WITH NO DATA option.
The WITH NO DATA option allows the continuous aggregate to be created
instantly, so you don’t have to wait for the data to be aggregated. Data begins
to populate only when the policy begins to run. This means that only data newer
than the start_offset time begins to populate the continuous aggregate. If you
have historical data that is older than the start_offset interval, you need to
manually refresh the history up to the current start_offset to allow real-time aggregate
queries to run efficiently.
- Create the viewCREATE MATERIALIZED VIEW cagg_rides_viewWITH (timescaledb.continuous) ASSELECT vendor_id,time_bucket('1h', pickup_datetime) AS hour,count(*) total_rides,avg(fare_amount) avg_fare,max(trip_distance) as max_trip_distance,min(trip_distance) as min_trip_distanceFROM ridesGROUP BY vendor_id, time_bucket('1h', pickup_datetime)WITH NO DATA;
- Manually refresh the viewCALL refresh_continuous_aggregate('cagg_rides_view', NULL, localtimestamp - INTERVAL '1 week');
- Add the policySELECT add_continuous_aggregate_policy('cagg_rides_view',start_offset => INTERVAL '1 week',end_offset => INTERVAL '1 hour',schedule_interval => INTERVAL '30 minutes');
Create a continuous aggregate with a JOIN
Section titled “Create a continuous aggregate with a JOIN”In TimescaleDB v2.10 and later, with PostgreSQL v12 or later, you can
create a continuous aggregate with a query that also includes a JOIN. For
example:
CREATE MATERIALIZED VIEW conditions_summary_daily_3WITH (timescaledb.continuous) ASSELECT time_bucket(INTERVAL '1 day', day) AS bucket, AVG(temperature), MAX(temperature), MIN(temperature), nameFROM devices JOIN conditions USING (device_id)GROUP BY name, bucket;For more information about creating a continuous aggregate with a JOIN,
including some additional restrictions, see the
about continuous aggregates section.
Query continuous aggregates
Section titled “Query continuous aggregates”When you have created a continuous aggregate and set a refresh policy, you can
query the view with a SELECT query. You can only specify a single hypertable
in the FROM clause. Including more hypertables, tables, views, or subqueries
in your SELECT query is not supported. Additionally, make sure that the
hypertable you are querying does not have
row-level-security policies
enabled.
- Query the continuous aggregate for a specific deviceSELECT *FROM conditions_summary_hourlyWHERE device = 5AND bucket >= '2020-01-01'AND bucket < '2020-04-01';
- Query for largest metric spreadsSELECT *FROM conditions_summary_hourlyWHERE max - min > 1800AND bucket >= '2020-01-01' AND bucket < '2020-04-01'ORDER BY bucket DESC, device DESC LIMIT 20;
Use continuous aggregates with mutable functions: experimental
Section titled “Use continuous aggregates with mutable functions: experimental”Mutable functions have experimental support in the continuous aggregate query definition. Mutable functions are enabled by default. However, if you use them in a materialized query a warning is returned.
When using non-immutable functions you have to ensure these functions produce consistent results across continuous aggregate refresh runs. For example, if a function depends on the current time zone you have to ensure all your continuous aggregate refreshes run with a consistent setting for this.
Use continuous aggregates with window functions: experimental
Section titled “Use continuous aggregates with window functions: experimental”Window functions have experimental support in the continuous aggregate query definition. Window functions are disabled
by default. To enable them, set timescaledb.enable_cagg_window_functions to true.
Create a window function
Section titled “Create a window function”To use a window function in a continuous aggregate:
- Create a simple tableCREATE TABLE example (time TIMESTAMPTZ NOT NULL,value TEXT NOT NULL);
- Enable window functions
As window functions are experimental, you have to set
enable_cagg_window_functions.SET timescaledb.enable_cagg_window_functions TO TRUE; - Create the continuous aggregate with a window function
Window functions must stay within the time bucket. Any query that tries to look beyond the current time bucket will produce incorrect results around the refresh boundaries.
CREATE MATERIALIZED VIEW example_aggregateWITH (timescaledb.continuous) ASSELECTtime_bucket('1d', time),customer_id,sum(amount) AS amount,sum(amount) - LAG(sum(amount),1,NULL) OVER (PARTITION BY time_bucket('1d', time) ORDER BY sum(amount) DESC) AS amount_diff,ROW_NUMBER() OVER (PARTITION BY time_bucket('1d', time) ORDER BY sum(amount) DESC)FROM sales GROUP BY 1,2;Window functions that partition by time_bucket should be safe even with LAG()/LEAD()
Window function workaround for older versions of TimescaleDB
Section titled “Window function workaround for older versions of TimescaleDB”For TimescaleDB v2.19.3 and below, continuous aggregates do not support window functions. To work around this:
- Create a simple table
Create a simple table to store a value at a specific time:
CREATE TABLE example (time TIMESTAMPTZ NOT NULL,value TEXT NOT NULL); - Create a continuous aggregate without a window functionCREATE MATERIALIZED VIEW example_aggregateWITH (timescaledb.continuous) ASSELECTtime_bucket('10 minutes', time) AS bucket,first(value, time) AS valueFROM example GROUP BY bucket;
- Use the lag window function at query time
This speeds up your query by calculating the aggregation ahead of time. The delta is calculated at query time.
SELECTbucket,value - lag(value, 1) OVER (ORDER BY bucket) AS deltaFROM example_aggregate;