Time-weighted calculations overview
Calculate time-weighted summary statistics for unevenly sampled data
Calculate time-weighted summary statistics, such as averages (means) and integrals. Time weighting is used when data is unevenly sampled over time. In that case, a straight average gives misleading results, as it biases towards more frequently sampled values.
For example, a sensor might silently spend long periods of time in a steady state, and send data only when a significant change occurs. The regular mean counts the steady-state reading as only a single point, whereas a time-weighted mean accounts for the long period of time spent in the steady state. In essence, the time-weighted mean takes an integral over time, then divides by the elapsed time.
Two-step aggregation
Section titled “Two-step aggregation”This group of functions uses the two-step aggregation pattern.
Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.
Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.
The two-step aggregation pattern has several advantages:
- More efficient because multiple accessors can reuse the same aggregate
- Easier to reason about performance, because aggregation is separate from final computation
- Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
- Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
To learn more, see the blog post on two-step aggregates.
Samples
Section titled “Samples”Aggregate data into a TimeWeightSummary and calculate the average
Section titled “Aggregate data into a TimeWeightSummary and calculate the average”Given a table foo with data in a column val, aggregate data into a daily TimeWeightSummary. Use that to calculate
the average for column val:
WITH t as ( SELECT time_bucket('1 day'::interval, ts) as dt, time_weight('Linear', ts, val) AS tw FROM foo WHERE measure_id = 10 GROUP BY time_bucket('1 day'::interval, ts))SELECT dt, average(tw)FROM t;Advanced usage
Section titled “Advanced usage”Parallelism and ordering
Section titled “Parallelism and ordering”Time-weighted average calculations are not strictly parallelizable, as defined by PostgreSQL. These calculations require inputs to be strictly ordered, but in general, PostgreSQL parallelizes by assigning rows randomly to workers.
However, the algorithm can be parallelized if it is guaranteed that all rows within some time range go to the same
worker. This is the case for both continuous aggregates and distributed hypertables. (Note that the partitioning keys of
the distributed hypertable must be within the GROUP BY clause, but this is usually the case.)
Combining aggregates across measurement series
Section titled “Combining aggregates across measurement series”If you try to combine overlapping TimeWeightSummaries, an error is thrown. For example, you might create a
TimeWeightSummary for device_1 and a separate TimeWeightSummary for device_2, both covering the same period of
time. You can’t combine these because the interpolation techniques only make sense when restricted to a single
measurement series.
If you want to calculate a single summary statistic across all devices, use a simple average, like this:
WITH t as (SELECT measure_id, average( time_weight('LOCF', ts, val) ) as time_weighted_average FROM foo GROUP BY measure_id)SELECT avg(time_weighted_average) -- use the normal avg function to average the time-weighted averagesFROM t;Parallelism in multi-node
Section titled “Parallelism in multi-node”The time-weighted average functions are not strictly parallelizable in the PostgreSQL sense. PostgreSQL requires that parallelizable functions accept potentially overlapping input. As explained above, the time-weighted functions do not. However, they do support partial aggregation and partition-wise aggregation in multi-node setups.
Reducing memory usage
Section titled “Reducing memory usage”Because the time-weighted aggregates require ordered sets, they build up a buffer of input data, sort it, and then perform the aggregation steps. When memory is too small to build up a buffer of points, you might see Out of Memory failures or other issues. In these cases, try using a multi-level aggregate. For example:
WITH t as (SELECT measure_id, time_bucket('1 day'::interval, ts), time_weight('LOCF', ts, val) FROM foo GROUP BY measure_id, time_bucket('1 day'::interval, ts) )SELECT measure_id, average( rollup(time_weight) )FROM tGROUP BY measure_id;Functions in this group
Section titled “Functions in this group”Aggregate
Section titled “Aggregate”time_weight(): aggregate data into an intermediate time-weighted aggregate form for further calculation
Accessors
Section titled “Accessors”average(): calculate the time-weighted average of values in a TimeWeightSummaryfirst_time(): get the timestamp of the first point in the TimeWeightSummaryfirst_val(): get the value of the first point in the TimeWeightSummaryintegral(): calculate the integral from a TimeWeightSummaryinterpolated_average(): calculate the time-weighted average, interpolating at boundariesinterpolated_integral(): calculate the integral, interpolating at boundarieslast_time(): get the timestamp of the last point in the TimeWeightSummarylast_val(): get the value of the last point in the TimeWeightSummary
Rollup
Section titled “Rollup”rollup(): combine multiple TimeWeightSummaries