Statistical and regression analysis overview
Functions for statistical analysis and linear regression on time-series data
Perform statistical analysis and linear regression on time-series data. These functions are similar to PostgreSQL statistical aggregates, but they include more features and are easier to use in continuous aggregates and window functions.
Two-step aggregation
This group of functions uses the two-step aggregation pattern.
Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.
Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.
The two-step aggregation pattern has several advantages:
- More efficient because multiple accessors can reuse the same aggregate
- Easier to reason about performance, because aggregation is separate from final computation
- Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
- Able to support retrospective analysis even after the underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
To learn more, see the blog post on two-step aggregates.
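To make the pattern concrete, here is a minimal Python sketch of the idea, not the library's implementation. The `Stats1D` type and the helper names `aggregate`, `rollup`, `average`, and `stddev` are hypothetical and only mirror the roles described above. The sketch assumes the intermediate aggregate keeps a count, a mean, and a sum of squared deviations, and that the standard deviation uses the sample (n - 1) form.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats1D:
    """Hypothetical intermediate aggregate: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of (x - mean)^2 over all values seen


def aggregate(values):
    """Step 1: fold raw values into an intermediate aggregate (Welford's update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Stats1D(n, mean, m2)


def rollup(a, b):
    """Combine two intermediate aggregates without revisiting the raw values."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Stats1D(n, mean, m2)


def average(s):
    """Step 2 accessor: arithmetic mean."""
    return s.mean


def stddev(s):
    """Step 2 accessor: sample standard deviation (assumed n - 1 denominator)."""
    return math.sqrt(s.m2 / (s.n - 1))


# Two daily aggregates, each built once from that day's readings.
monday = aggregate([20.0, 22.0, 24.0])
tuesday = aggregate([18.0, 26.0])

# Several accessors reuse the same aggregate.
print(average(monday), stddev(monday))

# Rolling up works even if the raw readings are no longer available.
two_days = rollup(monday, tuesday)
print(average(two_days), stddev(two_days))
```

The rolled-up aggregate gives the same mean and standard deviation as aggregating all five readings directly, which is why daily aggregates can later be combined into weekly or monthly results.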
Samples
One-dimensional statistical analysis
Calculate the average, standard deviation, and skewness of daily temperature readings:

WITH daily_stats AS (
    SELECT
        time_bucket('1 day'::interval, time) AS day,
        stats_agg(temperature) AS stats
    FROM weather_data
    GROUP BY day
)
SELECT
    day,
    average(stats) AS avg_temp,
    stddev(stats) AS std_dev,
    skewness(stats) AS skew
FROM daily_stats
ORDER BY day;

Two-dimensional regression analysis
Calculate the correlation coefficient between two variables, along with the slope and y-intercept of their linear regression:

WITH daily_stats AS (
    SELECT
        time_bucket('1 day'::interval, time) AS day,
        stats_agg(sales, temperature) AS stats
    FROM store_data
    GROUP BY day
)
SELECT
    day,
    corr(stats) AS correlation,
    slope(stats) AS regression_slope,
    intercept(stats) AS y_intercept
FROM daily_stats
ORDER BY day;

Rolling window calculations
Calculate a 7-day rolling average by applying rolling() as a window function over daily buckets. The frame ROWS 6 PRECEDING covers the current bucket plus the six buckets before it:

SELECT
    time_bucket('1 day'::interval, time) AS day,
    average(
        rolling(stats_agg(temperature)) OVER (
            ORDER BY time_bucket('1 day'::interval, time)
            ROWS 6 PRECEDING
        )
    ) AS rolling_avg_7day
FROM weather_data
GROUP BY day
ORDER BY day;

Available functions
One-dimensional statistics
stats_agg() (one variable): analyze statistical properties of a single variable
Two-dimensional statistics and regression
stats_agg() (two variables): analyze statistical properties and linear regression of two variables
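For intuition about how one two-variable intermediate aggregate can serve several regression accessors, the following Python sketch derives a slope, a y-intercept, and a correlation coefficient from the same set of running sums. It is an illustration under stated assumptions, not the library's implementation: the `Stats2D` type and `aggregate_xy` helper are hypothetical, the (y, x) argument order is an assumption of this sketch, and plain running sums are used for brevity even though they are less numerically stable than incremental update formulas.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats2D:
    """Hypothetical two-variable intermediate aggregate (running sums)."""
    n: int
    sx: float
    sy: float
    sxx: float
    syy: float
    sxy: float


def aggregate_xy(pairs):
    """Step 1: fold (y, x) pairs into running sums."""
    n, sx, sy, sxx, syy, sxy = 0, 0.0, 0.0, 0.0, 0.0, 0.0
    for y, x in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return Stats2D(n, sx, sy, sxx, syy, sxy)


def slope(s):
    """Accessor: least-squares slope of y on x."""
    return (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx ** 2)


def intercept(s):
    """Accessor: least-squares y-intercept."""
    return (s.sy - slope(s) * s.sx) / s.n


def corr(s):
    """Accessor: Pearson correlation coefficient."""
    cov = s.n * s.sxy - s.sx * s.sy
    var_x = s.n * s.sxx - s.sx ** 2
    var_y = s.n * s.syy - s.sy ** 2
    return cov / math.sqrt(var_x * var_y)


# Points that lie exactly on the line y = 2x + 1.
stats = aggregate_xy([(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (9.0, 4.0)])
print(slope(stats), intercept(stats), corr(stats))
```

Because every accessor reads from the same aggregate, the raw pairs are scanned only once no matter how many results are requested.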