stats_agg (two variables) overview
Statistical analysis and linear regression functions for two-dimensional data
Perform linear regression analysis, for example to calculate correlation coefficient and covariance, on two-dimensional data. You can also calculate common statistics, such as average and standard deviation, on each dimension separately. These functions are similar to the PostgreSQL statistical aggregates, but they include more features and are easier to use in continuous aggregates and window functions. The linear regressions are based on the standard least-squares fitting method.
These functions work on two-dimensional data. To work with one-dimensional data, for example to calculate the average
and standard deviation of a single variable, see the one-dimensional stats_agg functions.
Two-step aggregation
Section titled “Two-step aggregation”This group of functions uses the two-step aggregation pattern.
Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.
Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.
The two-step aggregation pattern has several advantages:
- More efficient because multiple accessors can reuse the same aggregate
- Easier to reason about performance, because aggregation is separate from final computation
- Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
- Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
To learn more, see the blog post on two-step aggregates.
Samples
Section titled “Samples”Calculate regression and statistical properties
Section titled “Calculate regression and statistical properties”Create a statistical aggregate that summarizes daily statistical data about two variables, val2 and val1, where
val2 is the dependent variable and val1 is the independent variable. Use the statistical aggregate to calculate the
average of the dependent variable and the slope of the linear-regression fit:
WITH t AS ( SELECT time_bucket('1 day'::interval, ts) AS dt, stats_agg(val2, val1) AS stats2D FROM foo WHERE id = 'bar' GROUP BY time_bucket('1 day'::interval, ts))SELECT average_x(stats2D), slope(stats2D)FROM t;Available functions
Section titled “Available functions”Aggregate
Section titled “Aggregate”stats_agg(): aggregate data into an intermediate statistical aggregate form for further calculation
Accessors for y variable statistics
Section titled “Accessors for y variable statistics”average_y(): calculate the average of the dependent variable from a statistical aggregatestddev_y(): calculate the standard deviation of the dependent variable from a statistical aggregatevariance_y(): calculate the variance of the dependent variable from a statistical aggregateskewness_y(): calculate the skewness of the dependent variable from a statistical aggregatekurtosis_y(): calculate the kurtosis of the dependent variable from a statistical aggregatesum_y(): calculate the sum of the dependent variable from a statistical aggregate
Accessors for regression analysis
Section titled “Accessors for regression analysis”corr(): calculate the correlation coefficient from a statistical aggregatecovariance(): calculate the covariance from a statistical aggregatedetermination_coeff(): calculate the coefficient of determination (R²) from a statistical aggregateslope(): calculate the slope of the linear regression line from a statistical aggregateintercept(): calculate the y-intercept of the linear regression line from a statistical aggregatex_intercept(): calculate the x-intercept of the linear regression line from a statistical aggregate
Accessors for aggregate information
Section titled “Accessors for aggregate information”num_vals(): get the number of values contained in a statistical aggregate
Rollup
Section titled “Rollup”rollup(): combine multiple two-dimensional statistical aggregates
Mutator
Section titled “Mutator”rolling(): create a rolling window aggregate for use in window functions