Skip to content

hyperloglog()

Aggregate data into a hyperloglog for approximate counting

Since 1.3.0

This is the first step for estimating the approximate number of distinct values using the hyperloglog algorithm. Use hyperloglog to create an intermediate aggregate from your raw data. This intermediate form can then be used by one or more accessors in this group to compute final results.

Optionally, multiple such intermediate aggregate objects can be combined using rollup() before an accessor is applied.

If you’re not sure what value to set for buckets, try using the alternate aggregate function, approx_count_distinct(). approx_count_distinct also creates a hyperloglog, but it sets a default bucket value that should work for many use cases.

Given a table called samples, with a column called weights, return a hyperloglog over the weights column.

SELECT hyperloglog(32768, weights) FROM samples;

Using the same data, build a view from the aggregate that you can pass to other hyperloglog functions.

CREATE VIEW hll AS SELECT hyperloglog(32768, data) FROM samples;

The syntax is:

hyperloglog(
buckets INTEGER,
value AnyElement
) RETURNS Hyperloglog
NameTypeDefaultRequiredDescription
bucketsINTEGER-Number of buckets in the hyperloglog. Increasing the number of buckets improves accuracy but increases memory use. Value is rounded up to the next power of 2, and must be between 2^4 (16) and 2^18. Setting a value less than 2^10 (1,024) may result in poor accuracy if the true cardinality is high and is not recommended. If unsure, start experimenting with 8,192 (2^13) which has an approximate error rate of 1.15%.
valueAnyElement-The column containing the elements to count. The type must have an extended, 64-bit, hash function.
ColumnTypeDescription
hyperloglogHyperloglogA hyperloglog object which can be passed to other hyperloglog APIs for rollups and final calculation