Elasticsearch provides a simple way to execute all kinds of complex aggregations. Often, the result of those aggregations contains a large number of buckets, so it would be good to paginate those buckets efficiently. That’s where Composite aggregation comes in handy.
Composite aggregation
Composite aggregation provides a way to stream all buckets of a specific aggregation, similar to what scroll does for documents. The composite buckets are built from the combinations of the values extracted or created for each document, and each combination is treated as one composite bucket.
If the number of composite buckets is too high (or unknown) to be returned in a single response, it is possible to split the retrieval into multiple requests. The size parameter defines the number of composite buckets that will be returned in each response. The after parameter tells Elasticsearch which bucket to resume from, so that the next request returns the next set of buckets.
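To make the roles of size and after more concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the idea, not how Elasticsearch implements it: the documents, the field names and the composite_page helper are all made up for this example.

```python
from collections import defaultdict

# Hypothetical documents, made up for illustration only.
DOCS = [
    {"city": "Atlanta", "employer": "A", "net_pay": 4500.0},
    {"city": "Atlanta", "employer": "A", "net_pay": 5500.0},
    {"city": "Detroit", "employer": "B", "net_pay": 3000.0},
    {"city": "Atlanta", "employer": "B", "net_pay": 4000.0},
    {"city": "Boston", "employer": "A", "net_pay": 5000.0},
]


def composite_page(docs, sources, size, after=None):
    """Return one page of composite buckets plus the key to resume from."""
    groups = defaultdict(list)
    for doc in docs:
        # One composite key per document: the combination of its source values.
        key = tuple(doc[field] for field in sources)
        groups[key].append(doc)

    # Buckets come back in key order, which is what makes "after" well defined.
    ordered = sorted(groups)
    if after is not None:
        ordered = [key for key in ordered if key > after]

    page = ordered[:size]
    buckets = [
        {"key": dict(zip(sources, key)), "doc_count": len(groups[key])}
        for key in page
    ]
    # The resume key is the key of the last bucket on this page (None if empty).
    after_key = page[-1] if page else None
    return buckets, after_key
```

Calling composite_page(DOCS, ["city", "employer"], size=2) returns the first two combinations together with the key to resume from. Passing that key back as after returns the next two, and an empty page signals that all buckets have been read.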
Example – Pay Reports index
Let’s see how Composite aggregation can be used to paginate aggregation buckets. A simple pay reports index will be used as an example. Sample data is given in the table below (the index data is shown as a table for readability).
report_id | report_date | employee_id | employee_full_name | net_pay | employer_id | employer_name | city
1 | 2022-05-27 | 1 | John Collins | 4,500.00 | 1 | McDonalds | Atlanta, Georgia
2 | 2022-05-27 | 2 | Mary Jane | 5,000.00 | 1 | McDonalds | San Francisco, California
3 | 2022-05-28 | 3 | Anthony Smith | 3,000.00 | 2 | Starbucks | Detroit, Michigan
Now, let’s get the average monthly net pay per city. An example query is given below.
GET pay_reports/_search
{
  "size": 0,
  "aggs": {
    "avg_net_pay_by_city_per_month": {
      "date_histogram": {
        "field": "report_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "cities": {
          "composite": {
            "size": 20,
            "sources": [
              { "city": { "terms": { "field": "city.keyword", "order": "desc" } } }
            ]
          },
          "aggs": {
            "avg_net_pay": { "avg": { "field": "net_pay" } }
          }
        }
      }
    }
  }
}
Now, to get the next set of buckets, resend the same aggregation with the after parameter set to the after_key value returned in the response. For example, if the after_key value of the initial response were { "city": "Atlanta, Georgia" }, the new composite aggregation would look like the example below.
GET pay_reports/_search
{
  …
  "composite": {
    "size": 20,
    "after": { "city": "Atlanta, Georgia" },
    "sources": [
      { "city": { "terms": { "field": "city.keyword", "order": "desc" } } }
    ]
  },
  …
}
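The resend-with-after step can be automated on the client side. Below is a minimal Python sketch of such a loop. It assumes the composite aggregation sits at the top level of the request body under aggs, and that search is any callable that takes a request body and returns the parsed JSON response. Both iter_composite_buckets and search are hypothetical names introduced for this example, not part of any Elasticsearch client.

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield every bucket of a top-level composite aggregation, page by page.

    `search` is a stand-in for a real client call: any callable that takes a
    request body (dict) and returns the parsed JSON response (dict).
    """
    body = copy.deepcopy(body)  # do not mutate the caller's request
    composite = body["aggs"][agg_name]["composite"]
    while True:
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            return  # an empty page means every bucket has been read
        yield from buckets
        after_key = result.get("after_key")
        if after_key is None:
            return
        # Resend the same aggregation, resuming after the last returned key.
        composite["after"] = after_key
```

With a real client, search could be a small wrapper around the client's search call. The loop stops on the first empty page, so it costs one extra request compared to the number of non-empty pages.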
As shown above, composite aggregations are fairly simple to use and provide a great way to paginate over aggregations, especially when those aggregations have a large number of buckets. However, one thing to keep in mind is that composite aggregations are incompatible with pipeline aggregations, so adding composite aggregations to a more complex aggregation can be tricky.
“Elasticsearch – Aggregation pagination with Composite aggregation” Tech Bite was brought to you by Dino Kopić, Junior Software Engineer at Atlantbh.
Tech Bites are tips, tricks, snippets or explanations about various programming technologies and paradigms, which can help engineers with their everyday job.