Elasticsearch provides a simple way to execute all kinds of complex aggregations. Often, the result of those aggregations contains a large number of buckets, so it helps to paginate those buckets efficiently. That’s where the composite aggregation comes in handy.

Composite aggregation

The composite aggregation provides a way to stream all buckets of a specific aggregation, similar to what scroll does for documents. The composite buckets are built from the combinations of the values extracted or created for each document, and each combination is treated as one composite bucket.
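To make the idea of "one bucket per combination of values" concrete, here is a minimal in-memory sketch in Python. It only mimics the bucketing idea and is not how Elasticsearch implements it; the `composite_buckets` helper, the `month` and `city` field names, and the sample documents are all made up for illustration.

```python
from collections import Counter


def composite_buckets(docs, source_fields):
    """Group documents by the combination of values of the given fields.

    Each distinct combination becomes one bucket, keyed by the source
    fields, with the number of matching documents as doc_count.
    """
    counts = Counter(tuple(doc[field] for field in source_fields) for doc in docs)
    return [
        {"key": dict(zip(source_fields, combo)), "doc_count": count}
        for combo, count in sorted(counts.items())  # natural ascending order
    ]


# Illustrative documents (assumed values, not taken from a real index).
docs = [
    {"month": "2022-05", "city": "Atlanta"},
    {"month": "2022-05", "city": "Atlanta"},
    {"month": "2022-05", "city": "Detroit"},
    {"month": "2022-06", "city": "Atlanta"},
]

buckets = composite_buckets(docs, ["month", "city"])
for bucket in buckets:
    print(bucket)
```

Four documents collapse into three buckets here, because two of them share the same (month, city) combination.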

If the number of composite buckets is too high (or unknown) to be returned in a single response, the retrieval can be split into multiple requests. The size parameter defines how many composite buckets are returned in each response, and the after parameter is used to retrieve the next set of results.

Example – Pay Reports index

Let’s see how the composite aggregation can be used to paginate aggregation buckets. A simple pay reports index will be used as an example. Sample data is shown in the table below (the index data is presented as a table for readability).

| report_id | report_date | employee_id | employee_full_name | net_pay | employer_id | employer_name | city |
|---|---|---|---|---|---|---|---|
| 1 | 2022-05-27 | 1 | John Collins | 4,500.00 | 1 | McDonalds | Atlanta, Georgia |
| 2 | 2022-05-27 | 2 | Mary Jane | 5,000.00 | 1 | McDonalds | San Francisco, California |
| 3 | 2022-05-28 | 3 | Anthony Smith | 3,000.00 | 2 | Starbucks | Detroit, Michigan |

Now, let’s get the average monthly net pay per city. An example query is given below.

GET pay_reports/_search
{
  "size": 0,
  "aggs": {
    "avg_net_pay_by_city_per_month": {
      "date_histogram": {
        "field": "report_date",
        "interval": "month"
      },
      "aggs": {
        "cities": {
          "composite": {
            "size": 20,
            "sources": [
              {
                "city": {
                  "terms": {
                    "field": "city.keyword",
                    "order": "desc"
                  }
                }
              }
            ]
          },
          "aggs": {
            "avg_net_pay": {
              "avg": {
                "field": "net_pay"
              }
            }
          }
        }
      }
    }
  }
}

To get the next set of buckets, resend the same aggregation with the after parameter set to the after_key value returned in the response. For example, if the after_key value of the initial response were “Atlanta, Georgia”, the new composite aggregation would look like the example below.

GET pay_reports/_search
{
  …
  "composite": {
    "size": 20,
    "after": {
      "city": "Atlanta, Georgia"
    },
    "sources": [
      {
        "city": {
          "terms": {
            "field": "city.keyword",
            "order": "desc"
          }
        }
      }
    ]
  },
  …
}
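The request-and-resend cycle described above can be wrapped in a small loop. The following Python sketch assumes the composite aggregation sits at the top level of the request body and that the response exposes `buckets` and `after_key` under `aggregations`. The `iter_composite_buckets` helper and its `search` argument are hypothetical names: `search` stands in for whatever client call sends a request body to `pay_reports/_search` and returns the parsed JSON response.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_composite_buckets(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    agg_name: str,
    sources: List[Dict[str, Any]],
    sub_aggs: Optional[Dict[str, Any]] = None,
    page_size: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield every bucket of a top-level composite aggregation, page by page."""
    after_key = None
    while True:
        composite: Dict[str, Any] = {"size": page_size, "sources": sources}
        if after_key is not None:
            # Resume right after the last bucket of the previous page.
            composite["after"] = after_key

        agg: Dict[str, Any] = {"composite": composite}
        if sub_aggs:
            agg["aggs"] = sub_aggs

        # "size": 0 skips the document hits; only aggregation results are needed.
        response = search({"size": 0, "aggs": {agg_name: agg}})
        result = response["aggregations"][agg_name]

        buckets = result.get("buckets", [])
        yield from buckets

        after_key = result.get("after_key")
        if not buckets or after_key is None:
            break  # an empty page or a missing after_key means we are done
```

A caller would pass the same `sources` list used in the queries above, plus the `avg_net_pay` sub-aggregation as `sub_aggs`, and iterate over the returned buckets without handling the after parameter by hand. Note that the loop typically issues one final request that comes back empty, which is how it detects the end.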

As shown above, composite aggregations are fairly simple to use and provide a great way to paginate over aggregations, especially when those aggregations have a large number of buckets. However, one thing to keep in mind is that composite aggregations are incompatible with pipeline aggregations, so adding composite aggregations to a more complex aggregation can be tricky.


“Elasticsearch – Aggregation pagination with Composite aggregation” Tech Bite was brought to you by Dino Kopić, Junior Software Engineer at Atlantbh.

Tech Bites are tips, tricks, snippets or explanations about various programming technologies and paradigms, which can help engineers with their everyday job.
