Bucket Selector Aggregation

Elasticsearch has many aggregations to offer that allow us to group and breakdown our data in almost any way we want. Often, the results of an aggregation have to be filtered based on different criteria that are calculated using that very same aggregation and then processed further. Sounds difficult? No worries, Elasticsearch has got that covered too. A pipeline aggregation called Bucket Selector comes to the rescue when filtering based on count, sum, or any other numeric value. 

First, Bucket Aggregation has to be set-up for grouping documents into buckets, where each bucket is created for documents with a unique value of the given field. Then pipelining gathered buckets to Bucket Selector is the next step, where retrieving a certain subset of buckets is the main goal. 

Example – Orders Index

For this example, a simple orders index with mappings shown below will be used. 

Mappings for this index produces the following response to the request GET orders/_mappings.

{
  "orders" : {
    "mappings" : {
      "properties" : {
        "created_on" : {"type" : "date"},
        "order_id" : 	{"type" : "long"},
        "order_date" : {"type" : "date"},
        "products" : {
          "properties" : {
            "product_id" : {"type" : "long"},
            "product_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "price" : {"type" : "float"}
          }
        },
        "customer_id" : {"type" : "long"},
        "customer_full_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "shipping_country_code" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "shipping_street" : {"type" : "text"},
        "taxful_total_price" : {"type" : "float"},
        "taxless_total_price" : {"type" : "float"},
        ...
      }
    }
  }

Let’s group some monthly orders into shipments – For each month of 2019, group orders by shipping country and select shipments (2500+ orders) with a total shipment price of over 300K. 

Request

GET orders/_search
{
  "size": 0, 
  "query": {
    // Filter all orders from 1st Jan 2019 - 1st Jan 2020
    "range": {
      "order_date": {
        "gte": "2019-01-01T00:00:00.000Z",
        "lte": "2020-01-01T00:00:00.000Z"
      }
    }
  },
  "aggs": {
    "orders_per_month": {
      // Group all orders per month
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "orders": {
          // Group all orders in month per shipping country
          "terms": {
            "field": "shipping_country_code.keyword"          
          },
          "aggs": {
            "shipment_profit": {
              // Sum the prices as shipment_profit (this is per country, in a month)
              "sum": {
                "field": "taxful_total_price"
              }
            },
            "country_orders_filter": {
              // Filter current buckets (current buckets are orders per country, per month)
              "bucket_selector": {
                "buckets_path": {
                  "totalOrders": "_count",
                  "totalProfit": "shipment_profit"
                },
                "script": "params.totalOrders > 2500 && params.totalProfit > 300000"
              }
            }
          }
        }
      }
    }
  }
}

Note: Comments in the request are not part of the request, they just annotate sections of the query.


“Elasticsearch – Bucket Selector Aggregation” Tech Bite was brought to you by Kristina Kraljević, Junior Software Engineer at Atlantbh.

Tech Bites are tips, tricks, snippets or explanations about various programming technologies and paradigms, which can help engineers with their everyday job.

oban
Software DevelopmentTech Bites
February 23, 2024

Background Jobs in Elixir – Oban

When and why do we need background jobs? Nowadays, background job processing is indispensable in the world of web development. The need for background jobs stems from the fact that synchronous execution of time-consuming and resource-intensive tasks would heavily impact an application's  performance and user experience.  Even though Elixir is…
selenium
QA/Test AutomationTech Bites
December 22, 2023

Selenium Grid 4 with Docker

Introduction When talking about automation testing, one of the first things that comes to mind is Selenium. Selenium is a free, open-source automated testing framework used to validate web applications across different browsers and platforms. It is not just a single tool but a suite of software. Every component of…

Want to discuss this in relation to your project? Get in touch:

Leave a Reply