This Tech Bite was brought to you by Kristina Kraljević, Junior Software Engineer at Atlantbh.

Tech Bites are tips, tricks, snippets or explanations about various programming technologies and paradigms, which can help engineers with their everyday job.

Bucket Selector Aggregation

Elasticsearch has many aggregations to offer that allow us to group and breakdown our data in almost any way we want. Often, the results of an aggregation have to be filtered based on different criteria that are calculated using that very same aggregation and then processed further. Sounds difficult? No worries, Elasticsearch has got that covered too. A pipeline aggregation called Bucket Selector comes to the rescue when filtering based on count, sum, or any other numeric value. 

First, Bucket Aggregation has to be set-up for grouping documents into buckets, where each bucket is created for documents with a unique value of the given field. Then pipelining gathered buckets to Bucket Selector is the next step, where retrieving a certain subset of buckets is the main goal. 

Example – Orders Index

For this example, a simple orders index with mappings shown below will be used. 

Mappings for this index produces the following response to the request GET orders/_mappings.

{
  "orders" : {
    "mappings" : {
      "properties" : {
        "created_on" : {"type" : "date"},
        "order_id" : 	{"type" : "long"},
        "order_date" : {"type" : "date"},
        "products" : {
          "properties" : {
            "product_id" : {"type" : "long"},
            "product_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "price" : {"type" : "float"}
          }
        },
        "customer_id" : {"type" : "long"},
        "customer_full_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "shipping_country_code" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "shipping_street" : {"type" : "text"},
        "taxful_total_price" : {"type" : "float"},
        "taxless_total_price" : {"type" : "float"},
        ...
      }
    }
  }

Let’s group some monthly orders into shipments – For each month of 2019, group orders by shipping country and select shipments (2500+ orders) with a total shipment price of over 300K. 

Request

GET orders/_search
{
  "size": 0, 
  "query": {
    // Filter all orders from 1st Jan 2019 - 1st Jan 2020
    "range": {
      "order_date": {
        "gte": "2019-01-01T00:00:00.000Z",
        "lte": "2020-01-01T00:00:00.000Z"
      }
    }
  },
  "aggs": {
    "orders_per_month": {
      // Group all orders per month
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "orders": {
          // Group all orders in month per shipping country
          "terms": {
            "field": "shipping_country_code.keyword"          
          },
          "aggs": {
            "shipment_profit": {
              // Sum the prices as shipment_profit (this is per country, in a month)
              "sum": {
                "field": "taxful_total_price"
              }
            },
            "country_orders_filter": {
              // Filter current buckets (current buckets are orders per country, per month)
              "bucket_selector": {
                "buckets_path": {
                  "totalOrders": "_count",
                  "totalProfit": "shipment_profit"
                },
                "script": "params.totalOrders > 2500 && params.totalProfit > 300000"
              }
            }
          }
        }
      }
    }
  }
}

Note: Comments in the request are not part of the request, they just annotate sections of the query.

Leave a Reply