Bucket Selector Aggregation

Elasticsearch has many aggregations to offer that allow us to group and breakdown our data in almost any way we want. Often, the results of an aggregation have to be filtered based on different criteria that are calculated using that very same aggregation and then processed further. Sounds difficult? No worries, Elasticsearch has got that covered too. A pipeline aggregation called Bucket Selector comes to the rescue when filtering based on count, sum, or any other numeric value. 

First, Bucket Aggregation has to be set-up for grouping documents into buckets, where each bucket is created for documents with a unique value of the given field. Then pipelining gathered buckets to Bucket Selector is the next step, where retrieving a certain subset of buckets is the main goal. 

Example – Orders Index

For this example, a simple orders index with mappings shown below will be used. 

Mappings for this index produces the following response to the request GET orders/_mappings.

{
  "orders" : {
    "mappings" : {
      "properties" : {
        "created_on" : {"type" : "date"},
        "order_id" : 	{"type" : "long"},
        "order_date" : {"type" : "date"},
        "products" : {
          "properties" : {
            "product_id" : {"type" : "long"},
            "product_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "price" : {"type" : "float"}
          }
        },
        "customer_id" : {"type" : "long"},
        "customer_full_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "shipping_country_code" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "shipping_street" : {"type" : "text"},
        "taxful_total_price" : {"type" : "float"},
        "taxless_total_price" : {"type" : "float"},
        ...
      }
    }
  }

Let’s group some monthly orders into shipments – For each month of 2019, group orders by shipping country and select shipments (2500+ orders) with a total shipment price of over 300K. 

Request

GET orders/_search
{
  "size": 0, 
  "query": {
    // Filter all orders from 1st Jan 2019 - 1st Jan 2020
    "range": {
      "order_date": {
        "gte": "2019-01-01T00:00:00.000Z",
        "lte": "2020-01-01T00:00:00.000Z"
      }
    }
  },
  "aggs": {
    "orders_per_month": {
      // Group all orders per month
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "orders": {
          // Group all orders in month per shipping country
          "terms": {
            "field": "shipping_country_code.keyword"          
          },
          "aggs": {
            "shipment_profit": {
              // Sum the prices as shipment_profit (this is per country, in a month)
              "sum": {
                "field": "taxful_total_price"
              }
            },
            "country_orders_filter": {
              // Filter current buckets (current buckets are orders per country, per month)
              "bucket_selector": {
                "buckets_path": {
                  "totalOrders": "_count",
                  "totalProfit": "shipment_profit"
                },
                "script": "params.totalOrders > 2500 && params.totalProfit > 300000"
              }
            }
          }
        }
      }
    }
  }
}

Note: Comments in the request are not part of the request, they just annotate sections of the query.


“Elasticsearch – Bucket Selector Aggregation” Tech Bite was brought to you by Kristina Kraljević, Junior Software Engineer at Atlantbh.

Tech Bites are tips, tricks, snippets or explanations about various programming technologies and paradigms, which can help engineers with their everyday job.

Protractor parallel execution
QA/Test AutomationTech Bites
May 12, 2023

Protractor parallel execution

Why Parallel Testing? When designing automation test suites, the ability to run tests in parallel is a key feature because of the following benefits: Execution time - Increasing tests efficiency and, finally, faster releases Device compatibility - There are many devices with various versions and specifications that need to be…
Introduction to GraphQL
QA/Test AutomationTech Bites
May 11, 2023

Introduction to GraphQL

In today's Tech Bite topic, we will get familiar with GraphQL and explain its growth in popularity in recent years as the new standard for working with APIs. What is GraphQL? GraphQL stands for Graph Query Language. It is an API specification standard and, as its name says - a…
IQR in Automation Testing
QA/Test AutomationTech Bites
April 25, 2023

IQR in Automation Testing: Unleashing the Data Analytics Potential

Usually, when people talk about statistics, they think only about numbers, but it is much more.  Statistics is an irreplaceable tool in different fields like data analysis, predictive modeling, quality control, scientific research, decision-making models, etc. Overall, statistics provide a valuable toolset for understanding, describing, and making predictions about data,…

Leave a Reply