Elasticsearch Aggregations With ElasticsearchCRUD

This article shows how to implement Elasticsearch aggregation search requests and responses using ElasticsearchCRUD.

Code: https://github.com/damienbod/AggregationExamplesWithElasticsearchCRUD

The Elasticsearch Aggregation Examples repository contains integration tests which show how to use all the different types of aggregations in ElasticsearchCRUD.

Other Tutorials:

Part 1: ElasticsearchCRUD introduction
Part 2: MVC application search with simple documents using autocomplete, jQuery and jTable
Part 3: MVC Elasticsearch CRUD with nested documents
Part 4: Data Transfer from MS SQL Server using Entity Framework to Elasticsearch
Part 5: MVC Elasticsearch with child, parent documents
Part 6: MVC application with Entity Framework and Elasticsearch
Part 7: Live Reindex in Elasticsearch
Part 8: CSV export using Elasticsearch and Web API
Part 9: Elasticsearch Parent, Child, Grandchild Documents and Routing
Part 10: Elasticsearch Type mappings with ElasticsearchCRUD
Part 11: Elasticsearch Synonym Analyzer using ElasticsearchCRUD
Part 12: Using Elasticsearch German Analyzer
Part 13: MVC google maps search using Elasticsearch
Part 14: Search Queries and Filters with ElasticsearchCRUD
Part 15: Elasticsearch Bulk Insert
Part 16: Elasticsearch Aggregations With ElasticsearchCRUD
Part 17: Searching Multiple Indices and Types in Elasticsearch
Part 18: MVC searching with Elasticsearch Highlighting
Part 19: Index Warmers with ElasticsearchCRUD

Elasticsearch Aggregations

The Elasticsearch aggregation API allows you to summarize, calculate, and group your data in near real time. Aggregations can contain sub-aggregations, which can in turn contain further sub-aggregations as required, which makes the API very flexible. ElasticsearchCRUD supports the following aggregations:

Min Aggregation, Max Aggregation, Sum Aggregation, Avg Aggregation, Stats Aggregation, Extended Stats Aggregation, Value Count Aggregation, Percentiles Aggregation, Percentile Ranks Aggregation, Cardinality Aggregation, Geo Bounds Aggregation, Top hits Aggregation, Scripted Metric Aggregation, Global Aggregation, Filter Aggregation, Filters Aggregation, Filters Named Aggregation, Missing Aggregation, Nested Aggregation, Reverse nested Aggregation, Children Aggregation, Terms Aggregation, Significant Terms Aggregation, Range Aggregation, Date Range Aggregation, Histogram Aggregation, Date Histogram Aggregation, Geo Distance Aggregation, GeoHash grid Aggregation

These aggregations can be split into metric and bucket aggregations. Metric aggregations return either a single value or multiple values, and bucket aggregations return either a single bucket or multiple buckets. Bucket aggregations can contain sub-aggregations.

Metric Aggregations
– Single Value Metric
– Multi Value Metric

Bucket Aggregations
– Single Bucket Aggregations
– Multi Bucket Aggregations
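
The four kinds can usually be told apart by the shape of the JSON that Elasticsearch returns for them. The following Python sketch is only a heuristic under stated assumptions: the `classify` function and the sample fragments are hypothetical and not part of ElasticsearchCRUD, and some aggregations (top hits, for example) return shapes this simple check does not handle specially.

```python
def classify(agg_result):
    """Guess the broad aggregation kind from the shape of its JSON result."""
    if "buckets" in agg_result:
        return "multi-bucket"
    if "doc_count" in agg_result:
        return "single-bucket"
    if set(agg_result) == {"value"}:
        return "single-value metric"
    return "multi-value metric"


# Hypothetical result fragments, shaped like Elasticsearch responses.
samples = {
    "min": {"value": 1.0},
    "stats": {"count": 2, "min": 1.0, "max": 3.0, "avg": 2.0, "sum": 4.0},
    "filter": {"doc_count": 2},
    "terms": {"buckets": [{"key": "a", "doc_count": 2}]},
}

kinds = {name: classify(result) for name, result in samples.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```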

Example of a Terms Bucket Aggregation

The terms bucket aggregation is a multi-bucket aggregation based on a single field. The following example creates an aggregation called testFirstName and uses the firstname field from the person type in the persons index. See the example here to set up the Elasticsearch persons index.

The SeachType is set to count so that no hits are returned in the search response. The aggregation result can be converted from the JSON object directly into a TermsBucketAggregationsResult class using the name of the aggregation, testFirstName.

TermsBucketAggregationsResult aggResult;
var search = new Search
{
	Aggs = new List<IAggs>
	{
		new TermsBucketAggregation("testFirstName", "firstname")
		{
			Size = 20
		}
	}
};

using (var context = new ElasticsearchContext(ConnectionString, ElasticsearchMappingResolver))
{
	var items = context.Search<Person>(
		search, 
		new SearchUrlParameters 
		{ 
			SeachType = SeachType.count 
		});
		
	aggResult = 
		items.PayloadResult.Aggregations.GetComplexValue<TermsBucketAggregationsResult>("testFirstName");
}

The request is sent as follows:

POST http://localhost:9200/persons/person/_search?&search_type=count HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 68
Expect: 100-continue
Connection: Keep-Alive

{
	"aggs": {
		"testFirstName": {
			"terms": {
				"field": "firstname",
				"size": 20
			}
		}
	}
}

The result is:

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 875

{
	"took": 3,
	"timed_out": false,
	"_shards": {
		"total": 5,
		"successful": 5,
		"failed": 0
	},
	"hits": {
		"total": 19972,
		"max_score": 0.0,
		"hits": []
	},
	"aggregations": {
		"testFirstName": {
			"doc_count_error_upper_bound": 50,
			"sum_other_doc_count": 18159,
			"buckets": [{
				"key": "katherine",
				"doc_count": 99
			},
			{
				"key": "james",
				"doc_count": 97
			},
			{
				"key": "marcus",
				"doc_count": 97
			},
			{
				"key": "alexandra",
				"doc_count": 93
			},
			{
				"key": "dalton",
				"doc_count": 93
			},
			{
				"key": "lucas",
				"doc_count": 93
			},
			{
				"key": "morgan",
				"doc_count": 93
			},
			{
				"key": "richard",
				"doc_count": 93
			},
			{
				"key": "isabella",
				"doc_count": 92
			},
			{
				"key": "seth",
				"doc_count": 92
			},
			{
				"key": "natalie",
				"doc_count": 91
			},
			{
				"key": "eduardo",
				"doc_count": 90
			},
			{
				"key": "kaitlyn",
				"doc_count": 90
			},
			{
				"key": "robert",
				"doc_count": 90
			},
			{
				"key": "sydney",
				"doc_count": 90
			},
			{
				"key": "ian",
				"doc_count": 89
			},
			{
				"key": "julia",
				"doc_count": 89
			},
			{
				"key": "chloe",
				"doc_count": 88
			},
			{
				"key": "xavier",
				"doc_count": 88
			},
			{
				"key": "david",
				"doc_count": 87
			}]
		}
	}
}
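
The raw payload can also be read without a result class. The Python sketch below is an illustration only and not ElasticsearchCRUD code: it parses an abridged copy of the response above (only the first three buckets are kept) and lists each bucket key with its document count. The `terms_counts` helper is a hypothetical name.

```python
import json

# Abridged copy of the response above: only the first three buckets are kept.
response_text = """
{
  "hits": {"total": 19972, "max_score": 0.0, "hits": []},
  "aggregations": {
    "testFirstName": {
      "doc_count_error_upper_bound": 50,
      "sum_other_doc_count": 18159,
      "buckets": [
        {"key": "katherine", "doc_count": 99},
        {"key": "james", "doc_count": 97},
        {"key": "marcus", "doc_count": 97}
      ]
    }
  }
}
"""


def terms_counts(response, agg_name):
    """Return (key, doc_count) pairs for a terms aggregation, in response order."""
    agg = response["aggregations"][agg_name]
    return [(bucket["key"], bucket["doc_count"]) for bucket in agg["buckets"]]


response = json.loads(response_text)
counts = terms_counts(response, "testFirstName")
for key, doc_count in counts:
    print(f"{key}: {doc_count}")
```

The empty hits array reflects the count search type: only the aggregation section carries data.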

The hits can also be added to the terms bucket aggregation as a sub-aggregation. In the following example, the sub-aggregations of the TermsBucketAggregation class contain a single TopHitsMetricAggregation.

var search = new Search
{
	Aggs = new List<IAggs>
	{
		new TermsBucketAggregation("testLastName", "lastname")
		{
			Size = 5,
			Aggs = new List<IAggs>
			{
				new TopHitsMetricAggregation("tophits")
				{
					Size = 2
				}
			}
		}
	}
};

The above code is sent to Elasticsearch as follows:

POST http://localhost:9200/persons/person/_search?&search_type=count HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 108
Expect: 100-continue

{
	"aggs": {
		"testLastName": {
			"terms": {
				"field": "lastname",
				"size": 5
			},
			"aggs": {
				"tophits": {
					"top_hits": {
						"size": 2
					}
				}
			}
		}
	}
}
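
The nesting rule in this body is simple: each named aggregation holds its own definition plus an optional aggs object with its children. As a rough sketch of that rule, assuming nothing about how ElasticsearchCRUD serializes its classes, the following Python rebuilds the same body with a small hypothetical `agg` helper.

```python
import json


def agg(name, kind, params, subs=None):
    """Build one named aggregation; `subs` is an optional list of child aggregations."""
    node = {kind: params}
    if subs:
        merged = {}
        for child in subs:
            merged.update(child)
        node["aggs"] = merged
    return {name: node}


body = {
    "aggs": agg(
        "testLastName", "terms", {"field": "lastname", "size": 5},
        subs=[agg("tophits", "top_hits", {"size": 2})],
    )
}

# Compact form, without whitespace between tokens.
compact = json.dumps(body, separators=(",", ":"))
print(json.dumps(body, indent=2))
print(len(compact))
```

Serialized without whitespace, this body is 108 characters long, which agrees with the Content-Length header of the request above.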

The hits for each bucket can be accessed using the TopHitsMetricAggregationsResult class. The aggregation name, tophits, is the one configured in the search request.

var hits = childbucket.GetSubAggregationsFromJTokenName<TopHitsMetricAggregationsResult<Person>>("tophits");

A multi-bucket aggregation could also be added as a sub-aggregation. Here’s an example which adds a SignificantTermsBucketAggregation to a TermsBucketAggregation. This finds all the persons with the same firstname and lastname. The SignificantTermsBucketAggregation contains a Top Hits sub-aggregation.

var search = new Search
{
	Aggs = new List<IAggs>
	{
		new TermsBucketAggregation("testLastName", "lastname")
		{
			Size = 0,
			Aggs = new List<IAggs>
			{
				new SignificantTermsBucketAggregation("testFirstName", "firstname")
				{
					Size = 20,
					Aggs = new List<IAggs>
					{
						new TopHitsMetricAggregation("tophits")
						{
							Size = 20
						}
					}
				}
			}
		}
	}
};

The request is sent as follows:

POST http://localhost:9200/persons/person/_search?&search_type=count HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 188
Expect: 100-continue

{
	"aggs": {
		"testLastName": {
			"terms": {
				"field": "lastname",
				"size": 0
			},
			"aggs": {
				"testFirstName": {
					"significant_terms": {
						"field": "firstname",
						"size": 20
					},
					"aggs": {
						"tophits": {
							"top_hits": {
								"size": 20
							}
						}
					}
				}
			}
		}
	}
}

The results could be displayed in a console application as follows:

// Let's display the aggregation results in the console
foreach (var bucket in aggResult.Buckets)
{
	var significantTermsBucketAggregationsResult = bucket.GetSubAggregationsFromJTokenName<SignificantTermsBucketAggregationsResult>("testFirstName");

	foreach (var childbucket in significantTermsBucketAggregationsResult.Buckets)
	{
		bool writeHeader = true;
		var hits = childbucket.GetSubAggregationsFromJTokenName<TopHitsMetricAggregationsResult<Person>>("tophits");
		foreach (var hit in hits.Hits.HitsResult)
		{
			if (writeHeader)
			{
				Console.Write("\n{0} {1}, Found Ids: ", hit.Source.FirstName, hit.Source.LastName);
			}
			Console.Write("{0} ", hit.Id);
			writeHeader = false;
		}
	}
}

Example of an ExtendedStatsMetricAggregation with a DateRangeBucketAggregation

This example shows how to get the extended statistics of the documents for the whole index and also per year using the DateRangeBucketAggregation. The DateRangeBucketAggregation contains a sub-aggregation with one extended stats multi-value metric aggregation.

var search = new Search
{
	Aggs = new List<IAggs>
	{
		new ExtendedStatsMetricAggregation("stats", "modifieddate"),
		new DateRangeBucketAggregation("testRangesBucketAggregation", "modifieddate", "MM-yyy", new List<RangeAggregationParameter<string>>
		{
			new ToRangeAggregationParameter<string>("now-10y/y"),
			new ToFromRangeAggregationParameter<string>("now-8y/y", "now-9y/y"),
			new ToFromRangeAggregationParameter<string>("now-7y/y", "now-8y/y"),
			new ToFromRangeAggregationParameter<string>("now-6y/y", "now-7y/y"),
			new ToFromRangeAggregationParameter<string>("now-5y/y", "now-6y/y"),
			new FromRangeAggregationParameter<string>("now-5y/y")
		})
		{
			Aggs = new List<IAggs>
			{
				new ExtendedStatsMetricAggregation("stats", "modifieddate")
			} 
		}
	}
};
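
The range boundaries use Elasticsearch date math, where a value such as now-8y/y means "eight years ago, rounded down to the start of the year". The Python sketch below is a hypothetical helper, not part of ElasticsearchCRUD, and shows one way such per-year ranges could be generated instead of typed out. Note that it produces contiguous ranges, whereas the listing above has no range between now-10y/y and now-9y/y; whether that gap is intended is not stated.

```python
def yearly_ranges(oldest, newest):
    """Build contiguous date_range entries from `oldest` to `newest` years ago.

    Each boundary uses Elasticsearch date math rounded down to the year,
    for example "now-8y/y".
    """
    def bound(years_ago):
        return f"now-{years_ago}y/y"

    ranges = [{"to": bound(oldest)}]          # open-ended: everything older
    for years_ago in range(oldest, newest, -1):
        ranges.append({"from": bound(years_ago), "to": bound(years_ago - 1)})
    ranges.append({"from": bound(newest)})    # open-ended: everything newer
    return ranges


ranges = yearly_ranges(oldest=10, newest=5)
for entry in ranges:
    print(entry)
```

Each dictionary has the same from/to shape as one entry of the ranges array in a date_range aggregation body.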

The request is sent to Elasticsearch as:

POST http://localhost:9200/persons/person/_search?&search_type=count HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 405
Expect: 100-continue
Connection: Keep-Alive

{
	"aggs": {
		"stats": {
			"extended_stats": {
				"field": "modifieddate"
			}
		},
		"testRangesBucketAggregation": {
			"date_range": {
				"field": "modifieddate",
				"format": "MM-yyy",
				"ranges": [{
					"to": "now-10y/y"
				},
				{
					"to": "now-8y/y",
					"from": "now-9y/y"
				},
				{
					"to": "now-7y/y",
					"from": "now-8y/y"
				},
				{
					"to": "now-6y/y",
					"from": "now-7y/y"
				},
				{
					"to": "now-5y/y",
					"from": "now-6y/y"
				},
				{
					"from": "now-5y/y"
				}]
			},
			"aggs": {
				"stats": {
					"extended_stats": {
						"field": "modifieddate"
					}
				}
			}
		}
	}
}

The response contains one set of extended stats for the whole index and one per date range as a sub-aggregation.

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 2280

{
	"took": 16,
	"timed_out": false,
	"_shards": {
		"total": 5,
		"successful": 5,
		"failed": 0
	},
	"hits": {
		"total": 19972,
		"max_score": 0.0,
		"hits": []
	},
	"aggregations": {
		"stats": {
			"count": 19972,
			"min": 9.643968E11,
			"max": 1.242491613123E12,
			"avg": 1.1821553155683428E12,
			"sum": 2.3610005962530944E16,
			"sum_of_squares": 2.793293745713814E28,
			"variance": 1.1137296180611115E21,
			"std_deviation": 3.3372587823857944E10
		},
		"testRangesBucketAggregation": {
			"buckets": [{
				"key": "*-01-2005",
				"to": 1.1045376E12,
				"to_as_string": "01-2005",
				"doc_count": 527,
				"stats": {
					"count": 527,
					"min": 9.643968E11,
					"max": 1.1043648E12,
					"avg": 1.0483479256166982E12,
					"sum": 5.524793568E14,
					"sum_of_squares": 5.79347110411684E26,
					"variance": 2.97007142991014E20,
					"std_deviation": 1.7233895177556755E10
				}
			},
			{
				"key": "01-2006-01-2007",
				"from": 1.1360736E12,
				"from_as_string": "01-2006",
				"to": 1.1676096E12,
				"to_as_string": "01-2007",
				"doc_count": 3071,
				"stats": {
					"count": 3071,
					"min": 1.1360736E12,
					"max": 1.1675232E12,
					"avg": 1.1524215434711821E12,
					"sum": 3.53908656E15,
					"sum_of_squares": 4.0787668666288754E27,
					"variance": 8.051796664238183E19,
					"std_deviation": 8.97318040843835E9
				}
			},
			{
				"key": "01-2007-01-2008",
				"from": 1.1676096E12,
				"from_as_string": "01-2007",
				"to": 1.1991456E12,
				"to_as_string": "01-2008",
				"doc_count": 7958,
				"stats": {
					"count": 7958,
					"min": 1.1676096E12,
					"max": 1.1990592E12,
					"avg": 1.188431685147022E12,
					"sum": 9.4575393504E15,
					"sum_of_squares": 1.1240140107520037E28,
					"variance": 6.291530282695307E19,
					"std_deviation": 7.931916718357113E9
				}
			},
			{
				"key": "01-2008-01-2009",
				"from": 1.1991456E12,
				"from_as_string": "01-2008",
				"to": 1.230768E12,
				"to_as_string": "01-2009",
				"doc_count": 7101,
				"stats": {
					"count": 7101,
					"min": 1.1991456E12,
					"max": 1.2174624E12,
					"avg": 1.207894813688213E12,
					"sum": 8.577261072E15,
					"sum_of_squares": 1.0360606565306019E28,
					"variance": 2.49825077336829E19,
					"std_deviation": 4.9982504672818165E9
				}
			},
			{
				"key": "01-2009-01-2010",
				"from": 1.230768E12,
				"from_as_string": "01-2009",
				"to": 1.262304E12,
				"to_as_string": "01-2010",
				"doc_count": 10,
				"stats": {
					"count": 10,
					"min": 1.24249161306E12,
					"max": 1.242491613123E12,
					"avg": 1.2424916130944E12,
					"sum": 1.2424916130944E13,
					"sum_of_squares": 1.543785408609924E25,
					"variance": 0.0,
					"std_deviation": 0.0
				}
			},
			{
				"key": "01-2010-*",
				"from": 1.262304E12,
				"from_as_string": "01-2010",
				"doc_count": 0,
				"stats": {
					"count": 0,
					"min": null,
					"max": null,
					"avg": null,
					"sum": null,
					"sum_of_squares": null,
					"variance": null,
					"std_deviation": null
				}
			}]
		}
	}
}
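
Because modifieddate is a date field, the statistics are expressed in milliseconds since the Unix epoch, which makes them hard to read at a glance. The Python sketch below is an illustration only: it copies the whole-index stats values from the response above, converts min and max to UTC dates, and checks that avg and std_deviation agree with the other fields. The `to_utc` helper is a hypothetical name.

```python
import math
from datetime import datetime, timezone

# The whole-index "stats" values, copied from the response above.
stats = {
    "count": 19972,
    "min": 9.643968E11,
    "max": 1.242491613123E12,
    "avg": 1.1821553155683428E12,
    "sum": 2.3610005962530944E16,
    "variance": 1.1137296180611115E21,
    "std_deviation": 3.3372587823857944E10,
}


def to_utc(epoch_millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)


earliest = to_utc(stats["min"])
latest = to_utc(stats["max"])
print("earliest:", earliest.isoformat())
print("latest:  ", latest.isoformat())

# The derived values should be consistent with each other.
avg_ok = math.isclose(stats["avg"], stats["sum"] / stats["count"], rel_tol=1e-9)
std_ok = math.isclose(stats["std_deviation"], math.sqrt(stats["variance"]), rel_tol=1e-9)
print("avg == sum / count:", avg_ok)
print("std_deviation == sqrt(variance):", std_ok)
```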

A result class is provided for each aggregation type in Elasticsearch, so you do not need to create your own result DTOs unless required. Any class can be used to deserialize the data. The JToken which contains the result of the aggregation is public and can also be used directly, if preferred.

All other aggregation code examples can be found here:
Elasticsearch Aggregation Examples

If you find any bugs or have any improvement suggestions, I would be grateful for feedback.

Links:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html

https://www.found.no/foundation/elasticsearch-aggregations/

http://blog.qbox.io/elasticsearch-aggregations

http://chrissimpson.co.uk/elasticsearch-aggregations-overview.html

http://zaiste.net/2014/06/concisely_about_aggregations_in_elasticsearch/

http://seanmcgary.com/posts/elasticsearch-date-histogram-aggregation—filling-in-the-empty-buckets

http://obtao.com/blog/2014/10/use-aggregations-statistics-symfony-elasticsearch/

http://www.gridshore.nl/2014/07/25/playing-with-two-most-interesting-new-features-of-elasticsearch-1-3-0/
