Elasticsearch Parent, Child, Grandchild documents and Routing

This article shows how to create parent, child and grandchild documents in Elasticsearch using ElasticsearchCRUD. When documents are related to each other, it is important that they are all saved to the same shard in Elasticsearch. Search performance is also better if a specific shard can be targeted for the search.

When creating parent and child document relationships, the parent definition is enough for child documents. It ensures that the child documents are saved to the same shard as their parent. Once grandchild documents are used, a routing definition is required. Otherwise the grandchild documents will not always be saved to the same shard, and all the advantages of creating child documents are lost.
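To see why a grandchild needs an explicit routing value, the following sketch mimics how a document is assigned to a shard: a hash of the effective routing value, modulo the number of primary shards. The effective routing value is the explicit routing if given, otherwise the parent id, otherwise the document's own id. The hash function is only a stand-in (Elasticsearch uses its own internal hash), and the class and method names are hypothetical.

```csharp
using System;

public static class RoutingSketch
{
    // Stand-in hash (assumption): Elasticsearch uses its own internal hash function.
    private static int StableHash(string value)
    {
        unchecked
        {
            int hash = 17;
            foreach (char c in value)
            {
                hash = hash * 31 + c;
            }
            return hash & 0x7FFFFFFF; // keep the result non-negative
        }
    }

    // Explicit routing wins, then the parent id, then the document id.
    public static string EffectiveRouting(string id, string parent = null, string routing = null)
    {
        return routing ?? parent ?? id;
    }

    // Shard number for a document in an index with the given number of primary shards.
    public static int ShardFor(string id, int numberOfShards, string parent = null, string routing = null)
    {
        return StableHash(EffectiveRouting(id, parent, routing)) % numberOfShards;
    }
}
```

A team with id 2 and parent 1 is routed by "1", so it lands on the same shard as league 1. A player with id 3 and parent 2 would be routed by "2" unless the routing is set to "1" explicitly, so without routing it can end up on a different shard than its team.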

Code: https://github.com/damienbod/ElasticsearchParentChildGrandChild

Other Tutorials:

Part 1: ElasticsearchCRUD introduction
Part 2: MVC application search with simple documents using autocomplete, jQuery and jTable
Part 3: MVC Elasticsearch CRUD with nested documents
Part 4: Data Transfer from MS SQL Server using Entity Framework to Elasticsearch
Part 5: MVC Elasticsearch with child, parent documents
Part 6: MVC application with Entity Framework and Elasticsearch
Part 7: Live Reindex in Elasticsearch
Part 8: CSV export using Elasticsearch and Web API
Part 9: Elasticsearch Parent, Child, Grandchild Documents and Routing
Part 10: Elasticsearch Type mappings with ElasticsearchCRUD
Part 11: Elasticsearch Synonym Analyzer using ElasticsearchCRUD
Part 12: Using Elasticsearch German Analyzer
Part 13: MVC google maps search using Elasticsearch
Part 14: Search Queries and Filters with ElasticsearchCRUD
Part 15: Elasticsearch Bulk Insert
Part 16: Elasticsearch Aggregations With ElasticsearchCRUD
Part 17: Searching Multiple Indices and Types in Elasticsearch
Part 18: MVC searching with Elasticsearch Highlighting
Part 19: Index Warmers with ElasticsearchCRUD

Step 1: Define the document models

The LeagueCup, Team and Player classes are used in this application. The LeagueCup class is the parent class. It has a list of child Team classes. The Team class has a list of child Player classes. We want to save all documents in the same index and ensure that child and grandchild documents are saved to the same shard. The child documents require the Key attribute so that ElasticsearchCRUD knows which property is used as the _id.

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class LeagueCup
{
	public long Id { get; set; }
	public string Name { get; set; }
	public string Description { get; set; }
	public List<Team> Teams { get; set; }
}

public class Team
{
	[Key]
	public long Id { get; set; }
	public string Name { get; set; }
	public string Stadium { get; set; }
	public List<Player> Players { get; set; }
}

public class Player
{
	[Key]
	public long Id { get; set; }
	public string Name { get; set; }
	public int Goals { get; set; }
	public int Assists { get; set; }
	public string Position { get; set; }
	public int Age { get; set; }
}

Step 2: Create the index with the correct mappings

To create the index with the mapping, the default configuration of the context in ElasticsearchCRUD needs to be changed. The ElasticsearchSerializerConfiguration Config contains all the required configuration. We want to save each child document as a separate mapping (index type) and also process all child documents for each type. Routing is also forced for child documents with UserDefinedRouting. This is not the default, because it is not required if no grandchild documents are used. The default configuration in ElasticsearchCRUD saves the complete child tree as nested items, processes all child items and adds no routing.

The mapping definitions are also required for the different types. By default, each type would be saved to its own index. This is changed so that all types in the relationship are saved to the same index: leagues.

private static readonly IElasticsearchMappingResolver ElasticsearchMappingResolver = new ElasticsearchMappingResolver();
private const bool SaveChildObjectsAsWellAsParent = true;
private const bool ProcessChildDocumentsAsSeparateChildIndex = true;
private const bool UserDefinedRouting = true;
private static readonly ElasticsearchSerializerConfiguration Config = new ElasticsearchSerializerConfiguration(ElasticsearchMappingResolver, SaveChildObjectsAsWellAsParent,
  ProcessChildDocumentsAsSeparateChildIndex, UserDefinedRouting);

private const string ConnectionString = "http://localhost:9200";

static void Main(string[] args)
{
  // Define the mapping for the type so that all use the same index as the parent
  ElasticsearchMappingResolver.AddElasticSearchMappingForEntityType(typeof(LeagueCup), MappingUtils.GetElasticsearchMapping("leagues"));
  ElasticsearchMappingResolver.AddElasticSearchMappingForEntityType(typeof(Team), MappingUtils.GetElasticsearchMapping("leagues"));
  ElasticsearchMappingResolver.AddElasticSearchMappingForEntityType(typeof(Player), MappingUtils.GetElasticsearchMapping("leagues"));

  CreateIndexWithRouting();
			
}

The CreateIndexWithRouting method creates a new index, with three type mappings. The context.CreateIndex() does this in three different PUT requests, one per type.

private static void CreateIndexWithRouting()
{
	// Use routing for the child parent relationship. This is required if you use grandchild documents.
	// Routing ensures that the grandchild documents are saved to the same shard as the parent document.
	// --------------
	// If you use only parent and child documents, routing is not required. The child documents are saved
	// to the same shard as the parent document using the parent definition.
	// -------------- 
	// The routing definition can be defined using the configuration parameter: UserDefinedRouting in the ElasticsearchSerializerConfiguration
	//var config = new ElasticsearchSerializerConfiguration(ElasticsearchMappingResolver, SaveChildObjectsAsWellAsParent,
	//	ProcessChildDocumentsAsSeparateChildIndex, UserDefinedRouting);

	using (var context = new ElasticsearchContext(ConnectionString, Config))
	{
		context.TraceProvider = new ConsoleTraceProvider();
	
		// Create index in Elasticsearch
		// This creates an index leagues and 3 types: leaguecup, team, player
		var ret = context.CreateIndex<LeagueCup>();
	}
}

The create index request with the parent mapping is sent as follows:

PUT http://localhost:9200/leagues/ HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 192
Expect: 100-continue
Connection: Keep-Alive

{
 "settings": { 
   "number_of_shards":5,
   "number_of_replicas":1
 },
 "mappings": {
   "leaguecup": {
     "properties": { 
       "id":{ "type" : "long" },
       "name":{ "type" : "string" },
       "description":{ "type" : "string" }
      }
   }
 }
}

The first child PUT request is sent as shown below. The routing is defined with only the required property. No other options are set: if the routing value were taken from a document property, requests would be sent to Elasticsearch and then re-routed, which causes a performance loss.

PUT http://localhost:9200/leagues/team/_mappings HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 174
Expect: 100-continue

{
 "team": {
  "_parent": {
     "type":"leaguecup"
  },
  "_routing": {
    "required":"true"
  },
  "properties": {
    "id": { "type" : "long" },
    "name":{ "type" : "string" },
    "stadium":{ "type" : "string" }
  }
 }
}

The grandchild mapping PUT request is sent as follows:

PUT http://localhost:9200/leagues/player/_mappings HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 265
Expect: 100-continue

{ 
 "player": {
   "_parent":{"type":"team"},
   "_routing":{"required":"true"},
   "properties":{"id":{ "type" : "long" },
     "name":{ "type" : "string" },
     "goals":{ "type" : "integer" },
     "assists":{ "type" : "integer" },
     "position":{ "type" : "string" },
     "age":{ "type" : "integer" }
   }
  }
}

Step 3: Add a LeagueCup document

Now that the index and the type mappings exist, a new LeagueCup document can be added.

private static long CreateNewLeague()
{
	var swissCup = new LeagueCup {Description = "Nataional Cup Switzerland", Id = 1, Name = "Swiss Cup"};

	using (var context = new ElasticsearchContext(ConnectionString, Config))
	{
		context.TraceProvider = new ConsoleTraceProvider();
		context.AddUpdateDocument(swissCup, swissCup.Id);
		context.SaveChanges();
	}

	return swissCup.Id;
}

The add document request is sent as part of a bulk request. ElasticsearchCRUD sends all add, update and delete operations in a bulk request, so several pending operations can be combined into a single HTTP request. The context.SaveChanges() call sends all pending operations.

POST http://localhost:9200/_bulk HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 131
Expect: 100-continue

{"index":{"_index":"leagues","_type":"leaguecup","_id":"1"}}
{"id":1,"name":"Swiss Cup","description":"Nataional Cup Switzerland"}

Step 4: Add a Team document

The team request is sent using the parent id from the parent LeagueCup.

/// <summary>
/// The parentId is the id of the parent object
/// The routing Id is required for Elasticsearch to force that all child objects are saved to the same shard. This is good for performance.
/// As this is a first level child, the routingId and the parentId are the same.
/// </summary>
private static long AddTeamToCup(long leagueId)
{
	var youngBoys = new Team {Id=2,Name="Young Boys", Stadium="Wankdorf Bern"};

	using (var context = new ElasticsearchContext(ConnectionString, Config))
	{
		context.TraceProvider = new ConsoleTraceProvider();
		context.AddUpdateDocument(youngBoys, youngBoys.Id, new RoutingDefinition { ParentId = leagueId, RoutingId = leagueId });
		context.SaveChanges();
	}

	return youngBoys.Id;
}

This request uses the parent Id and also the routing Id. Because the document is a first level child, the two ids are the same.

POST http://localhost:9200/_bulk HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 136
Expect: 100-continue

{"index":{"_index":"leagues","_type":"team","_id":"2","_parent":1,"_routing":1}}
{"id":2,"name":"Young Boys","stadium":"Wankdorf Bern"}

Step 5: Add a Player document

A player can then be added to the index with a Team parent and a routing to the LeagueCup top level parent.

private static void AddPlayerToTeam(long teamId, long leagueId)
{
	var yvonMvogo = new Player { Id = 3, Name = "Yvon Mvogo", Age = 20, Goals = 0, Assists = 0, Position = "Goalkeeper" };

	using (var context = new ElasticsearchContext(ConnectionString, Config))
	{
		context.TraceProvider = new ConsoleTraceProvider();
		context.AddUpdateDocument(yvonMvogo, yvonMvogo.Id, new RoutingDefinition { ParentId = teamId, RoutingId = leagueId });
		context.SaveChanges();
	}
}

The index request is again sent as a bulk request. It could of course be sent together with the previous request, but for demo purposes it is sent alone.

POST http://localhost:9200/_bulk HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 167
Expect: 100-continue

{"index":{"_index":"leagues","_type":"player","_id":"3","_parent":2,"_routing":1}}
{"id":3,"name":"Yvon Mvogo","goals":0,"assists":0,"position":"Goalkeeper","age":20}
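The bulk bodies shown in the traces above follow a simple pattern: one action line with the index, type, id and the optional _parent and _routing values, followed by one line with the document source. The following is a minimal sketch of how such a body can be assembled. It is not how ElasticsearchCRUD builds its requests; the class and method names are hypothetical, and the index and type names are assumed to be simple strings that need no JSON escaping.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BulkBodySketch
{
    // Builds the action line for one "index" operation.
    // Assumption: index and type are plain names without characters that need escaping.
    public static string IndexAction(string index, string type, long id,
        long? parentId = null, long? routingId = null)
    {
        var sb = new StringBuilder();
        sb.Append("{\"index\":{\"_index\":\"").Append(index).Append("\"");
        sb.Append(",\"_type\":\"").Append(type).Append("\"");
        sb.Append(",\"_id\":\"").Append(id).Append("\"");
        if (parentId.HasValue) sb.Append(",\"_parent\":").Append(parentId.Value);
        if (routingId.HasValue) sb.Append(",\"_routing\":").Append(routingId.Value);
        sb.Append("}}");
        return sb.ToString();
    }

    // Joins (action line, source line) pairs into one newline-delimited bulk body.
    // Every line, including the last one, is terminated with a newline.
    public static string BulkBody(IEnumerable<KeyValuePair<string, string>> operations)
    {
        var sb = new StringBuilder();
        foreach (var op in operations)
        {
            sb.Append(op.Key).Append('\n');
            sb.Append(op.Value).Append('\n');
        }
        return sb.ToString();
    }
}
```

For example, IndexAction("leagues", "player", 3, parentId: 2, routingId: 1) produces the same action line as the player request above. Passing the league, team and player pairs to BulkBody in one call gives a single body that could be posted to _bulk in one request.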

Now that 3 documents exist in the index, they can be retrieved from Elasticsearch. The GET request for a player document requires both the parent Id and the routing Id.

private static Player GetPlayer(long playerId, long leagueId, long teamId)
{
	Player player;
	using (var context = new ElasticsearchContext(ConnectionString, Config))
	{
		context.TraceProvider = new ConsoleTraceProvider();
		player = context.GetDocument<Player>(playerId, new RoutingDefinition { ParentId = teamId, RoutingId = leagueId });
	}

	return player;
}

The GetPlayer request is sent as follows:

GET http://localhost:9200/leagues/player/3?parent=2&routing=1 HTTP/1.1
Host: localhost:9200

Response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 167

{
 "_index":"leagues",
 "_type":"player",
 "_id":"3",
 "_version":1,
 "found":true,
 "_source": {
    "id":3,
    "name":"Yvon Mvogo",
    "goals":0,
    "assists":0,
    "position":"Goalkeeper",
    "age":20
  }
}
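The GET request above carries the parent and routing values as query string parameters. As a small illustration, the URL can be composed as in the following sketch. The helper is hypothetical and is not part of ElasticsearchCRUD; it only reproduces the URL format seen in the trace.

```csharp
using System.Collections.Generic;

public static class DocumentUrlSketch
{
    // Builds a GET URL for a single document, adding parent and routing only when given.
    public static string GetDocumentUrl(string baseUrl, string index, string type, long id,
        long? parentId = null, long? routingId = null)
    {
        var url = baseUrl.TrimEnd('/') + "/" + index + "/" + type + "/" + id;

        var query = new List<string>();
        if (parentId.HasValue) query.Add("parent=" + parentId.Value);
        if (routingId.HasValue) query.Add("routing=" + routingId.Value);

        return query.Count == 0 ? url : url + "?" + string.Join("&", query);
    }
}
```

Calling GetDocumentUrl("http://localhost:9200", "leagues", "player", 3, parentId: 2, routingId: 1) gives the URL used in the GetPlayer request. For the top level LeagueCup document, both optional values are left out.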

Conclusion:

It is very simple to define and use child and grandchild documents in Elasticsearch. If you want to optimize the search performance, you need to save the related documents to the same shard. This is achieved using routing. If only parent and child documents are used, only the parent Id is required. If whole tree structures are always added and updated at the same time, nested documents may be a better fit. All data structures have advantages and disadvantages, and the correct one should be chosen according to your requirements.
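Because all documents of one league share the same routing value, a search can also be restricted to the shard that holds them by passing the routing value as a query string parameter on the search URL. The following sketch only builds such a URL. The helper name is hypothetical, and the index and type names are taken from the example above.

```csharp
using System;

public static class RoutedSearchSketch
{
    // Builds a search URL for one type. When routing ids are supplied,
    // they are added as a comma-separated routing parameter.
    public static string SearchUrl(string baseUrl, string index, string type, params long[] routingIds)
    {
        var url = baseUrl.TrimEnd('/') + "/" + index + "/" + type + "/_search";
        if (routingIds == null || routingIds.Length == 0)
        {
            return url; // no routing: the search is sent to all shards
        }
        return url + "?routing=" + string.Join(",", routingIds);
    }
}
```

For the Swiss Cup data, SearchUrl("http://localhost:9200", "leagues", "player", 1) targets only the shard that the routing value 1 maps to.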

Links:

https://www.nuget.org/packages/ElasticsearchCRUD/

http://www.elasticsearch.org/
