Elasticsearch Synonym Analyzer using ElasticsearchCRUD

This article shows how to set up and use a custom analyzer in Elasticsearch using ElasticsearchCRUD. An analyzer with a custom Synonym Token Filter is created and added to the index. A search for any one of the synonyms then returns the documents that contain any of the equivalent terms.

Code: https://github.com/damienbod/ElasticsearchSynonymAnalyzer

Other Tutorials:

Part 1: ElasticsearchCRUD introduction
Part 2: MVC application search with simple documents using autocomplete, jQuery and jTable
Part 3: MVC Elasticsearch CRUD with nested documents
Part 4: Data Transfer from MS SQL Server using Entity Framework to Elasticsearch
Part 5: MVC Elasticsearch with child, parent documents
Part 6: MVC application with Entity Framework and Elasticsearch
Part 7: Live Reindex in Elasticsearch
Part 8: CSV export using Elasticsearch and Web API
Part 9: Elasticsearch Parent, Child, Grandchild Documents and Routing
Part 10: Elasticsearch Type mappings with ElasticsearchCRUD
Part 11: Elasticsearch Synonym Analyzer using ElasticsearchCRUD
Part 12: Using Elasticsearch German Analyzer
Part 13: MVC google maps search using Elasticsearch
Part 14: Search Queries and Filters with ElasticsearchCRUD
Part 15: Elasticsearch Bulk Insert
Part 16: Elasticsearch Aggregations With ElasticsearchCRUD
Part 17: Searching Multiple Indices and Types in Elasticsearch
Part 18: MVC searching with Elasticsearch Highlighting
Part 19: Index Warmers with ElasticsearchCRUD

Creating a custom Synonym Analyzer

Creating an index with custom analyzers, filters or tokenizers is straightforward in ElasticsearchCRUD. Strongly typed configuration classes are available for all of these types, along with const definitions for the built-in defaults. In the example below, a SynonymTokenFilter is created and added to the custom analyzer. Both are then included in the index settings along with the other configuration, such as the number of shards and replicas.

var indexDefinition = new IndexDefinition
{
	IndexSettings =
	{
		Analysis = new Analysis
		{
			Analyzer =
			{
				Analyzers = new List<AnalyzerBase>
				{
					new CustomAnalyzer("john_analyzer")
					{
						Tokenizer = DefaultTokenizers.Whitespace,
						Filter = new List<string> {DefaultTokenFilters.Lowercase, "john_synonym"}
					}
				}
			},
			Filters =
			{
				CustomFilters = new List<AnalysisFilterBase>
				{
					new SynonymTokenFilter("john_synonym")
					{
						Synonyms = new List<string>
						{
							"sean  => john, sean, séan",
							"séan => john, sean, séan",
							"johny => john",
						}
					}
				}
			}
		},
		NumberOfShards = 3,
		NumberOfReplicas = 1
	},
};

The index settings are created as follows in Elasticsearch:

http://localhost:9200/_settings
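
For reference, the IndexDefinition above should correspond roughly to the following settings in an Elasticsearch index creation body. This is a sketch written by hand from the C# configuration, not captured output, so the layout returned by _settings may differ (for example, numeric values are often returned as strings).

```json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "filter": {
        "john_synonym": {
          "type": "synonym",
          "synonyms": [
            "sean => john, sean, séan",
            "séan => john, sean, séan",
            "johny => john"
          ]
        }
      },
      "analyzer": {
        "john_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "john_synonym"]
        }
      }
    }
  }
}
```

The lowercase filter runs before the synonym filter, which is why the synonym rules are written in lowercase.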

Creating the mapping for analyzed and non-analyzed fields

Next, a mapping is created for the member type from the Member class. The Name property uses the custom analyzer and also stores the original text in a non-analyzed sub-field, which is defined using the Fields configuration.

public class Member
{
	public long Id { get; set; }

	[ElasticsearchString(Index = StringIndex.analyzed, Analyzer="john_analyzer", Fields = typeof(FieldDataDefinition))]
	public string Name { get; set; }

	public string FamilyName { get; set; }

	public string Info { get; set; }
}


public class FieldDataDefinition
{
	[ElasticsearchString(Index=StringIndex.not_analyzed)]
	public string Raw { get; set; }
}

The index with mappings and settings is created using the following method:

_context.IndexCreate<Member>(indexDefinition);

The mapping is then created in Elasticsearch as follows:

http://localhost:9200/_mapping
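
As a rough guide, the mapping for the member type should look something like the following, in the string-type syntax used by Elasticsearch 1.x. The lowercase field names are an assumption, based on the search request in this article that queries the name field of /members/member. The exact output produced by ElasticsearchCRUD may differ.

```json
{
  "members": {
    "mappings": {
      "member": {
        "properties": {
          "id": { "type": "long" },
          "name": {
            "type": "string",
            "index": "analyzed",
            "analyzer": "john_analyzer",
            "fields": {
              "raw": { "type": "string", "index": "not_analyzed" }
            }
          },
          "familyname": { "type": "string" },
          "info": { "type": "string" }
        }
      }
    }
  }
}
```

With a mapping like this, name is searched through the synonym analyzer, while name.raw holds the exact original value for sorting or exact matching.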

Adding some data

Now some data can be added to the index. The documents are queued with AddUpdateDocument and sent to Elasticsearch as a bulk insert when SaveChanges is called.

public void CreateSomeMembers()
{
	var jm = new Member {Id = 1, FamilyName = "Moore", Info = "In the club since 1976", Name = "John"};
	_context.AddUpdateDocument(jm, jm.Id);
	var jj = new Member { Id = 2, FamilyName = "Jones", Info = "A great help for the background staff", Name = "Johny" };
	_context.AddUpdateDocument(jj, jj.Id);
	var pm = new Member { Id = 3, FamilyName = "Murphy", Info = "Likes to take control", Name = "Paul" };
	_context.AddUpdateDocument(pm, pm.Id);
	var sm = new Member { Id = 4, FamilyName = "McGurk", Info = "Fresh and fit", Name = "Séan" };
	_context.AddUpdateDocument(sm, sm.Id);
	var sob = new Member { Id = 5, FamilyName = "O'Brien", Info = "Not much use, bit of a problem", Name = "Sean" };
	_context.AddUpdateDocument(sob, sob.Id);
	var tmc = new Member { Id = 6, FamilyName = "McCauley", Info = "Couldn't ask for anyone better", Name = "Tadhg" };
	_context.AddUpdateDocument(tmc, tmc.Id);

	_context.SaveChanges();
}

Search using the analyzed data

The data can be searched using a match query, because a match query analyzes the search text with the analyzer of the field being searched. A search for sean, séan, Sean, Séan, John or Johny returns all of the john results.

//{
//  "query": {
//    "match": {"name": "sean"}
//  }
//}
public SearchResult<Member> Search(string name)
{
	// The name is inserted into the JSON as-is, so it must not contain quotes or backslashes.
	var query = "{ \"query\": { \"match\": {\"name\": \"" + name + "\"} } }";
	return _context.Search<Member>(query).PayloadResult;
}
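
The Search method above builds the JSON by string concatenation, so a name containing a double quote or a backslash would produce an invalid request. One possible way around this is sketched below. It assumes that Newtonsoft.Json is referenced in the project, and the MemberQueries class and BuildNameMatchQuery method are hypothetical names used only for illustration; they are not part of ElasticsearchCRUD.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical helper, not part of ElasticsearchCRUD.
public static class MemberQueries
{
	// Builds a match query on the "name" field with the search text safely escaped.
	public static string BuildNameMatchQuery(string name)
	{
		if (name == null)
		{
			throw new ArgumentNullException("name");
		}

		// Serializing a string yields a quoted JSON string literal,
		// with quotes, backslashes and control characters escaped.
		var jsonName = JsonConvert.SerializeObject(name);

		return "{ \"query\": { \"match\": { \"name\": " + jsonName + " } } }";
	}
}
```

The result can then be passed to the context in the same way as before, for example _context.Search<Member>(MemberQueries.BuildNameMatchQuery(name)).PayloadResult.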

The request is sent as follows:

POST http://localhost:9200/members/member/_search HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 43
Expect: 100-continue
Connection: Keep-Alive

{ "query": { "match": {"name": "Johny"} } }

You can also inspect the analyzed tokens using the _analyze API:

http://localhost:9200/members/_analyze?analyzer=john_analyzer

Links:

https://www.nuget.org/packages/ElasticsearchCRUD/

http://www.elasticsearch.org/

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/using-synonyms.html

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-word-synonyms.html

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/synonyms.html

http://stackoverflow.com/questions/18405779/how-to-configure-the-synonyms-path-in-elasticsearch

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html

https://fuzzyquery.wordpress.com/2013/08/15/example-shingle-filter-mapping/

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/using-stopwords.html

https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/

http://bundlr.com/clips/51f89017841e986de80003a7
