Using Elasticsearch German Analyzer

This article explains how to use Elasticsearch’s built-in German analyzer. An index is created with ElasticsearchCRUD, and one field is mapped to use the German analyzer for both indexing and search.

Code: https://github.com/damienbod/ElasticsearchGermanAnalyzer

Other Tutorials:

Part 1: ElasticsearchCRUD introduction
Part 2: MVC application search with simple documents using autocomplete, jQuery and jTable
Part 3: MVC Elasticsearch CRUD with nested documents
Part 4: Data Transfer from MS SQL Server using Entity Framework to Elasticsearch
Part 5: MVC Elasticsearch with child, parent documents
Part 6: MVC application with Entity Framework and Elasticsearch
Part 7: Live Reindex in Elasticsearch
Part 8: CSV export using Elasticsearch and Web API
Part 9: Elasticsearch Parent, Child, Grandchild Documents and Routing
Part 10: Elasticsearch Type mappings with ElasticsearchCRUD
Part 11: Elasticsearch Synonym Analyzer using ElasticsearchCRUD
Part 12: Using Elasticsearch German Analyzer
Part 13: MVC google maps search using Elasticsearch
Part 14: Search Queries and Filters with ElasticsearchCRUD
Part 15: Elasticsearch Bulk Insert
Part 16: Elasticsearch Aggregations With ElasticsearchCRUD
Part 17: Searching Multiple Indices and Types in Elasticsearch
Part 18: MVC searching with Elasticsearch Highlighting
Part 19: Index Warmers with ElasticsearchCRUD

The German analyzer is selected by setting the Analyzer property of the ElasticsearchString attribute. This property sets the analyzer for both search and indexing. It accepts any string, so a custom analyzer can also be referenced by name. The Fields property is set as well, so that the original, unanalyzed string can also be searched. Fields takes a Type: the class in which the child mappings for this field are defined.

public class GermanData
{
	public long Id { get; set; }

	public string Name { get; set; }

	public string FamilyName { get; set; }

	[ElasticsearchString(Fields = typeof(FieldDataDefinition), Analyzer=LanguageAnalyzers.German)]
	public string Info { get; set; }
}

public class FieldDataDefinition
{
	[ElasticsearchString(Index=StringIndex.not_analyzed)]
	public string Raw { get; set; }	
}

The index is then created using the mapping:

_context.IndexCreate<GermanData>(indexDefinition);

Now some data can be added to the index.

public void CreateSomeMembers()
{
	var jm = new GermanData {Id = 1, FamilyName = "Moore", Info = "Muenich", Name = "John"};
	_context.AddUpdateDocument(jm, jm.Id);
	var jj = new GermanData { Id = 2, FamilyName = "Jones", Info = "Münich", Name = "Johny" };
	_context.AddUpdateDocument(jj, jj.Id);
	var pm = new GermanData { Id = 3, FamilyName = "Murphy", Info = "Munich", Name = "Paul" };
	_context.AddUpdateDocument(pm, pm.Id);
	var sm = new GermanData { Id = 4, FamilyName = "McGurk", Info = "munich", Name = "Séan" };
	_context.AddUpdateDocument(sm, sm.Id);
	var sob = new GermanData { Id = 5, FamilyName = "O'Brien", Info = "Not a much use, bit of a problem", Name = "Sean" };
	_context.AddUpdateDocument(sob, sob.Id);
	var tmc = new GermanData { Id = 6, FamilyName = "McCauley", Info = "Couldn't a ask for anyone better", Name = "Tadhg" };
	_context.AddUpdateDocument(tmc, tmc.Id);

	_context.SaveChanges();
}

If a query is sent to this index and type, the search text is analyzed with the German analyzer as well, so all of the different Munich spellings are found. Munich, Münich, Muenich and munich were indexed, and all of them were stored as the token munich.

This can be checked as follows:

http://localhost:9200/germandatas/_analyze?analyzer=german&text=Muenich munich Münich Munich
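
To make it clearer why the four spellings end up as one token, here is a minimal sketch of the lowercasing and German normalization steps. It is only a simplified approximation written for this article, not the Lucene implementation: the class and method names are hypothetical, and stop word removal and stemming, which the real analyzer also applies, are left out.

```csharp
using System.Text;

public static class GermanNormalizationSketch
{
    // Hypothetical helper: lowercases a token, folds umlauts to their base
    // vowel, expands 'ß' to "ss", and treats "ae", "oe", "ue" as
    // transliterated umlauts.
    public static string Normalize(string token)
    {
        var input = token.ToLowerInvariant();
        var output = new StringBuilder();

        for (var i = 0; i < input.Length; i++)
        {
            var c = input[i];
            switch (c)
            {
                case 'ä': output.Append('a'); break;
                case 'ö': output.Append('o'); break;
                case 'ü': output.Append('u'); break;
                case 'ß': output.Append("ss"); break;
                case 'a':
                case 'o':
                case 'u':
                {
                    output.Append(c);
                    // Drop a following 'e' ("ue" -> "u"), unless this vowel
                    // itself follows another vowel or 'q' (as in "quelle").
                    var followsVowelOrQ = i > 0 && "aeiouq".IndexOf(input[i - 1]) >= 0;
                    if (!followsVowelOrQ && i + 1 < input.Length && input[i + 1] == 'e')
                    {
                        i++; // skip the 'e'
                    }
                    break;
                }
                default: output.Append(c); break;
            }
        }

        return output.ToString();
    }
}
```

With this sketch, Normalize returns munich for each of Muenich, Münich, Munich and munich, while Quelle keeps its "ue" and becomes quelle.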

The query search is sent as follows:

POST http://localhost:9200/germandatas/germandata/_search HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 46
Expect: 100-continue

{ "query": { "match": {"info": "Muenich"} }  }
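
The request above can also be sent from C# without any Elasticsearch client library. The following is a rough sketch under a few assumptions: it uses HttpClient and System.Text.Json (so a newer .NET runtime than the original sample project targets), the class and method names are hypothetical, and it does not use the ElasticsearchCRUD search API.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class MatchQuerySketch
{
    // Builds the JSON body for a match query on a single field.
    public static string BuildMatchQuery(string field, string text)
    {
        var body = new Dictionary<string, object>
        {
            ["query"] = new Dictionary<string, object>
            {
                ["match"] = new Dictionary<string, string> { [field] = text }
            }
        };
        return JsonSerializer.Serialize(body);
    }

    // Posts the query to the given _search URL and returns the raw JSON response.
    public static async Task<string> SearchAsync(
        HttpClient client, string searchUrl, string field, string text)
    {
        var content = new StringContent(
            BuildMatchQuery(field, text), Encoding.UTF8, "application/json");
        var response = await client.PostAsync(searchUrl, content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Calling SearchAsync with the URL http://localhost:9200/germandatas/germandata/_search, the field info and the text Muenich sends the same query as the raw request, just without the extra whitespace in the body.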

The search returns four hits, one for each of the Munich spellings that were added. This is what we expect.

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 703

{
   "took":1,"timed_out":false,"_shards":{
      "total":5,"successful":5,"failed":0},"hits":{
        "total":4,"max_score":1.0,"hits":[
              {"_index":"germandatas","_type":"germandata","_id":"1","_score":1.0,"_source":{"id":1,"name":"John","familyname":"Moore","info":"Muenich"}},
              {"_index":"germandatas","_type":"germandata","_id":"4","_score":0.30685282,"_source":{"id":4,"name":"Séan","familyname":"McGurk","info":"munich"}},
              {"_index":"germandatas","_type":"germandata","_id":"2","_score":0.30685282,"_source":{"id":2,"name":"Johny","familyname":"Jones","info":"Münich"}},
              {"_index":"germandatas","_type":"germandata","_id":"3","_score":0.30685282,"_source":{"id":3,"name":"Paul","familyname":"Murphy","info":"Munich"}}
         ]
        }
}

Using the built-in language analyzers in Elasticsearch is very easy with ElasticsearchCRUD. Various blogs explain how to set up other ‘German’ analyzers with different configurations. These can also be configured in ElasticsearchCRUD as custom analyzers. I have not made a systematic comparison of the different analyzers, so I cannot say which one works best for which type of data.

Links:

http://gibrown.com/2013/05/01/three-principles-for-multilingal-indexing-in-elasticsearch/

http://thediscoblog.com/blog/2013/09/14/understanding-elasticsearch-analyzers/

http://jprante.github.io/lessons/2012/05/16/multilingual-analysis-for-title-search.html

https://www.found.no/foundation/text-analysis-part-1/

http://dev.mikamai.com/post/104070888314/beginning-elasticsearch-mappings-and-analyzers

http://simpsora.wordpress.com/2014/05/02/customizing-elasticsearch-english-analyzer/

http://dev-blog.xoom.com/2013/12/01/natural-language-and-targeted-search-using-elastic-search/

http://obtao.com/blog/2013/10/configure-elasticsearch-on-an-efficient-way/

http://richardmiller.co.uk/2011/11/23/symfony2-elasticsearch-analyzers/

http://cdn.oreillystatic.com/en/assets/1/event/115/An%20Elasticsearch%20Crash%20Course%20Presentation.pdf

https://github.com/jprante/elasticsearch-analysis-decompound
