search - How to configure the synonyms_path in Elasticsearch
I'm pretty new to Elasticsearch and want to use synonyms. I added these lines to the configuration file:
index :
    analysis :
        analyzer :
            synonym :
                type : custom
                tokenizer : whitespace
                filter : [synonym]
        filter :
            synonym :
                type : synonym
                synonyms_path : synonyms.txt
Then I created the index test:
"mappings" : { "test" : { "properties" : { "text_1" : { "type" : "string", "analyzer" : "synonym" }, "text_2" : { "search_analyzer" : "standard", "index_analyzer" : "synonym", "type" : "string" }, "text_3" : { "type" : "string", "analyzer" : "synonym" } } }
}
and inserted a document of type test:
{ "text_3" : "foo dog cat", "text_2" : "foo dog cat", "text_1" : "foo dog cat" }
synonyms.txt contains "foo,bar,baz". When I search for foo it returns what I expect, but when I search for baz or bar it returns 0 results:
{ "query":{ "query_string":{ "query" : "bar", "fields" : [ "text_1"], "use_dis_max" : true, "boost" : 1.0 }}}
result:
{ "took":1, "timed_out":false, "_shards":{ "total":5, "successful":5, "failed":0 }, "hits":{ "total":0, "max_score":null, "hits":[ ] } }
I don't know whether your problem is caused by a badly defined synonym for "bar". Since you said you are pretty new, I'm going to give an example similar to yours that works. I want to show how Elasticsearch deals with synonyms at search time and at index time. Hope it helps.
The first thing is to create the synonyms file:
foo => foo bar, baz
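To make the rule format easier to read, here is a minimal Python sketch of how a line in the Solr-style synonym format can be split into its two sides. It is only an illustration, not Elasticsearch's own parser, and the function name parse_synonym_line is made up for this example. It assumes that entries are separated by commas and that a multi-word entry is split on whitespace.

```python
def parse_synonym_line(line):
    """Parse one Solr-style synonym rule into (inputs, outputs).

    Each side is a list of entries; each entry is a list of tokens.
    Returns None for blank lines and comments.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None

    def split_side(side):
        # Entries are comma-separated; a multi-word entry becomes several tokens.
        return [entry.split() for entry in side.split(",") if entry.strip()]

    if "=>" in line:
        # Explicit mapping: left-hand terms are replaced by the right-hand entries.
        left, right = line.split("=>", 1)
        return split_side(left), split_side(right)

    # Equivalent list: with expansion enabled, each term maps to the whole group.
    group = split_side(line)
    return group, group


print(parse_synonym_line("foo => foo bar, baz"))
# ([['foo']], [['foo', 'bar'], ['baz']])
print(parse_synonym_line("foo,bar,baz"))
# ([['foo'], ['bar'], ['baz']], [['foo'], ['bar'], ['baz']])
```

Read this way, the rule in the file maps foo to the two-word entry "foo bar" and to "baz", because only the comma separates entries. That would be consistent with the _analyze output at the end of this answer, where bar appears at a later position than foo and baz.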
Now create the index with the particular settings you are trying to test:
curl -XPUT 'http://localhost:9200/test/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": { "tokenizer": "whitespace", "filter": ["synonym"] }
        },
        "filter": {
          "synonym": { "type": "synonym", "synonyms_path": "synonyms.txt" }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text_1": { "type": "string", "analyzer": "synonym" },
        "text_2": { "search_analyzer": "standard", "index_analyzer": "standard", "type": "string" },
        "text_3": { "type": "string", "search_analyzer": "synonym", "index_analyzer": "standard" }
      }
    }
  }
}'
Note that synonyms.txt must be in the same directory as the configuration file, since the path is relative to the config dir.
Now index a doc:
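A typo in an analyzer name is an easy mistake in a request like this, so here is a small Python sketch that checks that every analyzer referenced in the mappings is either defined in the settings or is a built-in. The helper undefined_analyzers and the partial BUILT_IN set are my own additions for illustration, and the body dict simply mirrors the request above.

```python
# Partial list of built-in analyzer names; extend it as needed.
BUILT_IN = {"standard", "simple", "whitespace", "keyword", "stop"}

body = {
    "settings": {"index": {"analysis": {
        "analyzer": {
            "synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}
        },
        "filter": {
            "synonym": {"type": "synonym", "synonyms_path": "synonyms.txt"}
        },
    }}},
    "mappings": {"test": {"properties": {
        "text_1": {"type": "string", "analyzer": "synonym"},
        "text_2": {"type": "string", "search_analyzer": "standard",
                   "index_analyzer": "standard"},
        "text_3": {"type": "string", "search_analyzer": "synonym",
                   "index_analyzer": "standard"},
    }}},
}


def undefined_analyzers(request_body):
    """Return (field, analyzer) pairs that reference an unknown analyzer."""
    defined = set(request_body["settings"]["index"]["analysis"]["analyzer"])
    missing = set()
    for type_mapping in request_body["mappings"].values():
        for field, spec in type_mapping["properties"].items():
            for key in ("analyzer", "index_analyzer", "search_analyzer"):
                name = spec.get(key)
                if name and name not in defined and name not in BUILT_IN:
                    missing.add((field, name))
    return missing


print(undefined_analyzers(body))  # set()
```

An empty set means every field points at an analyzer that exists, at least by name. It does not check that synonyms.txt can actually be found in the config dir.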
curl -XPUT 'http://localhost:9200/test/test/1' -d '{ "text_3": "baz dog cat", "text_2": "foo dog cat", "text_1": "foo dog cat" }'
Now the searches.
Searching in field text_1
curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'
result:
{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.15342641, "hits": [ { "_index": "test", "_type": "test", "_id": "1", "_score": 0.15342641, "_source": { "text_3": "baz dog cat", "text_2": "foo dog cat", "text_1": "foo dog cat" } } ] } }
You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.
Searching in field text_2
curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'
result:
{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 0, "max_score": null, "hits": [] } }
I don't get any hits because synonyms were not expanded while indexing (standard analyzer). And since I'm searching for baz, and baz is not in the text, I don't get any result.
Searching in field text_3
curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'
result:
{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.15342641, "hits": [ { "_index": "test", "_type": "test", "_id": "1", "_score": 0.15342641, "_source": { "text_3": "baz dog cat", "text_2": "foo dog cat", "text_1": "foo dog cat" } } ] } }
Note: text_3 is "baz dog cat".
text_3 is indexed without expanding synonyms. Since I'm searching for foo, which has "baz" as one of its synonyms, I get the result.
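The three cases can be summarised with a toy model in Python. This is only a sketch of the idea, not how Elasticsearch works internally: the analyzers are reduced to functions that return sets of terms, the rule is flattened to a one-way expansion of foo into foo, bar and baz (ignoring that "foo bar" is a two-word entry), and all function names are invented for the example.

```python
# Toy model only: real Elasticsearch analysis, positions and scoring are ignored.
# Simplified rule: foo expands one way into {foo, bar, baz}.
SYNONYMS = {"foo": {"foo", "bar", "baz"}}


def standard(text):
    """Stand-in for an analyzer without synonyms: lowercase and split."""
    return set(text.lower().split())


def synonym(text):
    """Stand-in for the synonym analyzer: add the expansions of each token."""
    terms = set()
    for token in text.split():
        terms |= SYNONYMS.get(token, {token})
    return terms


def matches(doc_text, query, index_analyzer, search_analyzer):
    """A document matches if any query term is among the indexed terms."""
    return bool(index_analyzer(doc_text) & search_analyzer(query))


# text_1: synonym analyzer at index time and at search time
print(matches("foo dog cat", "baz", synonym, synonym))    # True
# text_2: standard analyzer at index time and at search time
print(matches("foo dog cat", "baz", standard, standard))  # False
# text_3: standard at index time, synonym at search time
print(matches("baz dog cat", "foo", standard, synonym))   # True
```

In this model the expansion only needs to happen on one side: text_1 matches because foo was expanded when the document was indexed, and text_3 matches because foo was expanded when the query was analyzed.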
If you want to debug, you can use the _analyze endpoint, for example:
curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'
result:
{ "tokens": [ { "token": "foo", "start_offset": 0, "end_offset": 3, "type": "synonym", "position": 1 }, { "token": "baz", "start_offset": 0, "end_offset": 3, "type": "synonym", "position": 1 }, { "token": "bar", "start_offset": 0, "end_offset": 3, "type": "synonym", "position": 2 } ] }
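When the token list gets longer, it can help to group the tokens by position. Here is a short Python sketch that does this for the response above; tokens_by_position is just a helper name I made up, and the JSON is pasted in as a string rather than fetched from the server.

```python
import json
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group the tokens of an _analyze response by their position."""
    grouped = defaultdict(list)
    for tok in analyze_response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(sorted(grouped.items()))


# The _analyze response shown above, pasted as a string.
response = json.loads("""
{ "tokens": [
  { "token": "foo", "start_offset": 0, "end_offset": 3, "type": "synonym", "position": 1 },
  { "token": "baz", "start_offset": 0, "end_offset": 3, "type": "synonym", "position": 1 },
  { "token": "bar", "start_offset": 0, "end_offset": 3, "type": "synonym", "position": 2 }
] }
""")

print(tokens_by_position(response))
# {1: ['foo', 'baz'], 2: ['bar']}
```

Tokens that share a position are alternatives for the same word, so here foo and baz sit on top of each other and bar follows them.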