search - How to configure the synonyms_path in Elasticsearch
I'm pretty new to Elasticsearch and I want to use synonyms, so I added these lines to the configuration file:
    index :
        analysis :
            analyzer :
                synonym :
                    type : custom
                    tokenizer : whitespace
                    filter : [synonym]
            filter :
                synonym :
                    type : synonym
                    synonyms_path: synonyms.txt

Then I created an index test:

    "mappings" : {
      "test" : {
        "properties" : {
          "text_1" : {
            "type" : "string",
            "analyzer" : "synonym"
          },
          "text_2" : {
            "search_analyzer" : "standard",
            "index_analyzer" : "synonym",
            "type" : "string"
          },
          "text_3" : {
            "type" : "string",
            "analyzer" : "synonym"
          }
        }
      }
    }
and inserted some test data of type test:

    {
      "text_3" : "foo dog cat",
      "text_2" : "foo dog cat",
      "text_1" : "foo dog cat"
    }

synonyms.txt contains "foo,bar,baz". When I search for foo it returns what I expected, but when I search for baz or bar it returns 0 results:
    {
      "query": {
        "query_string": {
          "query" : "bar",
          "fields" : [ "text_1" ],
          "use_dis_max" : true,
          "boost" : 1.0
        }
      }
    }

Result:

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": { "total": 0, "max_score": null, "hits": [ ] }
    }
I don't know if your problem is that you defined the synonyms for "bar" badly. Since you said you are pretty new, I'm going to put up an example similar to yours that works. I want to show how Elasticsearch deals with synonyms at search time and at index time. Hope it helps.
The first thing is to create the synonym file:

    foo => foo bar, baz

Now create the index with the particular settings you are trying to test:
    curl -XPUT 'http://localhost:9200/test/' -d '{
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "synonym": {
                "tokenizer": "whitespace",
                "filter": ["synonym"]
              }
            },
            "filter": {
              "synonym": {
                "type": "synonym",
                "synonyms_path": "synonyms.txt"
              }
            }
          }
        }
      },
      "mappings": {
        "test": {
          "properties": {
            "text_1": {
              "type": "string",
              "analyzer": "synonym"
            },
            "text_2": {
              "search_analyzer": "standard",
              "index_analyzer": "standard",
              "type": "string"
            },
            "text_3": {
              "type": "string",
              "search_analyzer": "synonym",
              "index_analyzer": "standard"
            }
          }
        }
      }
    }'

Note that synonyms.txt must be in the same directory as the configuration file, since the path is relative to the config dir.
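To make the rule in the synonym file easier to read, here is a minimal Python sketch of how such a line can be interpreted. It assumes the Solr-style format, where commas separate alternatives and whitespace separates the tokens inside one alternative. The helper name `parse_rule` is made up for illustration and is not part of Elasticsearch.

```python
def parse_rule(line):
    """Parse one synonyms-file rule into (inputs, outputs), each a list of token lists."""
    def split_terms(part):
        # Commas separate alternatives; whitespace separates tokens within one alternative.
        return [term.split() for term in part.split(",") if term.strip()]

    if "=>" in line:
        # Explicit mapping: everything on the left is replaced by the right-hand side.
        left, right = line.split("=>", 1)
        return split_terms(left), split_terms(right)
    # No arrow: all terms are treated as equivalent to each other.
    terms = split_terms(line)
    return terms, terms


print(parse_rule("foo => foo bar, baz"))
# ([['foo']], [['foo', 'bar'], ['baz']])
print(parse_rule("foo,bar,baz"))
# ([['foo'], ['bar'], ['baz']], [['foo'], ['bar'], ['baz']])
```

Read this way, the rule in this answer maps foo to the two-token phrase "foo bar" and to the single token "baz", while the "foo,bar,baz" line from the question declares three equivalent terms.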
Now index a doc:

    curl -XPUT 'http://localhost:9200/test/test/1' -d '{
      "text_3": "baz dog cat",
      "text_2": "foo dog cat",
      "text_1": "foo dog cat"
    }'

Now the searches.
Searching in field text_1

    curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'

    {
      "took": 3,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
          {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
              "text_3": "baz dog cat",
              "text_2": "foo dog cat",
              "text_1": "foo dog cat"
            }
          }
        ]
      }
    }

You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.
Searching in field text_2

    curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'

Result:

    {
      "took": 2,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": { "total": 0, "max_score": null, "hits": [] }
    }

I don't get any hits because I didn't expand synonyms while indexing (standard analyzer). And since I'm searching for baz, and baz is not in the text, I don't get any result.
Searching in field text_3

    curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'

    {
      "took": 3,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
          {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
              "text_3": "baz dog cat",
              "text_2": "foo dog cat",
              "text_1": "foo dog cat"
            }
          }
        ]
      }
    }

Note: text_3 is "baz dog cat".

text_3 is indexed without expanding synonyms. Since I'm searching for foo, which has "baz" as one of its synonyms, I get the result.
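The three searches can be summarized with a toy model in Python. This is only a sketch under simplifying assumptions (single-token matching, no positions, no scoring) and is not how Elasticsearch is implemented; the names `standard`, `synonym`, `matches` and `SYNONYMS` are made up for illustration.

```python
# Toy model only: single-token matching, no positions, no scoring.
# Mirrors the rule "foo => foo bar, baz" from synonyms.txt.
SYNONYMS = {"foo": ["foo", "bar", "baz"]}


def standard(text):
    """Stand-in for an analyzer without synonyms: lowercase and split on whitespace."""
    return text.lower().split()


def synonym(text):
    """Stand-in for the synonym analyzer: replace each token by its mapped tokens."""
    tokens = []
    for token in text.split():
        tokens.extend(SYNONYMS.get(token, [token]))
    return tokens


def matches(doc_text, query, index_analyzer, search_analyzer):
    """A hit means the indexed tokens and the analyzed query share a token."""
    indexed = set(index_analyzer(doc_text))
    return any(token in indexed for token in search_analyzer(query))


# text_1: synonym analyzer at both index and search time.
print(matches("foo dog cat", "baz", synonym, synonym))    # True
# text_2: standard analyzer on both sides, so "baz" was never indexed.
print(matches("foo dog cat", "baz", standard, standard))  # False
# text_3: standard at index time, synonyms expanded only in the query.
print(matches("baz dog cat", "foo", standard, synonym))   # True
```

In this model the only thing that changes between the three fields is where the expansion happens: in the indexed tokens (text_1), nowhere (text_2), or in the query tokens (text_3).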
If you want to debug, you can use the _analyze endpoint, for example:

    curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'

Result:

    {
      "tokens": [
        {
          "token": "foo",
          "start_offset": 0,
          "end_offset": 3,
          "type": "synonym",
          "position": 1
        },
        {
          "token": "baz",
          "start_offset": 0,
          "end_offset": 3,
          "type": "synonym",
          "position": 1
        },
        {
          "token": "bar",
          "start_offset": 0,
          "end_offset": 3,
          "type": "synonym",
          "position": 2
        }
      ]
    }
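When reading the _analyze output, it can help to group the tokens by their position. The following Python sketch does that for the response shown above, trimmed to the two fields it uses; the helper name `tokens_by_position` is made up for illustration.

```python
import json
from collections import defaultdict

# The _analyze response shown above, trimmed to the two fields used here.
response = json.loads("""
{"tokens": [
  {"token": "foo", "position": 1},
  {"token": "baz", "position": 1},
  {"token": "bar", "position": 2}
]}
""")


def tokens_by_position(analyze_response):
    """Group token strings by their position, keeping the response order."""
    grouped = defaultdict(list)
    for entry in analyze_response["tokens"]:
        grouped[entry["position"]].append(entry["token"])
    return dict(grouped)


print(tokens_by_position(response))
# {1: ['foo', 'baz'], 2: ['bar']}
```

Tokens that share a position are alternatives for the same place in the text, so foo and baz both stand where foo was, and bar follows at the next position. That is consistent with the rule mapping foo to the phrase "foo bar" and to "baz".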