Lucene not finding expected data
I'm having an issue with Lucene, and I'm hoping someone can give me an idea of what I'm doing wrong.
I'm using Lucene 4.4 with StandardAnalyzer. I'm trying to search on one field, but I'm getting weird results.
For example, when I search for "gros*", the results include records containing "grossesse". That is fine and expected. But when I search for "gross*", it finds nothing.
Any idea what I'm doing wrong? Is there a setting I'm missing? Any ideas are appreciated.
Thanks
This is how I create the index:

    private void createIndex(Analyzer analyzer, String catalogueId, Locale locale, Directory index) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_44, analyzer);
        IndexWriter w = new IndexWriter(index, config);
        Document doc = null;
        for (ProduitCatalogue produitCatalogue : produitCataloguesMap.get(catalogueId + locale.getLanguage()).values()) {
            doc = new Document();
            doc.add(new IntField("id", produitCatalogue.getId(), Store.YES));
            TextField desc = new TextField("description", produitCatalogue.getDescription(), Store.YES);
            doc.add(desc);
            w.addDocument(doc);
        }
        w.close();
    }
This is createQuery:

    private Query createQuery(String searchTxt, Analyzer analyzer) throws ParseException {
        QueryParser queryParser = new QueryParser(Version.LUCENE_44, "description", analyzer);
        queryParser.setAllowLeadingWildcard(true);
        queryParser.setAutoGeneratePhraseQueries(false);
        Query q = queryParser.parse(searchTxt);
        return q;
    }
This is how I pick the analyzer and run the search:

    Analyzer analyzer = englishAnalyzer;
    if (Locale.CANADA_FRENCH.getLanguage().equals(locale.getLanguage())) {
        analyzer = frenchAnalyzer;
    }
    Query q = createQuery(searchTxt, analyzer);
    DirectoryReader reader = DirectoryReader.open(indexMap.get(catalogueId + locale.getLanguage()));
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(HITS_PER_PAGE, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
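Prefix queries (as well as wildcard, fuzzy, and regex queries) are not passed through the analyzer. Since you are using language-specific analyzers (EnglishAnalyzer, FrenchAnalyzer), the indexed data is passed through the analyzer and stemmed. My guess is that, after stemming, "grossesse" is indexed as the stem "gros". The prefix "gros*" matches that term, but "gross*" is compared against it unanalyzed and matches nothing. Searching for "gross" without wildcards would, I presume, get a hit (I haven't gone over the pertinent stemming logic to say that with absolute certainty, though).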
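To make the mismatch concrete, here is a minimal sketch in plain Java with no Lucene involved. The `toyStem` rule is purely an assumption chosen so that "grossesse" reduces to "gros", as guessed above; it is not the algorithm the real French stemmer uses, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PrefixVsStemDemo {

    // Toy stand-in for a stemming analyzer (an assumption, NOT Lucene's stemmer):
    // lowercase, drop a trailing "esse", then collapse a trailing "ss" to "s".
    public static String toyStem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.endsWith("esse")) {
            t = t.substring(0, t.length() - 4);
        }
        if (t.endsWith("ss")) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    // A prefix query is compared against the indexed terms as typed, without stemming.
    public static boolean prefixMatches(List<String> indexedTerms, String rawPrefix) {
        return indexedTerms.stream().anyMatch(term -> term.startsWith(rawPrefix));
    }

    // A plain term query is stemmed first, then looked up among the indexed terms.
    public static boolean termMatches(List<String> indexedTerms, String queryWord) {
        return indexedTerms.contains(toyStem(queryWord));
    }

    public static void main(String[] args) {
        // Index time: "grossesse" is stored as its stem, "gros".
        List<String> indexed = Collections.singletonList(toyStem("grossesse"));

        System.out.println(prefixMatches(indexed, "gros"));  // true:  "gros" starts with "gros"
        System.out.println(prefixMatches(indexed, "gross")); // false: "gros" does not start with "gross"
        System.out.println(termMatches(indexed, "gross"));   // true:  "gross" is stemmed to "gros" first
    }
}
```

The point is only the asymmetry: the document side is stemmed, the wildcard side is not, so a prefix longer than the stored stem can never match.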
One possible way to allow both stemming and wildcard querying is to index the data in two fields: one stemmed, using the language analyzers, and the other unstemmed, using StandardAnalyzer. You can then either search both fields, or search selectively based on what is in the query. For user-entered queries in particular, searching both fields simultaneously is the right approach, to my mind.
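The two-field idea can be modelled with a toy in-memory index, again in plain Java rather than Lucene. Everything here is hypothetical: the class, the stemming rule (an assumption that reduces "grossesse" to "gros"), and the second field, which you might call something like "description_exact". In Lucene itself, giving each field its own analyzer is usually done with PerFieldAnalyzerWrapper; that wiring is not shown here.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TwoFieldToyIndex {

    // Terms of the stemmed field (plays the role of "description").
    private final Set<String> stemmedTerms = new HashSet<>();
    // Terms of a hypothetical unstemmed field (e.g. "description_exact").
    private final Set<String> exactTerms = new HashSet<>();

    // Toy stemming rule (an assumption, NOT Lucene's stemmer).
    public static String toyStem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.endsWith("esse")) {
            t = t.substring(0, t.length() - 4);
        }
        if (t.endsWith("ss")) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    // Index every token twice: once stemmed, once as-is (lowercased).
    public void add(String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            stemmedTerms.add(toyStem(token));
            exactTerms.add(token);
        }
    }

    private static boolean anyStartsWith(Set<String> terms, String prefix) {
        return terms.stream().anyMatch(term -> term.startsWith(prefix));
    }

    // Search both fields at once; a hit in either field counts.
    public boolean matches(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        if (q.endsWith("*")) {
            // Prefix queries are not stemmed; they are compared as typed.
            String prefix = q.substring(0, q.length() - 1);
            return anyStartsWith(stemmedTerms, prefix) || anyStartsWith(exactTerms, prefix);
        }
        // Plain terms are stemmed for the stemmed field and used as-is for the exact one.
        return stemmedTerms.contains(toyStem(q)) || exactTerms.contains(q);
    }

    public static void main(String[] args) {
        TwoFieldToyIndex index = new TwoFieldToyIndex();
        index.add("grossesse");

        System.out.println(index.matches("gros*"));   // true
        System.out.println(index.matches("gross*"));  // true: matched by the unstemmed field
        System.out.println(index.matches("gross"));   // true: matched by the stemmed field
        System.out.println(index.matches("grossi*")); // false: no field has such a term
    }
}
```

With both fields searched together, the unstemmed field handles wildcard prefixes while the stemmed field keeps matching inflected forms.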