Java Forum


problem in lucene

  Asked By: Nicole    Date: Mar 17    Category: Java    Views: 527

I am trying to implement a search engine for myself using Lucene, but I am
stuck on a problem: I want to search for a term and get its position in a
text file. How can I do this? Can anybody help me? I would be thankful.



1 Answer Found

Answer #1    Answered By: Heidi Larson     Answered On: Mar 17

The .prx file contains the lists of positions at which each term occurs within documents.

In the grammar below, <X>^N denotes N repetitions of X.

ProxFile (.prx) --> <TermPositions>^TermCount

TermPositions --> <Positions>^DocFreq

Positions --> <PositionDelta>^Freq

PositionDelta --> VInt
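
A VInt is a variable-length integer. As a rough sketch, assuming the usual layout from Lucene's file-format documentation (seven data bits per byte, low-order group first, high bit set when more bytes follow), a run of VInts could be decoded like this. The class and method names are made up for illustration and are not part of Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class.
class VIntDemo {

    /** Decodes every VInt found in the given bytes, in order. */
    static List<Integer> decodeVInts(byte[] bytes) {
        List<Integer> values = new ArrayList<>();
        int i = 0;
        while (i < bytes.length) {
            int b = bytes[i++] & 0xFF;      // treat the byte as unsigned
            int value = b & 0x7F;           // low 7 bits carry data
            int shift = 7;
            while ((b & 0x80) != 0) {       // high bit set: another byte follows
                b = bytes[i++] & 0xFF;
                value |= (b & 0x7F) << shift;
                shift += 7;
            }
            values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        // Small values fit in one byte each.
        System.out.println(decodeVInts(new byte[] {4, 5, 4}));
        // 128 needs two bytes: 0x80 (no data bits, continue) then 0x01.
        System.out.println(decodeVInts(new byte[] {(byte) 0x80, 0x01}));
    }
}
```

This sketch does no bounds checking, so a truncated input would throw an ArrayIndexOutOfBoundsException.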

TermPositions are ordered by term (the term is implicit, from the .tis file).

Positions entries are ordered by increasing document number (the document number is implicit from the .frq file).

PositionDelta is the difference between the position of the current occurrence in the document and the position of the previous occurrence. For the first occurrence in a document the previous position is taken as zero, so the first delta is the position itself.

For example, the TermPositions for a term which occurs as the fourth term in one document, and as the fifth and ninth term in a subsequent document, would be the following sequence of VInts:

4, 5, 4
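
To make the example concrete, here is a small sketch that turns that delta sequence back into absolute positions. It assumes the per-document frequencies (1 and 2 here) are already known; in the real index they would come from the .frq file. The class and method names are hypothetical and are not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Lucene class.
class PositionDeltaDemo {

    /**
     * Splits a flat run of position deltas into per-document position lists.
     * freqs[d] is how many times the term occurs in the d-th document.
     */
    static List<int[]> decode(int[] deltas, int[] freqs) {
        List<int[]> result = new ArrayList<>();
        int cursor = 0;
        for (int freq : freqs) {
            int[] positions = new int[freq];
            int previous = 0;               // deltas restart from zero in each document
            for (int i = 0; i < freq; i++) {
                previous += deltas[cursor++];
                positions[i] = previous;
            }
            result.add(positions);
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> docs = decode(new int[] {4, 5, 4}, new int[] {1, 2});
        for (int[] positions : docs) {
            System.out.println(Arrays.toString(positions));
        }
        // prints [4] and then [5, 9]
    }
}
```

The first document gets the single position 4. In the second document the deltas 5 and 4 add up to positions 5 and 9.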

If you want to implement a search engine, I suggest using Nutch instead of using Lucene directly. It is an open-source web search engine built on Lucene. If your purpose is text summarization, MEAD is a good option.
