Java Answers

Question Asked By: Fahimah Khan on Jun 20 In Java Category.

Using Lucene in Farsi

Question Answered By: Cadencia Bernard on Jun 20

As you know, most people use Lucene to search through HTML pages or documents, so the "content" is what matters. But the standard analyzer cannot do stemming or lemmatization for Persian words.
As Arash mentioned, I implemented that Lucene project for the IR course at Sharif Univ. for one of the students there.
My implementation just removes common words (like "Beh", "Taa", "Va", ...), but it cannot reduce words to their root form (e.g. by stripping the "Haa" suffix from plural words).
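To make the common-word removal concrete, here is a minimal sketch in plain Java. It is not the code from that project and it does not use the Lucene API; the class name StopWordFilter, the method filter, and the three-word stop list (taken from the transliterated examples above) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: a hypothetical stop-word filter, not a Lucene class.
final class StopWordFilter {
    // Assumed stop list, using the transliterated words mentioned in the post.
    static final Set<String> STOP_WORDS = Set.of("beh", "taa", "va");

    // Splits on whitespace, lowercases each token, and drops stop words.
    static List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String normalized = token.toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty() && !STOP_WORDS.contains(normalized)) {
                kept.add(normalized);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter("Ketaab va Daftar")); // prints [ketaab, daftar]
    }
}
```

A real analyzer would work on Persian script rather than transliterations, and would use a much longer stop list.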
As far as I remember, none of the students implemented better stemming for that project. Decent stemming is not easy, and you may need a customized Persian dictionary.
However, indexing without stemming may be sufficient for some needs. The trade-off is that if you search for the Persian word "Ketaab", for example, documents that contain only "Ketaabhaa" will not appear in the results.
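The "Ketaab" versus "Ketaabhaa" mismatch can be sketched as follows. This is a deliberately naive suffix stripper, not a real Persian stemmer and not part of Lucene; the names NaivePluralStripper, strip, and matches are hypothetical, and the only rule it knows is the transliterated "haa" plural suffix from the example above. It will wrongly strip any word that merely happens to end in "haa", which is one reason a dictionary is usually needed.

```java
import java.util.Locale;

// Illustrative only: a hypothetical, naive plural stripper, not a real stemmer.
final class NaivePluralStripper {
    // Assumed plural suffix, in the transliteration used in the post.
    static final String PLURAL_SUFFIX = "haa";

    // Lowercases the term and removes a trailing "haa" if a stem of
    // at least two characters would remain.
    static String strip(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        if (t.length() > PLURAL_SUFFIX.length() + 1 && t.endsWith(PLURAL_SUFFIX)) {
            return t.substring(0, t.length() - PLURAL_SUFFIX.length());
        }
        return t;
    }

    // Compares a query term with an indexed term, with or without stripping.
    static boolean matches(String query, String indexed, boolean stem) {
        if (stem) {
            return strip(query).equals(strip(indexed));
        }
        return query.toLowerCase(Locale.ROOT).equals(indexed.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(matches("Ketaab", "Ketaabhaa", false)); // prints false
        System.out.println(matches("Ketaab", "Ketaabhaa", true));  // prints true
    }
}
```

In a search engine the same reduction has to be applied both when indexing and when parsing the query, otherwise the two sides will not agree on the term.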

