Java Answers

Question Asked By: Tracey King on Jan 17 in Java Category.

Question Answered By: Hoor Khan on Jan 17

There are two types of PDFs: application-generated and scanned.
OCR software is for the second type: it recognizes the text contained in an image and converts it to actual text.
But lots of PDF files (at least academic ones) are application-generated, and those are much easier to convert to text than scanned images.
As for Persian, though, I have found no accurate software to do this. Acrobat Reader Professional itself has a "Save As" (doc, rtf, txt) feature, but it does not work accurately for Persian. Even if you choose "select text" and copy the text, the characters are ruined when you paste it into Notepad.
Arabic characters are similar to Persian ones, so I guess that if they knew how to extract Arabic, they would have added Persian support too.
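
About the ruined characters after copy and paste: one possible cause (an assumption on my part, not something confirmed for Acrobat) is that the copied text contains Arabic "presentation form" code points (the positional glyph variants in the U+FB50 to U+FDFF and U+FE70 to U+FEFF ranges) instead of the base letters. If that is the case, Unicode NFKC normalization from the standard java.text.Normalizer class maps them back to the base letters. Below is a minimal sketch; the class name PresentationFormFix and its method names are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper: class and method names are illustrative only.
final class PresentationFormFix {

    // True if the text contains any code point from the
    // Arabic Presentation Forms-A or Forms-B Unicode blocks.
    static boolean hasPresentationForms(String s) {
        return s.codePoints().anyMatch(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            return block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
        });
    }

    // NFKC replaces each presentation form with its base letter(s),
    // e.g. the LAM WITH ALEF ligature becomes LAM followed by ALEF.
    static String normalize(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Assumed sample input: SEEN initial form, LAM-ALEF ligature
        // final form, MEEM isolated form (all presentation forms).
        String pasted = "\uFEB3\uFEFC\uFEE1";
        String fixed = normalize(pasted);

        System.out.println("had presentation forms: " + hasPresentationForms(pasted));
        // Print code points in hex so the result does not depend on console encoding.
        fixed.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

This only handles the code point mapping. If the extracted text also comes out in visual (reversed) order, or the PDF's fonts have no usable character mapping at all, normalization alone will not repair it.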

www.proz.com/translation-articles/articles/128/

I have heard that OCR software for Arabic and Persian is expensive and inaccurate.
