OCR in a PDF reader (openSUSE, Okular)

oldman · December 13, 2016, 11:51pm

Hi,

Is there a PDF reader on Linux that can do OCR (Optical Character Recognition)?

I currently use Okular on openSUSE Leap, but when the PDF is only a scan I cannot copy the text. The only option I get is to copy an image of the text.

Kind regards,
oldman

P.S. I am not sure whether this post belongs in Linux-other or in the linux-helpdesk.

Baz · December 14, 2016, 12:56am

Never played around with this, but I guess you learn something every day...
Okular seems to work fine, although it is a bit awkward: the output has random spaces and whatnot. Or is the issue with scanning a physical document?
Just selecting some text and then pasting it seems to work OK.

anon5644329 · December 14, 2016, 1:05am

You can try opening the file with LibreOffice Writer. It will OCR it and you can copy the text, but only line by line.

oldman · December 14, 2016, 11:34am

This does not work, however, if the document is a scan. For that I would need OCR, but thanks for the reply.

oldman · December 14, 2016, 11:36am

Thanks for the reply, but this is not an option, since I really do not want to open a file in LibreOffice every time I have to copy something from it.

anon5644329 · December 14, 2016, 12:15pm

Well, it's up to you. You can also check out Tesseract, an open-source Google project for OCR, but it is only a back end that you use through the CLI. For a front-end app you will have to search around.
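As a rough sketch of using Tesseract from the command line on a scanned PDF, the Python script below assumes two external programs are installed: pdftoppm (from poppler-utils), which renders each PDF page to a PNG image, and the tesseract binary itself, since Tesseract works on images rather than PDFs. The helper names (rasterize_cmd, ocr_cmd, ocr_pdf) and the 300 dpi resolution are illustrative choices, not something from this thread.

```python
import glob
import os
import subprocess
import tempfile


def rasterize_cmd(pdf_path: str, prefix: str, dpi: int = 300) -> list[str]:
    """Build the pdftoppm command that renders every PDF page to PNG files
    named <prefix>-<page>.png (page numbers are zero-padded)."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_cmd(image_path: str, lang: str = "eng") -> list[str]:
    """Build the tesseract command that OCRs one image and writes the
    recognized text to stdout instead of to a .txt file."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_pdf(pdf_path: str, lang: str = "eng") -> str:
    """Rasterize a scanned PDF and return the OCR text of all pages.
    Assumes pdftoppm and tesseract are on PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(rasterize_cmd(pdf_path, prefix), check=True)
        # Zero-padded page numbers make lexicographic order match page order.
        pages = sorted(glob.glob(prefix + "-*.png"))
        texts = []
        for page in pages:
            result = subprocess.run(
                ocr_cmd(page, lang), check=True, capture_output=True, text=True
            )
            texts.append(result.stdout)
        return "\n".join(texts)
```

For example, print(ocr_pdf("scan.pdf")) would print the recognized text of every page, which can then be copied like any other text. The language code passed with -l has to match a Tesseract language pack that is actually installed.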

saona-raimundo · January 7, 2022, 2:15pm

And, in my opinion, NormCap, a screen-capture OCR tool, is a handy third-party integration.