The 0.7.1a version of OCRFeeder has been released.
This version includes work done by Emergya as part of the GuadaLinfo Accessible project, such as:
* Importing from a scanner device.
* Copying text from the content boxes to the clipboard.
* Users can now use a standard spell-checker dialog to correct mistakes in the text recognized by the OCR engines.
Other highlights include:
* Rewritten ocrfeeder-cli (which now also has a help option)
* Automatic detection of the Cuneiform OCR engine
* OCRFeeder modules moved to their own folder (so they are better organized and don’t conflict with other modules when installed)
And some bug fixes:
* Add the help option to ocrfeeder-cli (gb#630829)
* Fix selecting all areas
* Fix ellipsis and title in the queued events dialog
* Prevent the creation of “invisible” boxes
* Remove temporary images for the Tesseract OCR engine
A big thanks to the great GNOME translators for keeping OCRFeeder available in a number of languages and to Berto for making it available in Debian (which later got into Ubuntu as well).
Just as I was releasing version 0.7.1, I realized the spell-checker.ui file was not being installed, so I quickly made a tiny follow-up release, hence 0.7.1a and not simply 0.7.1.
Letters in version numbers are generally not a good idea. For a brown-paper-bag release, I’d advise adding a .1 nano version.
I think 0.7.1a will be confused in automated tools as being an earlier release than 0.7.1. A better version number would be something like 0.7.2 or 0.7.1.1.
For instance, I’m using Mozilla Firefox 4.0b8 as my primary web browser, which is the 8th beta of what will eventually be released as Firefox 4.0, which will later receive an update as 4.0.1 or whatever.
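A minimal Python sketch of the point above, assuming a hypothetical tool that only understands purely numeric, dot-separated versions. Such a tool orders 0.7.1 < 0.7.1.1 < 0.7.2 correctly but cannot parse 0.7.1a at all. Real tools vary: Python’s PEP 440 rules read a trailing `a` as an alpha pre-release, so 0.7.1a sorts before 0.7.1, while other tools (Debian’s dpkg, for instance) may order the two differently.

```python
def parse_numeric_version(version: str) -> tuple:
    """Parse a purely numeric dotted version such as '0.7.1.1' into ints.

    Raises ValueError for components containing letters, such as '1a'.
    """
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate: str, current: str) -> bool:
    """Return True if `candidate` sorts after `current`."""
    # Tuples compare element by element; when one is a prefix of the other,
    # the longer one sorts later, so (0, 7, 1, 1) > (0, 7, 1).
    return parse_numeric_version(candidate) > parse_numeric_version(current)


if __name__ == "__main__":
    for candidate in ("0.7.1.1", "0.7.2"):
        print(candidate, "newer than 0.7.1:", is_newer(candidate, "0.7.1"))
    try:
        parse_numeric_version("0.7.1a")
    except ValueError as err:
        # A naive numeric parser rejects the letter outright.
        print("0.7.1a is not purely numeric:", err)
```

Either suggested number (0.7.1.1 or 0.7.2) sorts after 0.7.1 under this simple scheme, which is the commenter’s argument for avoiding letters.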
I use Ubuntu 10.10 32-bit. I additionally had to install the pyEnchant and python-imaging-sane packages to run OCRFeeder 0.7.1a successfully.
Hi guys, yeah, I probably should not have used a letter, but this is only a source tarball; people who want to create packages can sort it out according to what they need.
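A small sketch for checking whether those optional Python modules are present before launching OCRFeeder. It assumes the import names are `enchant` (pyEnchant) and `sane` (python-imaging-sane), and the feature each one is paired with in the labels is an assumption based on the release notes, not something OCRFeeder reports itself.

```python
import importlib.util

# Assumed mapping from package label to the module name Python imports.
OPTIONAL_MODULES = {
    "pyEnchant (spell checking)": "enchant",
    "python-imaging-sane (scanner import)": "sane",
}


def check_optional_modules(modules: dict) -> dict:
    """Return, for each package label, whether its module can be imported."""
    # find_spec returns None for a top-level module that cannot be found,
    # without actually importing it.
    return {label: importlib.util.find_spec(name) is not None
            for label, name in modules.items()}


if __name__ == "__main__":
    for label, found in check_optional_modules(OPTIONAL_MODULES).items():
        print(f"{label}: {'found' if found else 'missing'}")
```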
This will give me extra motivation to push for the 0.7.2 version.
Thanks,
“I realized the spell-checker.ui file was not being installed”
Sorry, but you didn’t realize it; why do you have to lie?
https://bugzilla.gnome.org/show_bug.cgi?id=633450
Hi True,
Actually I DID realize. I’m not talking about the bug you mentioned. That bug was not about spell-checker.ui “not being installed” but rather about it not being added to git.
Also, instead of attacking me and calling me a liar, you could at least check the git log and see that I closed that bug when I committed the missing file, both on November 5th, before the release:
http://git.gnome.org/browse/ocrfeeder/commit/?id=9842fd204566e57f4593648c43cbad8fd679be34
and that I realized the file was not being *installed* on November 9th, after the release:
http://git.gnome.org/browse/ocrfeeder/commit/?id=30f1e24ce4816f27469cd82fd138fe67ae5dd325
One last thing: please consider using your real name and, optionally, your email address, because going anonymous to make such a wrong accusation would only be worth it if I couldn’t figure out who you are from 1) the bug you mention and 2) your company, which I could easily get from your IP address…
AHAH! OWNED!
I’m very happy to see that someone is working on GUI OCR tools!
That’s a lot of scrollbars in that screenshot. What happened there?
Hi Marius,
The scrollbars are needed for now. I have been focusing more on features and bug fixing and less on the UI, but I still don’t think it is difficult to use. Also note that the window was not maximized (so it would fit better in the screenshot).
The current version of OCRFeeder (0.7.1) does not let you choose a recognition language. I had to add some options to the command line in the OCR engines settings window to recognize non-English text.
It would be great to have a GUI to select the main recognition language. It would be even better to set a language for every area (most OCR engines, such as Tesseract and Cuneiform, currently cannot recognize multilingual documents).
And thank you for all your efforts 🙂
All recognized text is marked as “English”, so the spell checker marks every word as unknown.
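The workaround described above amounts to passing a language option to the OCR engine’s command line. Below is a minimal sketch that builds such a Tesseract command with its `-l` option. The file names are hypothetical, and how OCRFeeder stores and fills in engine arguments in its settings window is not shown here and may differ.

```python
import shlex


def build_tesseract_command(image_path: str, output_base: str,
                            language: str = "eng") -> list:
    """Build an argv list asking Tesseract to recognize `language`.

    Tesseract writes its result to `output_base` plus a ".txt" suffix.
    """
    return ["tesseract", image_path, output_base, "-l", language]


if __name__ == "__main__":
    # "rus" is Tesseract's code for Russian; the matching language data
    # must be installed for recognition to work.
    cmd = build_tesseract_command("page-001.png", "page-001", language="rus")
    print(shlex.join(cmd))
```

Passing an argv list (for example to `subprocess.run(cmd, check=True)`) avoids shell-quoting problems with file names that contain spaces.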
Hi Gregg,
I try to make OCR engines easy to use without losing the flexibility of calling them as if from the command line. Still, maybe I could add a higher-level way of configuring the best-known OCR engines, such as Tesseract and Cuneiform. Let’s see if I can do it in the future.
Thank you for your answer.
I’ve found some issues with PDF importing. It only works for very small files. Just try to import something like this: http://www.imwerden.info/pdf/lomonosov_polnoe_sobranie_sochineny_tom4_1803.pdf to understand my problem 🙂
Hi Gregg,
PDF importing takes a long time because of the tool it uses to convert the PDF to images (GhostScript). Is time the problem you’re talking about?
I’ll try importing the PDF you mention when I have time.
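For reference, converting a PDF to images with GhostScript means rendering every page as a bitmap, so the work grows with the page count and the chosen resolution. A minimal sketch of such a command follows, with optional first/last page limits for trying out a large PDF on a small range. The device (`png16m`), resolution, and output pattern are assumptions; OCRFeeder’s actual invocation is not shown here.

```python
from typing import List, Optional


def build_gs_command(pdf_path: str, output_pattern: str, resolution: int = 300,
                     first_page: Optional[int] = None,
                     last_page: Optional[int] = None) -> List[str]:
    """Build a GhostScript argv that renders PDF pages to PNG images.

    `output_pattern` should contain a printf-style page placeholder such as
    "page-%03d.png" so that each page is written to its own file.
    """
    # -dNOPAUSE/-dBATCH run non-interactively; -dSAFER restricts file access.
    cmd = ["gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
           "-sDEVICE=png16m", f"-r{resolution}"]
    # Optionally render only a page range instead of the whole document.
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    cmd += [f"-sOutputFile={output_pattern}", pdf_path]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_gs_command("book.pdf", "page-%03d.png", 150, 1, 5)))
```

The resulting list can be run with `subprocess.run(cmd, check=True)`; lowering the resolution or limiting the page range is a quick way to see how much of the delay comes from the conversion step.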