Joaquim Rocha

OCRFeeder 0.7.7 released

  • Joaquim Rocha
  • Principal Software Engineering Manager at Microsoft

After more than 4 months, I am finally releasing a new version of OCRFeeder (the last release was in August, just before the DesktopSummit). The reason for the delay, apart from some vacation in Berlin and Portugal and being busy at Igalia, is that this release brings deep internal changes.

The big issue

The problem with developing such an application from scratch in just a few months, while also worrying about writing a thesis, is that you don’t pay much attention to design and performance. So from 2008 until now, OCRFeeder has suffered from a big memory consumption problem: it would create one reviewer (this is what I call the place where you do stuff on the images) per image and keep all of them in memory, so depending on the number of images loaded and their size, the application would eventually crash. Since nobody complained about this for so long, I assumed people were using the application in simpler ways and not for full books. Now it seems that some institutions are interested in OCRFeeder, and there have been complaints and bugs filed (gb#637599 and db#646605).

This was fixed by keeping at most 5 reviewer instances in a cache. When a new image is selected, the oldest reviewer is dropped and the new one is added to the cache. Selecting a new image gets a bit slower, but the trade-off is worth it IMHO. In future changes I’ll probably make the number of reviewers configurable in some way. The content areas now also share a single editor instance instead of each having a dedicated one.
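To illustrate the idea, here is a minimal sketch of a bounded reviewer cache. It is not OCRFeeder's actual code: the `ReviewerCache` class, its `select` method and the `create_reviewer` factory are hypothetical names, and it assumes that "oldest" means the reviewer that was created first.

```python
from collections import OrderedDict

MAX_REVIEWERS = 5  # the limit mentioned above


class ReviewerCache:
    """Hypothetical bounded cache mapping an image path to its reviewer."""

    def __init__(self, create_reviewer, max_size=MAX_REVIEWERS):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._create_reviewer = create_reviewer  # factory: image path -> reviewer
        self._max_size = max_size
        self._reviewers = OrderedDict()  # insertion order doubles as age

    def select(self, image_path):
        # Fast path: the reviewer for this image is still cached.
        if image_path in self._reviewers:
            return self._reviewers[image_path]
        # Make room by dropping the oldest reviewer(s).
        while len(self._reviewers) >= self._max_size:
            self._reviewers.popitem(last=False)
        # Slow path: build a new reviewer and remember it.
        reviewer = self._create_reviewer(image_path)
        self._reviewers[image_path] = reviewer
        return reviewer

    def __len__(self):
        return len(self._reviewers)

    def __contains__(self, image_path):
        return image_path in self._reviewers
```

With something like this, memory use is bounded by `max_size` reviewers regardless of how many images are loaded, at the cost of rebuilding a reviewer whenever an evicted image is selected again. Making the limit configurable would only mean passing a different `max_size`.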

I was able to load more than 500 images of ~4.5 MB each and it was still usable, so hopefully this will improve the experience for users who had these problems.

Other changes

Another change is that OCRFeeder now stores all its temporary files in a dedicated folder under the system’s temporary folder (usually /tmp). Deleting this folder when the application quits guarantees that no temporary files are left behind (as sometimes happened before). Related to this, I’ve also decided to remove the option to choose the temporary folder. Python should already know what the system’s temporary folder is, and having such an option would make it look like Windows software from 1998.
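For the curious, this approach can be sketched with Python's standard library alone. The function name `create_session_temp_dir` and the `ocrfeeder-` prefix are assumptions made for the example, not necessarily what OCRFeeder uses, and the cleanup hook here is `atexit` rather than the application's own quit handler.

```python
import atexit
import os
import shutil
import tempfile


def create_session_temp_dir(prefix="ocrfeeder-"):
    """Create a private folder under the system's temporary folder and
    schedule its removal for when the interpreter exits."""
    # mkdtemp() picks the system temp folder (see tempfile.gettempdir())
    # and creates a uniquely named directory inside it.
    path = tempfile.mkdtemp(prefix=prefix)
    # Remove the whole folder, and everything written into it, on exit.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path


if __name__ == "__main__":
    temp_dir = create_session_temp_dir()
    # Every temporary file then goes inside the dedicated folder.
    with open(os.path.join(temp_dir, "page.txt"), "w") as handle:
        handle.write("temporary data")
    print(temp_dir)
```

Because all temporary files live inside one folder, a single removal at exit is enough and nothing has to be tracked file by file.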

As usual, some code cleanup and bug fixing was done, and I would like to thank the awesome GNOME i18n team and everyone who sent contributions. Thanks to my friend Berto, you can also expect an OCRFeeder Debian package in a repository near you soon.

For a more detailed list of changes, check out the NEWS file.

Source Tarball, Git, Bugzilla