ProLector
ProLector provides a variety of features designed to read even poor-quality documents (e.g. automatic splitting of ligatures and automatic joining of broken letters).
The example below is a page of Gaelic text from ‘Cúl le muir agus Scéalta eile’, processed as part of the Foclóir na Nua-Ghaeilge, Corpas na Gaeilge project. The page is digitised and output as a high-resolution (1200 dpi, dots per inch) uncompressed TIFF. The image is then cleaned and cropped so that it can be processed by our software engine, ProLector.
This screen shot shows the image, the fontbase pattern system at 1,009 and the text output, with 7 pages read containing 11,379 characters. A fontbase is created using the Centre’s Unicode Character Index, capturing patterns to enable conversion into the required symbols. The fontbase is continually re-assessed to ensure no error has crept in.
This screen shot shows ProLector in Training mode, with the analyst entering patterns into the fontbase. This pattern is a Gaelic g. ProLector has many functions: the analyst can set error rates, change font styles, remove dirt, etc.
This screen shot shows ProLector in Interactive (manual) mode, with the analyst capturing patterns not picked up by the designed fontbase. This pattern is a Gaelic Bh.
Another screen shot shows ProLector in manual mode, with the analyst capturing patterns not picked up by the designed fontbase. This pattern is the Unicode symbol á (aacute, U+00E1).
This screen shot shows raw capture in text format. The Centre has created its own way of encoding Unicode characters in the raw text: see &aa., which stands for the Unicode character á (aacute, U+00E1).
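The raw-text codes described above can be turned back into Unicode characters with a simple lookup. The following is a minimal sketch in Python, not the Centre's actual conversion (which is done with a bespoke VBA macro). Only the `&aa.` code for á is taken from the text; the other table entries, the assumed `&` + lowercase letters + `.` code shape, and the names `CODE_TO_CHAR` and `decode_raw` are hypothetical and included for illustration only.

```python
import re

# Hypothetical code table. Only "aa" -> "á" comes from the caption above;
# the remaining entries are assumed to follow the same pattern.
CODE_TO_CHAR = {
    "aa": "\u00e1",  # á (from the caption)
    "ea": "\u00e9",  # é (assumed)
    "ia": "\u00ed",  # í (assumed)
}

# Assumed code shape: an ampersand, lowercase letters, then a full stop.
CODE_PATTERN = re.compile(r"&([a-z]+)\.")


def decode_raw(text: str) -> str:
    """Replace each known '&xx.' code with its Unicode character."""
    def substitute(match: re.Match) -> str:
        # Unknown codes are left untouched so an analyst can spot them.
        return CODE_TO_CHAR.get(match.group(1), match.group(0))

    return CODE_PATTERN.sub(substitute, text)


if __name__ == "__main__":
    print(decode_raw("sl&aa.n"))
```

Leaving unrecognised codes in place, rather than dropping them, keeps anything the table does not cover visible for later quality control.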
This screen shows the full capture of multiple pages in raw text format.
This screen shot shows a glimpse of the VBA (Visual Basic for Applications). The Centre creates a bespoke macro to convert the raw output into a Word file, which gives a representation of the printed page.
This screen shot shows the converted text, with a colour-coded highlighting system to assist the analyst with quality control.
This screen shows how the analyst post-processes both image and output, correcting where necessary. Final formats will be delivered to the agreed service level specifications. This particular project is with the Royal Irish Academy.