Computer Vision News - August 2021
DIDA and DIGITNET

“We want to be able to detect the digits and four-digit year strings to support document indexing,” Amir tells us. “Historical handwritten documents like these can be complex, particularly in terms of their writing style and appearance, so reading them with the computer is very difficult.”

“Some priests use Gothic-style writing, which is more elaborate,” Hüseyin adds. “There is a world of difference between the way a zero is written, for example, which is generally easy to recognize, and the number two, which can be much more complicated.”

To recognize these digits, they experimented with several models and existing handwritten digit datasets, including MNIST and USPS. However, they found that the characteristics of these relatively modern datasets limited their usefulness for historical texts.

“One of the issues with MNIST is that it has a fixed size of 28 x 28 pixels,” Amir explains. “By resizing all these historical digits to 28 x 28 you decrease the quality. Many of them have already lost their colour over the years and degraded in quality; by downsampling them, we lose all their important features.”
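To make Amir's point about downsampling concrete, here is a minimal, hypothetical sketch. It is not the team's pipeline, and the article does not say which resizing method they used. It assumes a 112 x 112 source crop containing a faded, 2-pixel-wide vertical stroke (ink intensity 0.4 on a 0.0 background) and shrinks it to MNIST's 28 x 28 grid by simple 4 x 4 area averaging. The helper names `box_downsample`, `faded_stroke_image` and `peak` are illustrative only.

```python
# Hypothetical illustration (assumptions, not from the article): a 112x112
# crop, a 2-pixel-wide faded stroke with intensity 0.4, and 4x4 area
# averaging used to shrink the crop to MNIST's 28x28 grid.

def box_downsample(img, factor):
    """Average each factor x factor block of a square grayscale image."""
    size = len(img)
    assert size % factor == 0, "image size must be a multiple of factor"
    out_size = size // factor
    out = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            block_sum = sum(
                img[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            row.append(block_sum / (factor * factor))
        out.append(row)
    return out


def faded_stroke_image(size=112, ink=0.4, cols=(56, 57)):
    """Build a blank image with a thin vertical stroke at the given columns."""
    return [[ink if x in cols else 0.0 for x in range(size)] for _ in range(size)]


def peak(img):
    """Return the brightest pixel value, i.e. the stroke's contrast here."""
    return max(max(row) for row in img)


if __name__ == "__main__":
    original = faded_stroke_image()
    small = box_downsample(original, factor=4)  # 112 / 4 = 28
    print(len(small), len(small[0]))            # 28 28
    print(round(peak(original), 3), round(peak(small), 3))  # 0.4 0.2
```

Under these assumptions the stroke occupies only half of each 4 x 4 block, so averaging halves its contrast from 0.4 to 0.2. Ink that has already faded becomes even harder to separate from the background. Other interpolation methods behave differently, but any reduction to a fixed small grid discards fine detail of this kind.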