Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning
Large sets of historical document images have been amassed by companies and governmental entities over the years, and more recently, these collections have become accessible to the general public online. However, in order to truly benefit from these repositories, the text within these images must be made legible and searchable. One crucial step in this process is document image binarization, which involves separating the text foreground from the page background. This separation not only enhances the readability of text in document images for humans but also facilitates the processing of these images by various algorithms.
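To make the notion of foreground and background separation concrete, the following is a minimal sketch of a classical global thresholding baseline (Otsu's method) in NumPy. It is not the binarization technique studied in this work; it only illustrates what a binarization step produces. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be an 8-bit grayscale image with dark text on a light page.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level t that maximizes the between-class variance
    when pixels <= t form one class and pixels > t form the other."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                 # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Return a boolean mask, True for foreground (assumed: dark ink)."""
    return gray <= otsu_threshold(gray)

# Synthetic page: light background (200) with a dark stroke (30).
page = np.full((64, 64), 200, dtype=np.uint8)
page[20:24, 10:50] = 30
mask = binarize(page)
```

A single global threshold such as this typically struggles with the stains, bleed-through, and uneven illumination found in historical documents, which is one reason more elaborate and more expensive algorithms are used in practice.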
Several effective binarization algorithms are available, but accurate separation of foreground and background is not sufficient on its own. Efficiency, both in execution time and in the amount of training data that learning-based methods require, is equally important, because it determines whether binarization is practically viable rather than only theoretically feasible.
In this study, ways to reduce the execution time of binarization were explored by refining both the implementation and the algorithm of a state-of-the-art binarization technique. It was found that performance can be improved through parameter prediction and by mapping the algorithm onto a graphics processing unit (GPU). Additionally, a new binarization algorithm based on recurrent neural networks was proposed, and the impact of its design parameters on execution time and binarization quality was evaluated. The algorithm’s footprint size was found to govern a trade-off between binarization quality and execution performance, and a dynamically weighted training loss proved beneficial for binarization quality.
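The text does not specify how the training loss is weighted, so the following is only one plausible reading, sketched under stated assumptions: a binary cross-entropy whose class weights are recomputed for every batch from that batch's foreground fraction, so that the rare text pixels are not swamped by the background. The function name `dynamically_weighted_bce` and the clipping bounds are hypothetical.

```python
import numpy as np

def dynamically_weighted_bce(pred: np.ndarray, target: np.ndarray,
                             eps: float = 1e-7) -> float:
    """Binary cross-entropy with per-batch class weights.

    pred   : predicted foreground probabilities in [0, 1]
    target : ground-truth labels, 1 = foreground, 0 = background
    The weighting scheme is an assumption, not the one from the study.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    # Foreground fraction of this batch; clipped (assumed bounds) so that
    # neither class weight collapses to zero on nearly empty patches.
    fg_frac = float(np.clip(target.mean(), 0.05, 0.95))
    w_fg = 1.0 - fg_frac   # rare foreground gets a large weight
    w_bg = fg_frac         # frequent background gets a small weight
    loss = -(w_fg * target * np.log(pred)
             + w_bg * (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())

# Example batch: 10% foreground pixels, uninformative prediction of 0.5.
target = np.zeros(100)
target[:10] = 1.0
loss_uninformed = dynamically_weighted_bce(np.full(100, 0.5), target)
```

Because the weights follow the content of each batch, a missed text pixel on a sparsely written page costs more than a missed background pixel, which is the effect a fixed, global weight cannot guarantee across pages with very different amounts of ink.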
Moreover, training data efficiency was addressed by investigating whether interactive machine learning can reduce the amount of training data required for the recurrent neural network-based method. The study showed that user feedback can lead to better binarization quality with less training data, and that visualizing uncertainty can help users provide more relevant feedback.
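How the uncertainty is computed and displayed is not described here, so the sketch below makes its own assumptions: uncertainty is taken to be the per-pixel binary entropy of the predicted foreground probability, and the image tile with the highest mean entropy is suggested as the region where user feedback would be most informative. The names `pixel_uncertainty` and `most_uncertain_tile` and the tile size are hypothetical.

```python
import numpy as np

def pixel_uncertainty(prob: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Binary entropy in bits: 0 for confident pixels, 1 where prob = 0.5."""
    p = np.clip(prob, eps, 1.0 - eps)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def most_uncertain_tile(prob: np.ndarray, tile: int = 8) -> tuple:
    """Return the (row, col) of the top-left corner of the tile with the
    highest mean uncertainty. Rows and columns that do not fill a whole
    tile are ignored."""
    u = pixel_uncertainty(prob)
    h = (u.shape[0] // tile) * tile
    w = (u.shape[1] // tile) * tile
    tiles = u[:h, :w].reshape(h // tile, tile, w // tile, tile).mean(axis=(1, 3))
    r, c = np.unravel_index(np.argmax(tiles), tiles.shape)
    return int(r) * tile, int(c) * tile

# Example: a confident prediction everywhere except one ambiguous region.
prob = np.full((32, 32), 0.02)
prob[16:24, 8:16] = 0.5
corner = most_uncertain_tile(prob)
```

The entropy map returned by `pixel_uncertainty` could be overlaid on the page as a heat map, while the tile location gives a simple way to direct the user's attention to a single region at a time.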
Efficient document image binarization is a prerequisite for making the text in historical document collections legible and searchable. Heterogeneous computing addresses the execution-time side of this problem, and interactive machine learning addresses the training-data side; together they help make these collections more accessible and useful to a wider audience.