dc.description.abstract |
Machine simulation of human reading has been a subject of intensive research for almost four decades. The latest improvements in recognition methods and systems for Latin script are very promising, and mature products for these languages are available on the market. By contrast, despite more than a decade of research in the field of Urdu Optical Character Recognition (OCR), machine reading performance still lags far behind that of humans. Automatic Urdu character recognition is a challenging task owing to the limited attention it has received from researchers and the intrinsic complexity of Urdu text: its highly cursive and calligraphic nature, its diagonal writing style, and the vertical overlap between characters within a sub-word. In this research, we present a novel implicit segmentation based technique for the development of an OCR system for printed Nasta'liq text lines. This work introduces a novel and robust approach based on statistical models that provides a solution for the recognition of Nasta'liq-style Urdu text. Unlike classical approaches, which segment text into words, ligatures, or characters, we employ implicit segmentation, in which text lines are segmented implicitly during recognition. The developed system is evaluated on standard Urdu text databases and compared with state-of-the-art recognition techniques proposed to date. In the proposed recognition system, we use two strategies: the first is based on manual features and the second on automatically learned features. In the first strategy, we split each text-line image into small frames of width 'n' using a sliding window and extract a set of features from each frame. These features are then concatenated to form a feature vector for the text line. In the second strategy, we extract features automatically, using a Multi-Dimensional Long Short-Term Memory (MDLSTM) model in one scenario and a Convolutional Neural Network (CNN) model in the other. Features extracted from the text lines, along with their respective transcriptions, are fed to a Recurrent Neural Network (RNN) for training or classification. Recognition is performed by MDLSTM-based recognizers with a Connectionist Temporal Classification (CTC) output layer. Experiments conducted on the standard UPTI database yield promising results. We obtained recognition rates of 96.40% (3.60% error rate) using manual features, 98.00% (2.00% error rate) using raw-pixel features, and 98.12% (1.88% error rate) using CNN-based features. |
en_US |
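
The sliding-window strategy summarized in the abstract can be illustrated with a minimal sketch. The frame width, the specific per-frame statistics, and the function name below are illustrative assumptions rather than the actual features used in the thesis; the real system concatenates many handcrafted features per frame and feeds the resulting sequence, together with the line transcription, to an MDLSTM recognizer with a CTC output layer.

    # Minimal sketch (illustrative, not the thesis implementation):
    # split a binarized text-line image into fixed-width frames with a
    # sliding window and compute simple per-frame statistics.
    import numpy as np

    def extract_frames(line_image: np.ndarray, n: int = 4) -> np.ndarray:
        """Return per-frame feature vectors for a 2-D (height x width) line image."""
        h, w = line_image.shape
        features = []
        for x in range(0, w - n + 1, n):
            frame = line_image[:, x:x + n]
            ink = frame.mean()                      # average ink density in the frame
            profile = frame.mean(axis=1)            # vertical projection profile
            centroid = (np.arange(h) * profile).sum() / (profile.sum() + 1e-8)
            features.append(np.concatenate(([ink, centroid / h], profile)))
        return np.stack(features)                   # shape: (num_frames, h + 2)

    # Example: a synthetic "text line" 48 px high and 800 px wide
    frames = extract_frames(np.random.rand(48, 800), n=4)
    print(frames.shape)                              # (200, 50)

In the automatic-feature strategies described above, this handcrafted step is replaced by features learned directly from raw pixels (MDLSTM) or by a CNN, while the CTC layer aligns the frame-wise outputs with the ground-truth transcription without explicit character segmentation.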