Abstract:
Saraiki is one of the regional languages of Pakistan and is spoken and understood across a large part of the country. Little work has been done on Optical Character Recognition (OCR) systems for Pakistan's regional languages because of their complex writing systems. An OCR system for Saraiki can help digitize the literature of the language. This work presents an OCR system that uses a neural network to recognize printed text images of the Saraiki (Urdu/Arabic/Punjabi) language generated in MS Word. The neural network is trained on a set of segmented, isolated characters. Characters are first extracted from the text image using a segmentation approach, and the segmented characters are then fed to the neural network for recognition. The OCR system is implemented in MATLAB and currently achieves about 85% accuracy.
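
The segmentation step summarized above can be illustrated with a minimal sketch. The abstract does not say which segmentation approach is used, so the vertical projection profile below is an assumption chosen for illustration, not the authors' method. It is written in Python rather than the MATLAB used in this work, and the names `segment_columns`, `crop`, and `toy` are hypothetical. The sketch only separates glyphs that have an empty column between them, which matches isolated characters but not joined cursive text.

```python
def segment_columns(image):
    """Split a binary image (list of rows, 1 = ink, 0 = background)
    into (start, end) column ranges separated by empty columns."""
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: ink pixels counted per column.
    profile = [sum(row[x] for row in image) for x in range(width)]
    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # entering a run of inked columns
        elif count == 0 and start is not None:
            segments.append((start, x))    # leaving the run; end is exclusive
            start = None
    if start is not None:
        segments.append((start, width))    # run touches the right edge
    return segments


def crop(image, start, end):
    """Return the sub-image covering columns start..end-1."""
    return [row[start:end] for row in image]


# Toy 3x7 binary image containing two glyphs (assumed data, for illustration).
toy = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
]
glyphs = [crop(toy, s, e) for s, e in segment_columns(toy)]
```

Each entry of `glyphs` would then be resized to a fixed shape and passed to the classifier. The segments are returned in left-to-right column order, so for a right-to-left script such as Saraiki the list would need to be reversed to recover reading order.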