Abstract:
Computers have brought innovative changes to nearly every aspect of life, especially
through applications that imitate human abilities, and they are now used in every field to
facilitate human endeavor. One such application is character recognition.
Character recognition is an important branch of pattern recognition. It uses a machine to
imitate the human ability to read, and it has been a field of intensive, if exotic,
research since the early days of the computer. The task becomes more complex and
demanding for handwritten and cursive text. Arabic script-based languages, which
are used by almost a quarter of the world’s population [Belaid et al., 2010], are cursive,
rich in diacritical marks, and varied in writing style, and so present a challenging task
for researchers. Urdu is an Arabic script-based language; however, the Urdu character set
is a superset of those of all Arabic script-based languages. Character recognition has been
performed through either segmentation-free or segmentation-based approaches. A
segmentation-free approach has numerous issues and is very difficult to train on a large
dataset. A segmentation-based approach, on the other hand, carries a large overhead in
Urdu and is less accurate on cursive script than segmentation-free methods. In terms of
classification, this thesis presents two approaches for Urdu character
recognition: a segmentation-free method based on a hybrid approach (HMM and fuzzy
logic), and a bio-inspired character recognition system that uses fuzzy logic. In the
hybrid method, fuzzy logic serves as inner and outer shells around the HMM for
preprocessing and post-processing. A biologically
inspired, multilayered, fuzzy rule-based system is also presented. Drawing on the concept
of human vision, a layered approach is proposed in which the diacritical marks are
separated from the ghost characters and mapped onto the primary ligature in the final
layer. The proposed technique also supports multilanguage character recognition
for all Arabic script-based languages, such as Arabic, Persian, Urdu, and Punjabi. The
presented multilayered bio-inspired approach recognizes a ligature by extracting
features and combining them in a bottom-up fashion to find new premises, and it achieved
an accuracy of 87.4%.