Abstract:
Urdu literature has a rich tradition of poetry with many forms, one of which is the ghazal. Urdu poetic structures are mainly of Arabic origin. Poetry has a complex sentence structure that differs from everyday language, which makes it hard to classify. Our research focuses on identifying the poet when given a ghazal as input. To the best of our knowledge, no prior work has addressed this task. Two main factors help categorize and classify a given text: its content and its writing style. Urdu poets such as Mirza Ghalib, Mir Taqi Mir, Iqbal and many others each have a distinct writing style and distinct topics of interest. Our model accounts for both factors and classifies ghazals using different classification models: SVM (Support Vector Machines), Decision Tree, Random Forest, Naïve Bayes and KNN (K-Nearest Neighbors). Furthermore, we have applied feature selection techniques, namely chi-square and L1-based feature selection. For experimentation, we have prepared a dataset of about 4000 ghazals. We have also compared the accuracy of the different classifiers and report the best results obtained on the collected dataset of ghazals.
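As an illustration of the chi-square feature selection mentioned above, the following is a minimal sketch, not the implementation used in this work. It assumes whitespace tokenization and binary term presence, and scores each term by its largest chi-square statistic over the poet classes. The function names, the transliterated tokens and the labels `poet_a` and `poet_b` are hypothetical placeholders.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over all classes.

    Assumption: a document is a whitespace-separated string, and only the
    presence or absence of a term in a document is counted.
    """
    n = len(docs)
    doc_terms = [set(d.split()) for d in docs]
    vocab = set().union(*doc_terms)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        # Labels of the documents that contain this term.
        containing = [lab for terms, lab in zip(doc_terms, labels) if term in terms]
        df = len(containing)
        per_class = Counter(containing)
        best = 0.0
        for cls, size in class_sizes.items():
            n11 = per_class[cls]          # in class, term present
            n10 = df - n11                # not in class, term present
            n01 = size - n11              # in class, term absent
            n00 = n - n11 - n10 - n01     # not in class, term absent
            denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
            if denom:
                chi2 = n * (n11 * n00 - n10 * n01) ** 2 / denom
                best = max(best, chi2)
        scores[term] = best
    return scores


def select_top_terms(docs, labels, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus; tokens and labels are illustrative only.
    toy_docs = [
        "dil ishq gham",
        "dil ishq shab",
        "khudi shaheen dil",
        "khudi shaheen falak",
    ]
    toy_labels = ["poet_a", "poet_a", "poet_b", "poet_b"]
    print(select_top_terms(toy_docs, toy_labels, 3))
```

In this toy corpus, terms that occur in only one poet's ghazals receive the highest scores, while a term shared by both poets scores lower. The selected terms would then serve as input features for any of the classifiers listed above.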