Abstract:
With the widespread use of the Internet and other communication technologies, it has become
extremely easy to reproduce, communicate, and distribute digital content. As a result,
authentication and copyright protection issues have arisen. Text is the most extensively
used medium travelling over the Internet besides image, audio, and video. The major part
of books, newspapers, web pages, advertisements, research papers, legal documents,
letters, novels, poetry, and many other documents is simply plain text. Copyright
protection of plain text is a significant issue that cannot be ignored. The existing
solutions for watermarking plain text documents are not robust against random
tampering attacks and are inapplicable to numerous domains. In this thesis, we have
proposed a zero-watermarking approach to text watermarking. We have provided a
number of text watermarking solutions using inherent constituents of text such as double
letters, prepositions, words, sentences, and text structure to protect text against digital
forgery. We have designed a corpus of texts of varying length and diversity,
containing original as well as attacked samples with various volumes and forms of
attack. Instead of using binary watermarks on text, we used alphabetical, image, and
hybrid watermarks. Experimental results illustrate the effectiveness of the proposed
algorithms on text subjected to combined insertion, deletion, and re-ordering attacks,
in both dispersed and localized forms. The results are also compared with recent
work on text watermarking.