PASTIC Dspace Repository

Automatic Segmentation of Speech


dc.contributor.author Baig, Mirza Muhammad Ali
dc.date.accessioned 2019-10-02T07:43:15Z
dc.date.accessioned 2020-04-11T15:39:34Z
dc.date.available 2020-04-11T15:39:34Z
dc.date.issued 2018
dc.identifier.govdoc 17738
dc.identifier.uri http://142.54.178.187:9060/xmlui/handle/123456789/5242
dc.description.abstract Time-aligned and labeled speech at the sub-word level is required to develop spoken language technology components. Determining the time boundaries of sub-word units of speech and labeling them is the speech segmentation problem. Manual labeling by humans is considered the most accurate approach, but it requires a significant amount of time when a large amount of speech has to be processed. The evidence that humans employ is based on knowledge of acoustic phonetics and, at a very basic level, relies on spectrogram-based techniques. The hypothesis that computers can also segment speech automatically if they use the same evidence that human experts use leads towards time-efficient automatic speech segmentation. In this thesis, unsupervised automatic time-alignment of speech at the sub-word level is carried out based on the information that spectrograms carry. The speech spectrogram engineered in this thesis does not contain information about vocal excitation and captures the dynamics of the vocal tract only. The novel feature, which utilizes both forward and inverse characteristics of the vocal tract (FICV), is found suitable for the segmentation problem. Additionally, a framework has been developed to evaluate the suitability of a feature extraction technique for the speech segmentation task. In the thesis, speech segmentation is carried out on an indigenously developed Classical Arabic (CA) dataset, making it the first scheme of its kind for CA, which is an under-resourced language in speech technology. The performance of the FICV-based speech segmentation scheme is compared with standard unsupervised and supervised techniques and shown to be significantly better in terms of both error rates and alignment accuracies. A reduction of 12.29% in error rates is achieved with the FICV-based feature when compared with the standard unsupervised technique. Carrying out supervised segmentation requires a basic sub-word level recognizer, which labels and aligns speech.
In this connection, a Hidden Markov Model (HMM) based speech recognizer is trained. The acoustic modeling is carried out using a discriminative technique, which improves recognition accuracy by up to 4% over the non-discriminative technique. The thesis also verifies that using manually labeled data for training acoustic models can further improve recognition accuracies by 3-4%. In this regard, the thesis details the experimental steps, which can also serve as a guideline for developing an automatic speech recognizer for CA. en_US
dc.description.sponsorship Higher Education Commission, Pakistan en_US
dc.language.iso en_US en_US
dc.publisher NED University of Engineering & Technology, Karachi. en_US
dc.subject Electrical Engineering en_US
dc.title Automatic Segmentation of Speech en_US
dc.type Thesis en_US

