Abstract:
Telephony networks are frequently connected to computers for speech processing that extracts useful information, for example through automatic speaker identification (ASI). Matching the feature vectors extracted from the speech sample of an unknown speaker against the models of registered speakers is the most time-consuming component of real-time speaker identification systems. The parameters that control this time are the dimension d and count T of the extracted test feature vectors, as well as the size M, complexity, and count N of the registered speaker models. Reported speedup techniques for vector quantization (VQ) and Gaussian mixture model (GMM) based ASI systems reduce the test feature vector count T by pre-quantization and reduce the number of candidate registered speakers N by pruning unlikely models, which introduces accuracy degradation. Vantage point tree (VPT) indexing of code vectors has also been used to lessen the effect of parameter M on ASI speed in VQ based systems. The parameter d, however, has remained unexplored in ASI speedup studies.
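For concreteness, the sketch below (in NumPy, with illustrative names that are not taken from the thesis) shows a baseline exhaustive VQ matching loop whose cost grows with all four parameters: each of the T test vectors of dimension d is compared with all M code vectors of each of the N registered speakers' codebooks.

```python
import numpy as np

def vq_match(test_vectors, codebooks):
    """Baseline VQ matching: return the index of the best-matching speaker.

    test_vectors : (T, d) array of test feature vectors
    codebooks    : list of N (M, d) arrays, one codebook per registered speaker

    Cost is O(N * T * M * d): every test vector is compared with every
    code vector of every registered speaker's codebook.
    """
    best_speaker, best_distortion = -1, np.inf
    for n, codebook in enumerate(codebooks):          # N registered speakers
        total = 0.0
        for x in test_vectors:                        # T test vectors
            # squared Euclidean distance to each of the M code vectors (dimension d)
            dists = np.sum((codebook - x) ** 2, axis=1)
            total += dists.min()                      # quantization distortion
        avg_distortion = total / len(test_vectors)
        if avg_distortion < best_distortion:
            best_speaker, best_distortion = n, avg_distortion
    return best_speaker
```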
This thesis presents speedup techniques for VQ based and GMM based real-time ASI that incur no loss of accuracy. For VQ based systems, the focus is on speeding up the closest code vector search (CCS). Partial distortion elimination (PDE), which effectively reduces the d parameter during codebook search, was found more promising than VPT for speeding up CCS. Advancing in this direction, speech signal stationarity is exploited to a greater extent than in the previously proposed technique of cluster-size-based sorting of code vectors to speed up PDE. The proximity relationship among code vectors established through the Linde-Buzo-Gray (LBG) process of codebook generation has been substantiated. Based upon the high correlation of proximate code vectors, circular partial distortion elimination (CPDE) and toggling CPDE (TCPDE) algorithms are proposed to speed up CCS. Further ASI speedup is proposed through test feature vector sequence pruning (VSP), applied when a codebook proves unlikely during the search for the best-matching speaker. Empirical results presented in this thesis show that average speedup factors of up to 5.8 for 630 registered speakers of the TIMIT 8 kHz corpus and 6.6 for 230 speakers of the NIST-1999 database have been achieved by integrating VSP and TCPDE.
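The following is a minimal sketch of a PDE-style closest code vector search, assuming squared Euclidean distortion; the optional circular start index is only one plausible way of exploiting the correlation between consecutive test vectors in the spirit of CPDE, not the thesis' exact algorithm, and the names are illustrative.

```python
import numpy as np

def pde_search(x, codebook, start=0):
    """Closest code vector search with partial distortion elimination (PDE).

    x        : (d,) test feature vector
    codebook : (M, d) codebook of one speaker
    start    : index at which to begin the scan (0 = plain PDE; passing the
               winner of the previous test vector gives a circular traversal
               in the spirit of CPDE -- an illustrative assumption here)

    The dimension-wise accumulation is abandoned as soon as the partial sum
    exceeds the best distortion found so far, so the effective d shrinks.
    """
    M, d = codebook.shape
    best_idx, best_dist = -1, np.inf
    for step in range(M):
        m = (start + step) % M            # circular order over the codebook
        partial = 0.0
        for k in range(d):                # accumulate one dimension at a time
            diff = x[k] - codebook[m, k]
            partial += diff * diff
            if partial >= best_dist:      # early abandon: this code vector loses
                break
        else:
            best_idx, best_dist = m, partial
    return best_idx, best_dist
```

Passing the winning index of the previous, highly correlated test frame as `start` lets strong candidates be met early in the scan, so the early-abandon test in the inner loop fires after fewer of the d dimensions.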
The speedup potential of hierarchical speaker pruning (HSP) for faster ASI is also demonstrated in this thesis. HSP prunes unlikely candidate speakers based on the ranking results of coarse speaker models; the best match is then found from the detailed models of the remaining speakers. VQ based and GMM based ASI systems are explored in depth to identify the parameters governing the speedup performance of HSP. Using the smallest possible coarse model and pruning the largest number of detailed candidate models is the key objective for speedup through HSP. City block distance (CBD) is proposed instead of Euclidean distance (EUD) for ranking speakers in VQ based systems, which allows a smaller codebook to be used for ranking and a greater number of speakers to be pruned. HSP had been ignored by previous authors for GMM based ASI systems because of the discouraging speedup results in their studies of VQ based systems. However, using HSP, speedup factors of up to 6.61 and 10.40 are achieved for GMM based ASI systems for 230 speakers from NIST-1999 and 630 speakers from TIMIT data, respectively, while factors of up to 22.46 and 34.78 are achieved for VQ based systems on the TIMIT and NIST-1999 data, respectively. All reported speedup factors are achieved without any accuracy loss.
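To illustrate the two-stage structure of HSP described above, the sketch below ranks speakers with CBD against small coarse codebooks and then runs a detailed Euclidean VQ match on the survivors only; the `keep` value and the codebook shapes are illustrative assumptions, not the configuration used in the thesis.

```python
import numpy as np

def hsp_identify(test_vectors, coarse_codebooks, detailed_codebooks, keep=10):
    """Two-stage hierarchical speaker pruning (HSP), as a sketch.

    Stage 1: rank all speakers by city block distance (CBD) against small
             coarse codebooks and keep only the `keep` best candidates
             (`keep` is an illustrative choice, not from the thesis).
    Stage 2: run the full Euclidean VQ match against the detailed codebooks
             of the surviving candidates only.
    """
    # Stage 1: coarse ranking with CBD (L1 distance)
    coarse_scores = []
    for codebook in coarse_codebooks:
        dists = np.abs(test_vectors[:, None, :] - codebook[None, :, :]).sum(axis=2)
        coarse_scores.append(dists.min(axis=1).mean())
    survivors = np.argsort(coarse_scores)[:keep]

    # Stage 2: detailed match with Euclidean distance on survivors only
    best_speaker, best_score = -1, np.inf
    for n in survivors:
        codebook = detailed_codebooks[n]
        dists = np.sum((test_vectors[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        score = dists.min(axis=1).mean()
        if score < best_score:
            best_speaker, best_score = int(n), score
    return best_speaker
```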