Abstract:
Computational auditory scene analysis (CASA) plays a significant role in segregating speech
from monaural audio mixtures and, more generally, in the performance of speech
recognition systems. Pitch estimation, in turn, has a substantial effect on the performance of CASA
systems. This study presents a novel pitch estimation framework for speech segregation from monaural
audio mixtures using cochleagram morphing. The proposed framework first obtains a rough estimate of
the target pitch from audio mixtures containing speech and background interference. A discrete set
of morphed versions of the cochleagram is then obtained using k-means clustering. The estimated pitch
values are refined by validating and smoothing them against the morphed cochleagram. The refined
pitch contours, together with harmonicity and temporal continuity cues, are used to segregate the
target speech. The proposed framework achieved 83.13% accuracy on the MIR-1k dataset, which is
considerably higher than that of existing methods.
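
To make the idea of a rough pitch estimate followed by refinement concrete, the sketch below may help. It is not the framework described above: it assumes a plain autocorrelation peak as the rough estimator and a median filter as a generic stand-in for the validation and smoothing against the morphed cochleagram. The function names, the 80-500 Hz search range, and the filter width are illustrative assumptions.

```python
import numpy as np

def rough_pitch(frame, sr, fmin=80.0, fmax=500.0):
    """Rough pitch (Hz) of one frame, taken from the autocorrelation peak.

    Assumption: fmin/fmax bound the search to a typical voiced-speech range.
    """
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)   # shortest lag considered
    hi = int(sr / fmin)   # longest lag considered
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

def smooth_contour(pitch, width=5):
    """Median-smooth a per-frame pitch contour (edges padded by repetition).

    Assumption: a median filter stands in for the refinement step; it
    suppresses isolated outliers such as octave errors.
    """
    half = width // 2
    padded = np.pad(np.asarray(pitch, dtype=float), half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(pitch))])
```

For example, `rough_pitch` could be applied to each 40 ms frame of a mixture and the resulting contour passed through `smooth_contour`, so that a single frame with a doubled pitch value is pulled back toward its neighbours.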