Feature and signal enhancement for robust speaker identification of G.729 decoded speech

Kalpesh Raval, Ravi P. Ramachandran, Sachin S. Shetty, Brett Y. Smolenski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

For wireless remote access security, there is an emerging need for biometric speaker identification (SID) systems that are robust to speech coding distortion. This paper presents results on a Gaussian mixture model (GMM) based SID system that is trained on clean speech and tested on the decoded speech of the G.729 codec. To mitigate the performance loss due to mismatched training and testing conditions, five robust features, two enhancement approaches and three fusion strategies are used. The first enhancement method is feature compensation based on the affine transform. The second is the McCree signal enhancement approach, which is based on the spectral envelope information in the G.729 bit stream. Ensemble systems using decision-level fusion, score fusion and Borda count are studied. The best performance is obtained by combining signal enhancement, feature compensation and decision-level fusion, which yields an identification success rate (ISR) of 89.8%.
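The three fusion strategies named in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact formulation, so the code below uses generic textbook versions: majority voting for decision-level fusion, an unweighted sum of raw scores for score fusion, and the standard Borda point scheme. The function names, the tie-breaking rule and the example scores are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter

def decision_fusion(scores):
    """Majority vote over each system's top-scoring speaker."""
    votes = [max(range(len(s)), key=s.__getitem__) for s in scores]
    counts = Counter(votes)
    best = max(counts.values())
    # Tie-break (an assumption): lowest speaker index among the tied winners.
    return min(spk for spk, c in counts.items() if c == best)

def score_fusion(scores):
    """Sum each speaker's scores across systems and pick the largest total.

    Assumption: raw scores are summed without normalization or weighting.
    """
    n_speakers = len(scores[0])
    totals = [sum(s[k] for s in scores) for k in range(n_speakers)]
    return max(range(n_speakers), key=totals.__getitem__)

def borda_fusion(scores):
    """Borda count: a speaker ranked r-th (0-based) by a system earns n-1-r points."""
    n_speakers = len(scores[0])
    points = [0] * n_speakers
    for s in scores:
        ranked = sorted(range(n_speakers), key=s.__getitem__, reverse=True)
        for rank, spk in enumerate(ranked):
            points[spk] += n_speakers - 1 - rank
    return max(range(n_speakers), key=points.__getitem__)

# Hypothetical log-likelihood scores: one row per system (e.g. one per
# feature type), one column per enrolled speaker.
scores = [
    [-10.0, -12.0, -11.0],
    [-9.0, -15.0, -9.5],
    [-30.0, -20.0, -21.0],
]

print(decision_fusion(scores))  # speaker chosen by majority vote
print(score_fusion(scores))     # speaker with the highest summed score
print(borda_fusion(scores))     # speaker with the most Borda points
```

On these made-up scores, decision-level fusion and Borda count pick speaker 0 while score fusion picks speaker 2, because one system's widely spread scores dominate the raw sum. This is why practical score fusion usually normalizes each system's scores first.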

Original language: American English
Title of host publication: Neural Information Processing - 19th International Conference, ICONIP 2012, Proceedings
Pages: 345-352
Number of pages: 8
Edition: PART 5
State: Published - Nov 19 2012
Event: 19th International Conference on Neural Information Processing, ICONIP 2012 - Doha, Qatar
Duration: Nov 12 2012 to Nov 15 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 5
Volume: 7667 LNCS

Other

Other: 19th International Conference on Neural Information Processing, ICONIP 2012
Country/Territory: Qatar
City: Doha
Period: 11/12/12 to 11/15/12

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
