Automatic Speaker Recognition Dependency on Both the Shape of Auditory Critical Bands and Speaker Discriminative MFCCs

doi:10.4316/AECE.2015.04004

JOKIC, I., DELIC, V., JOKIC, S., PERIC, Z.

Author keywords
automatic speaker recognition, mel-frequency cepstral coefficients, energy correction, speaker discriminative, exponential auditory critical bands

About this article
Date of Publication: 2015-11-30
Volume 15, Issue 4, Year 2015, On page(s): 25 - 32
ISSN: 1582-7445, e-ISSN: 1844-7600
Web of Science Accession Number: 000368499800004
SCOPUS ID: 84949997146

Abstract

The accuracy of an automatic speaker recognition system depends predominantly on the speaker models and the features used. This paper investigates the influence of the shape of the auditory critical bands and the contribution of the individual components of MFCC-based feature vectors, and presents experimental results showing their impact on automatic speaker recognition accuracy. The speaker-discrimination capability of the MFCCs was determined experimentally by comparing the training and test models of the same speaker. The experiments, conducted on three speech databases, showed that the 0th and the 19th (last) MFCCs are not speaker discriminative. The MFCC values depend on the type of auditory critical band applied. Exponential auditory critical bands, based on the lower part of the exponential function, yielded higher speaker recognition accuracy than auditory critical bands of other shapes, such as rectangular or triangular.
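The abstract does not give the exact filter definitions, so the following Python sketch only illustrates where the critical-band shape enters the MFCC computation. It assumes mel-spaced bands with 50% overlap, 24 bands, a Hamming window, and an unnormalized DCT-II producing twenty coefficients c_0 to c_19. The exponential weighting (exp(x) - 1)/(e - 1) is a hypothetical reading of "the lower part of the exponential function", not the authors' formula, and the names `band_weight`, `filterbank`, `power_spectrum` and `mfcc` are illustrative.

```python
import math


def hz_to_mel(f_hz):
    # Common mel-scale formula: 2595 * log10(1 + f / 700).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def band_weight(f, lo, center, hi, shape):
    """Weight of frequency f (Hz) in a critical band spanning [lo, hi] that peaks at center."""
    if f < lo or f > hi:
        return 0.0
    if shape == "rectangular":
        return 1.0
    # Relative position inside the band: 0 at either edge, 1 at the center.
    x = (f - lo) / (center - lo) if f <= center else (hi - f) / (hi - center)
    if shape == "triangular":
        return x
    if shape == "exponential":
        # ASSUMPTION: lower part of exp(x) on [0, 1], rescaled to 0 at the edge and 1 at the center.
        return (math.exp(x) - 1.0) / (math.e - 1.0)
    raise ValueError("unknown band shape: " + shape)


def filterbank(num_bands, n_fft, sample_rate, shape):
    """Mel-spaced bands as weight rows over the n_fft // 2 + 1 spectrum bins."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_bands + 1)) for i in range(num_bands + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    return [[band_weight(f, edges[m], edges[m + 1], edges[m + 2], shape) for f in bin_freqs]
            for m in range(num_bands)]


def power_spectrum(frame):
    """|DFT|^2 of a real frame for bins 0..N/2 (direct O(N^2) DFT, adequate for a sketch)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def mfcc(frame, sample_rate, shape, num_bands=24, num_ceps=20):
    """MFCCs c_0 .. c_{num_ceps-1}: log band energies followed by an unnormalized DCT-II."""
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t, x in enumerate(frame)]
    spec = power_spectrum(windowed)
    bank = filterbank(num_bands, n, sample_rate, shape)
    # Small floor avoids log(0) for bands that capture no energy.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    return [sum(e * math.cos(math.pi * q * (m + 0.5) / num_bands) for m, e in enumerate(log_e))
            for q in range(num_ceps)]
```

For example, `mfcc(frame, 8000, "exponential")` returns twenty coefficients for one 256-sample frame sampled at 8 kHz, and swapping the `shape` argument is the only change needed to compare band shapes. Because the DCT-II here is unnormalized, c_0 is simply the sum of the log band energies, so it mainly reflects the overall frame energy.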

Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.

