2/2014 - 3
Graph Learning Based Speaker Independent Speech Emotion Recognition
XU, X., HUANG, C., WU, C., WANG, Q., ZHAO, L.
Keywords: speech emotion recognition, speaker penalty graph learning, graph embedding framework, dimensionality reduction
About this article
Date of Publication: 2014-05-31
Volume 14, Issue 2, Year 2014, On page(s): 17 - 22
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2014.02003
Web of Science Accession Number: 000340868100003
SCOPUS ID: 84901856862
This paper proposes Speaker-Penalty Graph Learning (SPGL), an algorithm based on graph learning and the graph embedding framework, to address the problems that different speakers cause in speech emotion recognition. Graph embedding framework theory is used to construct the dimensionality reduction stage of speech emotion recognition. Special penalty and intrinsic graphs within the graph embedding framework are proposed to penalize the impact of different speakers on the recognition task. The original speech emotion features are extracted from various categories, reflecting different characteristics of each speech sample. In experiments on speech emotion corpus data with different classifiers, the proposed method, in both its linear and kernelized mapping forms, achieves relatively better performance than the state-of-the-art dimensionality reduction methods.
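The role of the intrinsic and penalty graphs can be illustrated with a small sketch. It is not the authors' SPGL implementation: the abstract does not give the graph weights, so the edge rule below (intrinsic edges between same-emotion samples from different speakers, penalty edges between same-speaker samples with different emotions), the toy data, and all function names are assumptions made for illustration. The sketch follows the general graph embedding idea of choosing a projection w that keeps intrinsic-graph neighbours close while keeping penalty-graph neighbours apart, here by minimizing the ratio of the two projected scatters. A coarse angle search stands in for the generalized eigenvalue problem that a practical implementation would solve.

```python
import math

# Toy 2-D features: axis 0 varies mostly with emotion, axis 1 mostly with speaker.
# Each sample is (feature vector, emotion label, speaker label); all values are made up.
samples = [
    ((0.0, 0.0), "A", 1),
    ((0.2, 3.0), "A", 2),
    ((2.0, 0.1), "B", 1),
    ((2.1, 3.2), "B", 2),
]


def build_graphs(samples):
    """Return (intrinsic_edges, penalty_edges) as lists of index pairs.

    Assumed rule (not taken from the paper): the intrinsic graph links
    same-emotion samples from different speakers; the penalty graph links
    same-speaker samples with different emotions.
    """
    intrinsic, penalty = [], []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            _, emo_i, spk_i = samples[i]
            _, emo_j, spk_j = samples[j]
            if emo_i == emo_j and spk_i != spk_j:
                intrinsic.append((i, j))
            elif emo_i != emo_j and spk_i == spk_j:
                penalty.append((i, j))
    return intrinsic, penalty


def scatter(samples, edges, w):
    """Sum of squared distances between edge endpoints after projecting onto w."""
    total = 0.0
    for i, j in edges:
        xi, xj = samples[i][0], samples[j][0]
        diff = sum(wk * (a - b) for wk, a, b in zip(w, xi, xj))
        total += diff * diff
    return total


def criterion(samples, w, intrinsic, penalty):
    """Intrinsic scatter divided by penalty scatter; smaller is better."""
    return scatter(samples, intrinsic, w) / scatter(samples, penalty, w)


def best_direction(samples, intrinsic, penalty, steps=180):
    """Grid search over unit vectors in the plane (1-D projection of 2-D data)."""
    best = None
    for k in range(steps):
        theta = math.pi * k / steps
        w = (math.cos(theta), math.sin(theta))
        score = criterion(samples, w, intrinsic, penalty)
        if best is None or score < best[0]:
            best = (score, w)
    return best


intrinsic_edges, penalty_edges = build_graphs(samples)
best_score, best_w = best_direction(samples, intrinsic_edges, penalty_edges)
```

On this toy data the selected direction lies almost entirely along the emotion-related axis, so the speaker-related variation is largely projected away. With real features the search would be replaced by an eigen-solver, and a kernelized variant would apply the same criterion in a kernel-induced feature space.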
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.