Post-error Correction in Automatic Speech Recognition Using Discourse Information

doi:10.4316/AECE.2014.02009

Issue 2/2014, Article 9



KANG, S., KIM, J.-H., SEO, J.



Author keywords
post correction, speech recognition, re-ranking model, analysis of user intention, spoken language understanding, spoken dialog system


About this article
Date of Publication: 2014-05-31
Volume 14, Issue 2, Year 2014, On page(s): 53 - 56
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2014.02009
Web of Science Accession Number: 000340868100009
SCOPUS ID: 84901838708

Abstract


Overcoming speech recognition errors is important for ensuring a consistent user experience in human-computer interaction. This paper proposes a semantics-oriented post-processing approach for correcting speech recognition errors. Conventional automatic speech recognition systems rank hypotheses using only the acoustic and language model scores of the current sentence. The novelty of the proposed model is that it re-ranks the n-best hypotheses of the speech recognizer according to the user's intention, which is inferred from the preceding discourse. The proposed model reduces the word error rate and the semantic error rate by 3.65% and 8.61%, respectively.
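The general idea of intention-aware n-best re-ranking can be sketched as follows. This is a minimal illustration, not the authors' model. It assumes a simple log-linear combination: each hypothesis keeps its acoustic and language model scores, and a third term adds the log probability of that hypothesis's intention given the previous discourse. The `Hypothesis` fields, the `rerank` function, its weights, the `intent_prior` dictionary, the intention labels, and the toy scores are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One entry of a recognizer's n-best list (field names are hypothetical)."""
    text: str
    acoustic_score: float  # log-domain acoustic model score
    lm_score: float        # log-domain language model score
    intention: str         # intention label assigned to this hypothesis (hypothetical)


def rerank(nbest, intent_prior, w_am=1.0, w_lm=1.0, w_intent=1.0, floor=1e-6):
    """Sort hypotheses by a weighted sum of AM, LM, and log intention probability.

    intent_prior maps an intention label to P(intention | previous discourse).
    Missing or zero probabilities are clamped to `floor`, so math.log never sees 0.
    With w_intent=0.0 the ranking reduces to the conventional AM + LM score.
    """
    def combined_score(h):
        p_intent = max(intent_prior.get(h.intention, floor), floor)
        return (w_am * h.acoustic_score
                + w_lm * h.lm_score
                + w_intent * math.log(p_intent))

    # Highest combined score first; sorted() is stable for ties.
    return sorted(nbest, key=combined_score, reverse=True)


# Toy n-best list for a scheduling dialogue (illustrative numbers only).
nbest = [
    Hypothesis("delete the meeting at three", -10.0, -5.0, "delete_schedule"),
    Hypothesis("add a meeting at three", -10.5, -5.2, "add_schedule"),
]

# Hypothetical discourse-model output: the previous turn makes an "add" request likely.
intent_prior = {"add_schedule": 0.7, "delete_schedule": 0.1}

baseline_best = rerank(nbest, intent_prior, w_intent=0.0)[0]
reranked_best = rerank(nbest, intent_prior)[0]
print("AM+LM only:     ", baseline_best.text)
print("with intention: ", reranked_best.text)
```

In this toy case the acoustic and language model scores alone slightly favor the "delete" hypothesis. The discourse-based intention term then promotes the "add" hypothesis. In practice the weights would be tuned on held-out data. A trained discriminative re-ranker could replace the fixed linear combination shown here.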



Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava.
