Development of Algorithms for Constructing Confidence Measures for Speech Recognition Systems
Dissertation
Presentation of results. The results of the dissertation were presented at the XII International Conference "Speech and Computer" SPECOM'2007 (Moscow, 2007), at the XIX Session of the Russian Acoustical Society (Nizhny Novgorod, 2007), at the XIII All-Russian Conference "Mathematical Methods of Pattern Recognition" (Saint Petersburg, 2007), and at the VII Open German-Russian Workshop "Pattern Recognition…
References
- Nguyen M. T. Confidence Estimation for the Results of Automatic Speech Recognition // Proceedings of the Institute for Systems Analysis of the Russian Academy of Sciences. Dynamics of Inhomogeneous Systems, 2006, issue 10(2), pp. 405−414
- Nguyen M. T. Detection of New Words and Non-verbal Events in Speech Recognition // Models, Methods, Algorithms and Architectures of Speech Recognition Systems, 2006, pp. 119−137
- Nguyen M. T. Constructing Confidence Measures for Speech Recognition Results Using Alternative Models // Proceedings of the 13th All-Russian Conference "Mathematical Methods of Pattern Recognition", 2007, pp. 370−371
- Nguyen M. T., Chuchupal V. J. Verification of Automatic Speech Recognition Results // Proceedings of the XIX Session of the Russian Acoustical Society, 2007, V. 3, pp. 63−67
- Nguyen M. T., Chuchupal V. J. Word verification method for automatic speech recognition // Proceedings of the XII International Conference "Speech and Computer" SPECOM'2007, 2007, V. 1, pp. 152−156
- Nguyen M. T., Chuchupal V. J. Word confidence measure based on frame likelihood score // Pattern recognition and image analysis. Advances in mathematical theory and application, 2008, N. 3, pp. 431−433
- Desyatchikov A. A., Kovkov D. V., Lobantsov V. V., Makovkin K. A., Matveev I. A., Murynin A. B., Chuchupal V. J. A Suite of Algorithms for Robust Human Recognition // Izvestiya RAN. Teoriya i Sistemy Upravleniya, 2006, pp. 119−130
- Obzhelyan N. K., Trunin-Donskoy V. N. Machines That Speak and Listen // Chisinau, Shtiintsa, 1987
- Aho A. V., Ullman J. D. The Theory of Parsing, Translation, and Compiling // Prentice Hall, 1972
- Atal B. S., Schroeder M. R. Predictive Coding of Speech Signals // Proceedings of the International Congress on Acoustics, 1968
- Bahl L. R., Jelinek F., Mercer R. L. A Maximum Likelihood Approach to Continuous Speech Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence, 1983, pp. 179−190
- Baum L. E. An Inequality and Associated Maximization Technique in Statistical Estimation of Probabilistic Functions of a Markov Process // Inequalities, 1972, V. 3, pp. 1−8
- Benitez M. C., Rubio A., Torre A. Different Confidence Measures for Word Verification in Speech Recognition // Speech Communication, 2000, V. 32, pp. 79−94
- Bilmes J. A. A Gentle Tutorial of the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models, 1998
- Bouwman G., Boves L., Koolwaaij J. Weighting Phone Confidence Measure for Automatic Speech Recognition // Workshop on Voice Operated Telecom Services, 2000, pp. 59−62
- Charlet D. Optimizing Confidence Measure Based on HMM Acoustical Rescoring // Proceedings of the ISCA Tutorial and Research Workshop ASR2000, 2000, pp. 203−206
- Chase L. Word and Acoustic Confidence Annotation for Large Vocabulary Speech Recognition // Proceedings of the European Conference on Speech Communication and Technology, 1997, pp. 815−818
- Cox S., Rose R. Confidence Measures for the Switchboard Database // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, pp. 511−514
- Davis K. H., Biddulph R., Balashek S. Automatic Recognition of Spoken Digits // The Journal of the Acoustical Society of America, 1952, V. 24, I. 6, pp. 637−642
- Demuynck K., Van Compernolle D., Wambacq P. Doing Away with the Viterbi Approximation // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2002, pp. 717−720
- Deviren M. Dynamic Bayesian Networks for Speech Recognition // Proceedings of the National Conference on Artificial Intelligence, 2002, pp. 981−981
- Egan J. P. Signal Detection Theory and ROC Analysis // Academic Press, 1975
- Eide E., Gish H., Jeanrenaud P., Mielke A. Understanding and Improving Speech Recognition Performance Through the Use of Diagnostic Tools // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1995, pp. 221−224
- Erzin E., Cetin A. E., Yardimci Y. Subband Analysis for Robust Speech Recognition in the Presence of Car Noise // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1995, pp. 417−420
- Fabian T., Lieb R., Gunther R., Matthias T. Impact of Word Graph Density on the Quality of Posterior Probability Based Confidence Measures // Proceedings of the European Conference on Speech Communication and Technology, 2003, pp. 917−920
- Fawcett T. An Introduction to ROC Analysis // Pattern Recognition Letters, 2006, pp. 861−874
- Franzini M., Witbrock M., Lee K. A Connectionist Approach to Continuous Speech Recognition // Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1989, pp. 425−428
- Furui S. Fifty Years of Progress in Speech and Speaker Recognition // The Journal of the Acoustical Society of America, 2004, V. 116, I. 4, pp. 2497−2498
- Gold B., Morgan N. Speech and Audio Signal Processing // John Wiley and Sons, 2000
- Gowdy J. N., Tufekci Z. Mel-scaled Discrete Wavelet Coefficients for Speech Recognition // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2000, pp. 1351−1354
- Harrison T., Fallside F. A Connectionist Model for Phoneme Recognition in Continuous Speech // Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1989, pp. 417−420
- Huang X. D., Ariki Y., Jack M. A. Hidden Markov Models for Speech Recognition // Edinburgh University Press, 1990
- Humphrys M. Introduction to Artificial Intelligence, 2008, http://www.computing.dcu.ie/~humphrys/ca300/index.html
- Hunt A., McGlashan S. Speech Recognition Grammar Specification Version 1.0 // W3C, 2004
- Itakura F., Saito S. Analysis Synthesis Telephony Based on the Maximum Likelihood Method // Proceedings of the International Congress on Acoustics, 1968, pp. 17−20
- Jelinek F. Statistical Methods for Speech Recognition // MIT Press, 1997
- Jelinek F. The Development of an Experimental Discrete Dictation Recognizer // Proceedings of the IEEE, 1985, pp. 1616−1624
- Jia B., Zhu X., Luo Y., Hu D. Utterance Verification Using Modified Segmental Probability Model // Proceedings of the European Conference on Speech Communication and Technology, 1999, pp. 45−48
- Jiang L., Huang X. D. Vocabulary-independent Word Confidence Measure Using Subword Features // Proceedings of the International Conference on Spoken Language Processing, 1998
- Jurafsky D., Martin J. H. Speech and Language Processing // Prentice Hall, 2008
- Kemp T., Schaaf T. Estimating Confidence Using Word Lattices // Proceedings of the European Conference on Speech Communication and Technology, 1997, pp. 827−830
- Kim K., Youn D. H., Lee C. Evaluation of Wavelet Filters for Speech Recognition // Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2000, v. 4, pp. 2891−2894
- Levinson S. E. Continuously Variable Duration Hidden Markov Models for Automatic Speech Recognition // Computer Speech and Language, 1986, pp. 29−45
- Lleida E., Rose R. C. Efficient Decoding and Training Procedure for Utterance Verification in Continuous Speech Recognition // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, pp. 507−510
- Lleida E., Rose R. C. Utterance Verification in Continuous Speech Recognition: Decoding and Training Procedures // IEEE Transactions on Speech and Audio Processing, 2000, pp. 126−139
- Macherey K., Bender O., Ney H. Multi-level Error Handling for Tree-Based Dialogue Course Management // Proceedings of the ISCA Tutorial and Research Workshop on Error Handling in Spoken Dialogue Systems, 2003, pp. 123−128
- Markel J. D., Gray A. H. Linear Prediction of Speech // Springer-Verlag, 1976, pp. 31−35
- Martin A., Doddington G., Kamm T., Ordowski M., Przybocki M. The DET Curve in Assessment of Detection Task Performance // Proceedings of the European Conference on Speech Communication and Technology, 1997, pp. 1895−1898
- Mathan L., Miclet L. Rejection of Extraneous Input in Speech Recognition Applications, Using Multi-layer Perceptrons and the Trace of HMMs // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1991, pp. 93−96
- Moreau N., Jouvet D. Use of a Confidence Measure Based on Frame Level Likelihood Ratios for the Rejection of Incorrect Data // Proceedings of the European Conference on Speech Communication and Technology, 1999, pp. 291−294
- Neti C. Y., Roukos S., Eide E. Word-based Confidence Measures as a Guide for Stack Search in Speech Recognition // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, pp. 883−886
- Ney H., Martin S., Wessel F. Statistical Language Modeling Using Leaving-one-out // Corpus-based Methods in Language and Speech Processing, 1997, pp. 174−207
- Normandin T., Lacouture R., Cardin R. MMIE Training for Large Vocabulary Continuous Speech Recognition // Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1994, pp. 1367−1370
- Picone J. W. Signal Modeling Techniques in Speech Recognition // Proceedings of the IEEE, 1993, pp. 1215−1247
- Pinto J., Sitaram R. N. V. Confidence Measures in Speech Recognition Based on Probability Distribution of Likelihoods // Proceedings of the European Conference on Speech Communication and Technology Interspeech'2005, 2005, pp. 3001−3004
- Rabiner L. R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition // Proceedings of the IEEE, 1989, pp. 257−286
- Rabiner L. R., Juang B. H. Fundamentals of Speech Recognition // Prentice Hall, 1993
- Rabiner L. R., Juang B. H., Levinson S. E., Sondhi M. M. Recognition of Isolated Digits Using Hidden Markov Models with Continuous Mixture Densities // AT&T Technical Journal, 1985, pp. 1211−1234
- Rahim M. G., Lee C. H. Discriminative Utterance Verification for Connected Digits Recognition // IEEE Transactions on Speech and Audio Processing, 1997, pp. 266−277
- Razik J., Mella O., Fohr D., Haton J. P. Local Word Confidence Measure Using Word Graph and N-Best List // Proceedings of the European Conference on Speech Communication and Technology, 2005, pp. 3369−3372
- Robinson A. J., Fallside F. A Dynamic Connectionist Model for Phoneme Recognition // Neural Networks from Models to Applications, 1988, pp. 541−550
- Rose R. C., Juang B. H., Lee C. H. A Training Procedure for Verifying String Hypothesis in Continuous Speech Recognition // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1995, pp. 281−284
- Sanderson C., Bengio S., Bourlard H., Mariethoz J., Collobert R., BenZeghiba M. F., Cardinaux F., Marcel S. Speech and Face Based Biometric Authentification at IDIAP // Proceedings of the International Conference on Multimedia and Expo, 2003, pp. 1−4
- San-Segundo R., Pellom B., Hacioglu K., Ward W. Confidence Measures for Spoken Dialogue Systems // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2001, pp. 393−396
- Schaaf T., Kemp T. Confidence Measures for Spontaneous Speech Recognition // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, pp. 875−878
- Sigurdsson S., Peterson K. B., Lehn-Schioler T. Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music // Proceedings of the International Conference on Music Information Retrieval, 2006, pp. 286−289
- Siu M. H., Mark B., Au W. H. Minimization of Utterance Verification Error Rate as a Constrained Optimization Problem // IEEE Signal Processing Letters, 2006, v. 13, pp. 760−763
- Siu M., Gish H. Evaluation of Word Confidence for Speech Recognition Systems // Computer Speech And Language, 1999, pp. 299−319
- Soong F. K., Lo W. K. Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2005, pp. 85−88
- Sukkar R. A. Rejection for Connected Digit Recognition Based on GPD Segmental Discrimination // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1994, pp. 393−396
- Sukkar R. A., Lee C. H. Vocabulary Independent Discriminative Utterance Verification for Nonkeyword Rejection in Subword Based Speech Recognition // IEEE Transactions on Speech and Audio Processing, 1996, V. 4, pp. 420−429
- Uhrik C., Ward W. Confidence Metrics Based on N-gram Language Model Back-off Behaviors // Proceedings of the European Conference on Speech Communication and Technology, 1997, pp. 2772−2774
- Ullman J. D., Hopcroft J. E. Introduction to Automata Theory, Languages, and Computation // Addison Wesley, 1979
- Weintraub M., Beaufays F., Rivlin Z., Konig Y., Stolcke A. Neural-network Based Measures of Confidence for Word Recognition // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, pp. 887−890
- Weintraub M. LVCSR Log-likelihood Ratio Scoring for Keyword Spotting // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1995, pp. 297−300
- Wessel F. Word Posterior Probabilities for Large Vocabulary Speech Recognition // Ph.D. Thesis, RWTH Aachen University, Germany, 2002
- Wessel F., Macherey K., Ney H. A Comparison of Word Graph and N-Best List Based Confidence Measures // Proceedings of the European Conference on Speech Communication and Technology, 1999, pp. 315−318
- Wessel F., Macherey K., Schluter R. Using Word Probabilities as Confidence Measures // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, pp. 225−228
- Wessel F., Schluter R., Macherey K., Ney H. Confidence Measures for Large Vocabulary Continuous Speech Recognition // IEEE Transactions on Speech and Audio Processing, 2001, pp. 288−298
- Wessel F., Schluter R., Ney H. Using Posterior Word Probabilities for Improved Speech Recognition // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2000, pp. 1587−1590
- Young S. J. A Review of Large-Vocabulary Continuous Speech Recognition // IEEE Signal Processing Magazine, 1996, pp. 45−57
- Young S., Evermann G., Hain T., Kershaw D., Moore G., Odell J., Ollason D., Povey D., Valtchev V., Woodland P. The HTK Book // Cambridge University Engineering Department, 2002
- Zhang R., Rudnicky A. I. Word Level Confidence Annotation Using Combinations of Features // Proceedings of the European Conference on Speech Communication and Technology, 2001, pp. 2105−2108
- Zweig G. Speech Recognition with Dynamic Bayesian Networks // Ph.D. Thesis, University of California, Berkeley, 1998