Development and Study of Speech Recognition Algorithms for Voice Control over the Telephone Network
Dissertation
The automatic speech recognition (ASR) system almost entirely determines the quality of a voice control system, which depends above all on how accurately the user's words are recognized. Voice control systems are usually designed for a small vocabulary (~10 words) and for isolated words (a pause is required between adjacent words). These conditions make it possible to achieve a certain…
References
- Ahmed N., Rao K. R. Orthogonal Transforms for Digital Signal Processing. Moscow, Svyaz, 1980 (in Russian).
- Bellman R., Angel E. Dynamic Programming and Partial Differential Equations. Moscow, Mir, 1974 (in Russian).
- Gudonavichyus R. V., Kemeshis P. P., Chitavichyus A. B. Recognition of Speech Signals by Their Structural Properties. Leningrad, Energiya, 1977 (in Russian).
- Zuev A. B., Kiselman B. A. New weighting functions, Bulletin of the Upper Volga Branch of the Academy of Technological Sciences of the Russian Federation, 2(4), 1997 (in Russian).
- Calvert Ch. Delphi 4. Self-Study Guide: Translated from English / Ed. by A. P. Sergeev. Kiev, DiaSoft, 1999 (in Russian).
- Kiselman B. A. An algorithm for estimating the background level of energy trajectories of discrete speech. // Proceedings of the KOGRAF-2000 Scientific and Technical Conference. Nizhny Novgorod, 2000 (in Russian).
- Kiselman B. A. Implementation of a digital bandpass filter for a cascade-parallel filter bank. // Proceedings of the KOGRAF-2000 Scientific and Technical Conference. Nizhny Novgorod, 2000 (in Russian).
- Labunets V. G. Algebraic Theory of Signals and Systems. Krasnoyarsk: Krasnoyarsk University Press, 1984 (in Russian).
- Lea W. A. Methods of Automatic Speech Recognition. Moscow, Mir, 1983 (in Russian).
- Mikhailov V. G., Zlatoustova L. V. Measurement of Speech Parameters. Moscow, Radio i Svyaz, 1987 (in Russian).
- Rabiner L., Gold B. Theory and Application of Digital Signal Processing: Translated from English / Ed. by Yu. N. Aleksandrov. Moscow, Mir, 1978 (in Russian).
- Rabiner L. R., Schafer R. W. Digital Processing of Speech Signals. Moscow, Radio i Svyaz, 1981 (in Russian).
- Tou J., Gonzalez R. Pattern Recognition Principles: Translated from English / Ed. by Yu. N. Zhuravlev. Moscow, Mir, 1978 (in Russian).
- Faronov V. V. Delphi 4. Training Course. Moscow, Nolidzh, 1998 (in Russian).
- Tsemel G. I. Identification of Speech Signals. Moscow, Nauka, 1971 (in Russian).
- Altschuler R. A., Bobbin R. P., Hoffman D. W., eds, Neurobiology of Hearing: The Cochlea, Raven Press, New York, 1986.
- Ashmore J. F. The cellular machinery of the cochlea. Exper. Physiol., 79: 113−134, 1994.
- Bahl, L., Bakis, R., Cohen, P., Cole, A., Jelinek, F., Lewis, B., and Mercer, R. Speech Recognition of a Natural Text Read as Isolated Words. In Proc. IEEE International, 1981.
- Batlle E., Fonollosa J. A. R., Determining CPU and Memory Requirements for Real-Time Speech Recognition Systems Using the TMS320C3x/4x. ESIEE, Paris, 1996.
- Békésy G. von. Experiments in Hearing. New York: McGraw-Hill, 1960.
- Blomberg M. Towards production-oriented techniques for speech recognition. Royal Institute of Technology, Stockholm, 1994.
- Blomberg, M., Carlson, R., Elenius, K. & Granström, B. Auditory models and isolated word recognition, Proc. of ICASSP '84, San Diego, Vol. 2, pp. 17.9.1−17.9.4.
- Blomberg M., Elenius K. A device for automatic speech recognition. In Proceedings of the Nordic Acoustical Society, 1982, pp. 383−386.
- Blomberg, M., Elenius, K. Automatic time alignment of speech with a phonetic transcription, STL-QPSR 1/1985, KTH, Stockholm, pp. 37−45.
- Blomberg, M., Elenius, K. Nonlinear frequency warp for speech recognition, Proceedings of the French-Swedish seminar on speech, Grenoble, France, April 22−24, 1985, pp. 435−443.
- Blomberg M., Elenius K., Lundin F. Voice-controlled dialing in an intercom system. International Symposium on Human factors in Telecommunications, Helsinki, 1983, June, pp. 233−238.
- Brown, P. The Acoustic-Modeling Problem in Automatic Speech Recognition. Carnegie Mellon University, 1987.
- Charles R., Jankowsky J., Hoang-Doan H. V., Lippman R. P. A comparison of signal processing front ends for automatic word recognition. IEEE Transactions on Speech and Audio Processing, 3: 286−293, 1995.
- Chistovich L. A. Central auditory processing of peripheral vowel spectra. J. Acoust. Soc. Am., pp. 789−805.
- Ghitza, O. Auditory Nerve Representations as a Basis for Speech Processing, Advances in Speech Processing (Eds. S. Furui, M. Sondhi), Marcel Dekker, NY, 453−485, 1991.
- Dallos P. The active cochlea. J Neurosci. 1992 Dec; 12(12):4575−85.
- Dallos, P., Popper, A.N., Fay, R.R. The cochlea. Springer Handbook of Auditory Research Vol. 8, 1996, Springer Verlag, New York.
- Darling, A. M. Properties and Implementation of the GammaTone Filter: A Tutorial, in Speech Hearing and Language (UCL Work in Progress), 5, 43−61, University College London, Department of Phonetics and Linguistics, 1991.
- Deller J. R., Jr, Hansen J. H. L., Proakis J. G. Discrete-Time Processing of Speech Signals. IEEE Press, USA, 2000.
- Dillon, H. and Walker, G. Compression in Hearing Aids: An Analysis, a Review, and Some Recommendations. NAL Report No. 90. Australian Government Publishing Service, Canberra, 1982.
- Furui, S. On the role of spectral transition for speech perception, J. Acoust. Soc. Am. 80, 1016−1025, 1986.
- Furui S. Speaker-Independent Isolated Word Recognition Using Dynamic Features of the Speech Spectrum, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 1, pp. 52−59, Feb. 1986.
- Glasberg, B. R., Moore, B. C. J., Patterson, R. D., Nimmo-Smith, I. Dynamic range and asymmetry of the auditory filter, J. Acoust. Soc. Am. 76, 419−427, 1984.
- Greenberg S. The ear as a speech analyzer, Journal of Phonetics, vol. 16, pp. 139−146, 1988.
- Greenberg S. The ears have it: the auditory basis of speech perception. Department of Linguistics International Computer Science Institute University of California, Berkeley, CA 94720 USA.
- Greenberg S. Understanding Speech Understanding: Towards a Unified Theory of Speech Perception. Department of Linguistics International Computer Science Institute University of California, Berkeley, CA 94720 USA.
- Hanson B. A., Applebaum T. H. Robust Speaker-Independent Word Recognition Using Static, Dynamic and Acceleration Features: Experiments with Lombard and Noisy Speech, in ICASSP, pp. 857−860, 1990.
- Hassanein, H., Rudko M. On the Use of Discrete Cosine Transform in Cepstral Analysis, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 4, pp. 922−923, Aug. 1984.
- Hermansky H. Exploring temporal domain for robustness in speech recognition. In Proceedings of the 15th International Congress on Acoustics, Trondheim, Norway, 1995.
- Hermansky H. Perceptual linear predictive (PLP) analysis for speech. J. Acoust. Soc. Am., pp. 1738−1752, 1990.
- Hermansky H., Junqua J. C. Optimization of perceptually based ASR frontend, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 88, paper S5.10, pp. 219−222.
- Hermansky H., Morgan N., RASTA processing of speech. IEEE Transactions on Speech and Audio Processing, 2(4): 578−589, Oct. 1994.
- Hermansky, H., Morgan, N., Aruna, B., Kohn, P. RASTA-PLP speech analysis technique, Proceedings, 1992 IEEE ICASSP, San Francisco, 121−124.
- Hermansky H., Morgan N., Hirsch H. Recognition of speech in additive and convolutional noise based on RASTA spectral processing, Proc. ICASSP, vol. 2, pp. 83−85, 1993.
- Hudspeth A. J. How the ear’s works work. Nature. 1989 Oct 5; 341(6241):397−404.
- Irino T., Unoki, M. An analysis/synthesis auditory filterbank based on an IIR implementation of the gammachirp. ATR Human Information Processing Research Labs, Japan Advanced Institute of Science and Technology, 1999.
- Jahn A. F., Santos-Sacchi, J. eds, Cochlear physiology, Raven Press, New York, 1988.
- Johnstone, B., Patuzzi, R., and Yates, G. K. Basilar membrane measurements and the travelling wave, Hearing Res., 1986, 22, 147−153.
- Kates, J. An Adaptive Digital Cochlear Model, Proceedings, 1991 IEEE ICASSP, Toronto, 3621−3624.
- Kobayashi, T., Imai S. Spectral Analysis Using Generalized Cepstrum, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1235−1237, Dec. 1984.
- Lamel L. F., Rabiner L. R., Rosenberg A. E. An improved endpoint detector for isolated word recognition. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Atlanta, Ga., vol. 29, pp. 777−785, 1981.
- Lee, K. F., Hon, H. W., and Huang, X. Speech recognition using Hidden Markov Models: a CMU perspective, Speech Communication, 1991, 9, 497−508.
- Lim D. J. Functional structure of the organ of Corti: a review. Hearing Research 22, 117−146, 1986.
- Lyon R. F. A Computational Model of Filtering, Detection, and Compression in the Cochlea, Proceedings, 1982 IEEE ICASSP, Paris, 1282−1285.
- Lyon R. F. Automatic Gain Control in Cochlear Mechanics, The Mechanics and Biophysics of Hearing, P. Dallos et al. (eds.), 395−401, Springer-Verlag, 1990.
- Lyon R. F. The All-Pole Gammatone Filter and Auditory Models. Apple Computer Inc., 1996.
- Lyon R. F., Mead C. An Analog Electronic Cochlea, 1988 IEEE Trans. on Acoust., Speech, and Sig. Proc., 36, 1119−1133.
- Mahalanobis P. C., On the generalized distance in statistics, Proceedings of the National Institute of Science (India), vol. 12, pp. 49−55, 1936.
- Moore B. C. J. Psychophysical tuning curves measured in simultaneous and forward masking, J. Acoust. Soc. Am. 63, 524−532, 1978.
- Moore B. C. J., and Glasberg, B. R. Growth of forward masking for sinusoidal and noise maskers as a function of signal delay: implications for suppression in noise, J. Acoust. Soc. Am. 73, 1249−1259, 1983.
- Moore B. C. J., Glasberg, B. R., and Roberts, B. Refining the measurement of psychophysical tuning curves, J. Acoust. Soc. Am. 76, 1057−1066, 1984.
- Myers C. S., Rabiner L. R., Rosenberg A. E. Performance tradeoffs in dynamic time warping algorithms for isolated word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, pp. 622−635, Dec. 1980.
- Nagata K., Kato Y., Chiba S. Spoken digit recognizer for the Japanese language, Proceedings of the 4th International Conference on Acoustics, 1962.
- Netter F. H. Nervous system, part I: anatomy and physiology. Ciba collection of medical illustration. Ciba, West Caldwell, NJ, 1986.
- Patterson R., Anderson, T., Allerhand, M. The Auditory Image Model as a Preprocessor for Spoken Language, Proceedings Acoust. Soc. of Japan ICSLP, 1395−1398, 1994.
- Pickles J. O. Recent advances in cochlear physiology. Prog Neurobiol. 1985; 24(1): 1−42.
- Picone J. Fundamentals of speech recognition: a short course. Institute for Signal and Information Processing, Department of Electrical and Computer Engineering. Mississippi State University, 1996.
- Picone J. Signal Modeling Techniques in Speech Recognition, Proceedings of the IEEE, vol. 81, no. 9, pp. 1215−1246, Sept. 1993.
- Rawate B. I., Robinson P. D., Implementation of an HMM-Based, Speaker-Independent Speech Recognition System on the TMS320C2x and TMS320C5x. Speech and Image Understanding Laboratory Computer Sciences Center Texas Instruments Incorporated, 1996.
- Rabiner L. R. On the use of autocorrelation analysis for pitch detection, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 25, pp. 24−33, Feb. 1977.
- Rabiner L. R. On creating reference templates for speaker independent recognition of isolated words, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, pp. 34−42, Feb. 1978.
- Rabiner L. R., Levinson S. E. Isolated and connected word recognition: Theory and selected applications, IEEE Transactions on Communications, vol. 29, pp. 621−659, May 1981.
- Rabiner L. R., Levinson S. E., Sondhi M. M. On the application of vector quantization and hidden Markov models to speaker-independent isolated word recognition, Bell System Technical Journal, vol. 62, pp. 1075−1105, 1983.
- Rabiner L. R., Rosenberg A. E., Levinson S. E. Considerations in dynamic time warping algorithms for discrete utterance recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, pp. 575−582, Dec. 1978.
- Rabiner L. R., Sambur M. R. An algorithm for determining the endpoints of isolated utterances, Bell System Technical Journal, vol. 54, pp. 297−315, 1975.
- Robinson A. J. Speech Analysis, lecture course. 1998.
- Sakoe H. Two-level DP matching: A dynamic programming based pattern recognition algorithm for connected word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, pp. 588−595, Dec. 1979.
- Sakoe H., Chiba S. Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, pp. 43−49, Feb. 1978.
- Santos-Sacchi J. Cochlear physiology. In: Physiology of the Ear, A. F. Jahn and J. Santos-Sacchi, eds, Raven Press, New York, pp. 271−293, 1988.
- Slaney M. An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank. Apple Computer Technical Report #35, Perception Group Advanced Technology Group © 1993, Apple Computer, Inc.
- Slaney M., Auditory Toolbox: A MATLAB toolbox for auditory modeling work, Apple Technical Report #45, 1994.
- Slaney M. Lyon’s Cochlear Model. Apple Computer Technical Report #13, Perception Group Advanced Technology Group © 1988, Apple Computer, Inc.
- Smith S. W. Digital signal processing. California Technical Publishing San Diego, California, 1999.
- Spoendlin H. Anatomy of cochlear innervation. Am. J. Otolaryngol. 6, 453−467, 1985.
- Strope B. P. A Model of dynamic auditory perception and its application to robust speech recognition. University of California, Los Angeles, 1995.
- Swee L. H., Implementing Speech-Recognition Algorithms on the TMS320C2xx Platform. Texas Instruments Singapore (P&E) Ltd., 1998.
- Tebelskis J., Speech Recognition using Neural Networks, School of Computer Science, Carnegie Mellon University, 1995.
- Tohkura Y. A Weighted Cepstral Distance Measure For Speech Recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 10, pp. 1414−1422, Oct. 1987.
- Zwicker, E., Flottorp, G., and Stevens, S. Critical Band Width in Loudness Summation, J. Acoust. Soc. Am. 29, 548−557, 1957.
- Zwicker, E. On a psychoacoustical equivalent of tuning curves, Facts and Models in Hearing (Eds. Zwicker, E., Terhardt, E.), Springer, Berlin, 132−141, 1974.
- Zwicker, E. and Schorn, K. Psychoacoustical tuning curves in audiology, Audiology 17, 120−140, 1978.
- Zwicker, E., Terhardt, E. Analytical expressions for critical-band rate and critical bandwidth as a function of frequency, J. Acoust. Soc. Am. 68, 1523−1525, 1980.