
Speech Recognition using Neural Networks
Joe Tebelskis
May 1995
School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.

Thesis Committee: Alex Waibel, chair; Raj Reddy; Jaime Carbonell; Richard Lippmann, MIT Lincoln Labs.

Copyright 1995 Joe Tebelskis. This research was supported during separate phases by ATR Interpreting Telephony Research Laboratories, NEC Corporation, Siemens AG, the National Science Foundation, the Advanced Research Projects Agency, and the Department of Defense under Contract No.
The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of ATR, NEC, Siemens, NSF, or the United States Government.

Keywords: speech recognition, neural networks, hidden Markov models, hybrid systems, acoustic modeling, prediction, classification, probability estimation, discrimination, global optimization.
This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness.
Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling.
Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling and HMMs perform temporal modeling. We argue that an NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic modeling accuracy, better context sensitivity, more natural discrimination, and a more economical use of parameters. These advantages are confirmed experimentally by an NN-HMM hybrid that we developed, based on context-independent phoneme models. In the course of developing this system, we explored two different ways to use neural networks for acoustic modeling: prediction and classification.
We found that predictive networks yield poor results because of a lack of discrimination, but classification networks gave excellent results. We verified that, in accordance with theory, the output activations of a classification network form highly accurate estimates of the posterior probabilities P(class | input), and we showed how these can easily be converted to likelihoods P(input | class) for standard HMM recognition algorithms.
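The conversion mentioned above follows directly from Bayes' rule: P(input | class) is proportional to P(class | input) / P(class), since P(input) is the same for every class. The following minimal sketch (Python/NumPy, purely illustrative; the function name and the numbers are hypothetical and not code from the thesis) shows the idea:

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, priors, eps=1e-10):
    """Convert a classifier network's frame-level posteriors P(class | input)
    into scores proportional to the likelihoods P(input | class).

    posteriors: array of shape (frames, classes), e.g. softmax outputs
    priors:     array of shape (classes,), class frequencies estimated
                from the training data
    """
    # Dividing by the priors removes the P(class) factor; the common P(input)
    # factor is ignored because it does not affect which hypothesis wins.
    return posteriors / np.maximum(priors, eps)

# Hypothetical example: three phoneme classes over two frames.
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.6, 0.3]])
priors = np.array([0.5, 0.3, 0.2])
scaled = posteriors_to_scaled_likelihoods(posteriors, priors)
log_likelihoods = np.log(scaled)  # HMM decoders typically accumulate log scores
```

In an NN-HMM hybrid of the kind discussed here, such scaled likelihoods would take the place of the usual emission probabilities during standard HMM decoding (e.g., Viterbi search).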
Finally, this thesis reports how we optimized the accuracy of our system with many natural techniques, such as expanding the input window size, normalizing the inputs, increasing the number of hidden units, converting the network's output activations to log likelihoods, optimizing the learning rate schedule by automatic search, backpropagating error from word-level outputs, and using gender dependent networks.

I wish to thank Alex Waibel for the guidance, encouragement, and friendship that he managed to extend to me during our six years of collaboration over all those inconvenient oceans, and for his unflagging efforts to provide a world-class, international research environment, which made this thesis possible.
Alex's scientific integrity, humane idealism, good cheer, and great ambition have earned him my respect, plus a standing invitation to dinner whenever he next passes through my corner of the world. I also wish to thank Raj Reddy, Jaime Carbonell, and Rich Lippmann for serving on my thesis committee and offering their valuable suggestions, both on my thesis proposal and on this final dissertation.
I would also like to thank Scott Fahlman, my first advisor, for channeling my early enthusiasm for neural networks, and teaching me what it means to do good research. Many colleagues around the world have influenced this thesis, including past and present members of the Boltzmann Group, the NNSpeech Group at CMU, and the NNSpeech Group at the University of Karlsruhe in Germany.
I especially want to thank my closest collaborators over these years: Bojan Petek, Otto Schmidbauer, Torsten Zeppenfeld, Hermann Hild, Patrick Haffner, Arthur McNair, Tilo Sloboda, Monika Woszczyna, Ivica Rogina, Michael Finke, and Thorsten Schueler, for their contributions and their friendship.
I also wish to acknowledge valuable interactions I've had with many other talented researchers, including Fil Alleva, Uli Bodenhausen, Herve Bourlard, Lin Chase, Mike Cohen, Mark Derthick, Mike Franzini, Paul Gleichauff, John Hampshire, Nobuo Hataoka, Geoff Hinton, Xuedong Huang, Mei-Yuh Hwang, Ken-ichi Iso, Ajay Jain, Yochai Konig, George Lakoff, Kevin Lang, Chris Lebiere, Kai-Fu Lee, Ester Levin, Stefan Manke, Jay McClelland, Chris, Abdelhamid Mellouk, Nelson Morgan, Barak Pearlmutter, Dave Plaut, Dean Pomerleau, Steve Renals, Roni Rosenfeld, Dave Rumelhart, Dave Sanner, Hidefumi Sawai, David Servan-Schreiber, Bernhard Suhm, Sebastian Thrun, Dave Touretzky, Minh Tue Voh, Wayne Ward, Christoph Windheuser, and Michael Witbrock.
I am especially indebted to Yochai Konig at ICSI, who was generous in helping me to understand and reproduce ICSI's experimental results; and to Arthur McNair for taking over the Janus demos so that I could focus on my speech research, and for constantly keeping our environment running so smoothly.
Thanks to Hal McCarter and his colleagues at Adaptive Solutions for their assistance with the CNAPS parallel computer, and to Nigel Goddard at the Pittsburgh Supercomputing Center for help with the Cray. Thanks to Roni Rosenfeld, Lin Chase, and Michael Finke for proofreading portions of this thesis. I am also grateful to Robert Wilensky for getting me started in Artificial Intelligence, and especially to both Douglas Hofstadter and Allen Newell for sharing some treasured, pivotal hours with me.
Many friends helped me maintain my sanity during the PhD program, as I felt myself drowning in this overambitious thesis. Without the support of my friends, I might not have finished the PhD. I wish to thank my parents, Virginia and Robert Tebelskis, for having raised me in such a stable and loving environment, which has enabled me to come so far.
This thesis is dedicated to Douglas Hofstadter, whose book Gödel, Escher, Bach changed my life by suggesting how consciousness can emerge from subsymbolic computation, shaping my deepest beliefs and inspiring me to study Connectionism; and to the late Allen Newell, whose genius, passion, warmth, and humanity made him a beloved role model whom I could only dream of emulating, and whom I now sorely miss.
Introduction

Speech is a natural mode of communication for people. We learn all the relevant skills during early childhood, without instruction, and we continue to rely on speech communication throughout our lives. It comes so naturally to us that we don't realize how complex a phenomenon speech is. The human vocal tract and articulators are biological organs with nonlinear properties, whose operation is not just under conscious control but also affected by factors ranging from gender to upbringing to emotional state.
As a result, vocalizations can vary widely in terms of their accent, pronunciation, articulation, roughness, nasality, pitch, volume, and speed; moreover, during transmission, our irregular speech patterns can be further distorted by background noise and echoes, as well as electrical characteristics if telephones or other electronic equipment are used.
All these sources of variability make speech recognition, even more than speech generation, a very complex problem. Yet people are so comfortable with speech that we would also like to interact with our computers via speech, rather than having to resort to primitive interfaces such as keyboards and pointing devices.
A speech interface would support many valuable applications: for example, telephone directory assistance, spoken database querying for novice users, hands-busy applications in medicine or fieldwork, office dictation devices, or even automatic voice translation into foreign languages. Such tantalizing applications have long motivated research in automatic speech recognition.
Great progress has been made so far, especially in recent decades, using a series of engineered approaches that include template matching, knowledge engineering, and statistical modeling. Yet computers are still nowhere near the level of human performance at speech recognition, and it appears that further significant advances will require some new insights. What makes people so good at recognizing speech?
Intriguingly, the human brain is known to be wired differently than a conventional computer; in fact it operates under a radically different computational paradigm. The brain's impressive superiority at a wide range of cognitive skills, including speech recognition, has long motivated research into its novel computational paradigm, on the assumption that brainlike models may ultimately lead to brainlike performance on many complex tasks.
This fascinating research area is now known as connectionism, or the study of artificial neural networks. The history of this field has been erratic and laced with hyperbole, but by the mid-1980s the field had matured to a point where it became realistic to begin applying connectionist models to difficult tasks like speech recognition.
By the time this thesis was proposed, many researchers had demonstrated the value of neural networks for important subtasks like phoneme recognition and spoken digit recognition, but it was still unclear whether connectionist techniques would scale up to large speech recognition tasks. This thesis demonstrates that neural networks can indeed form the basis for a general purpose speech recognition system, and that neural networks offer some clear advantages over conventional techniques.
Speech Recognition

What is the current state of the art in speech recognition? This is a complex question, because a system's accuracy depends on the conditions under which it is evaluated: under sufficiently narrow conditions almost any system can attain human-like accuracy, but it is much harder to achieve good accuracy under general conditions.
The conditions of evaluation, and hence the accuracy of any system, can vary along the following dimensions:

Vocabulary size and confusability. As a general rule, it is easy to discriminate among a small set of words, but error rates naturally increase as the vocabulary size grows. On the other hand, even a small vocabulary can be hard to recognize if it contains confusable words.

Speaker dependence vs. independence. By definition, a speaker dependent system is intended for use by a single speaker, but a speaker independent system is intended for use by any speaker.
Speaker independence is difficult to achieve because a system's parameters become tuned to the speaker(s) that it was trained on, and these parameters tend to be highly speaker-specific. Error rates are typically 3 to 5 times higher for speaker independent systems than for speaker dependent ones (Lee). Intermediate between speaker dependent and independent systems, there are also multi-speaker systems intended for use by a small group of people, and speaker-adaptive systems which tune themselves to any speaker given a small amount of their speech as enrollment data.
Isolated, discontinuous, or continuous speech. Isolated speech means single words; discontinuous speech means full sentences in which words are artificially separated by silence; and continuous speech means naturally spoken sentences. Isolated and discontinuous speech recognition is relatively easy because word boundaries are detectable and the words tend to be cleanly pronounced.
Continuous speech is more difficult, however, because word boundaries are unclear and their pronunciations are more corrupted by coarticulation, or the slurring of speech sounds, which for example causes a phrase like "could you" to sound like "could jou".

Task and language constraints. Even with a fixed vocabulary, performance will vary with the nature of the constraints on the word sequences that are allowed during recognition.
Some constraints may be task-dependent (for example, an airline-querying application may dismiss the hypothesis "The apple is red"); other constraints may be semantic (rejecting "The apple is angry") or syntactic (rejecting "Red is apple the"). Constraints are often represented by a grammar, which ideally filters out unreasonable sentences so that the speech recognizer evaluates only plausible sentences.
Grammars are usually rated by their perplexity, a number that indicates the grammar's average branching factor (i.e., the number of words that can follow any given word). The difficulty of a task is more reliably measured by its perplexity than by its vocabulary size; a small worked sketch of the perplexity measure appears after this list of evaluation dimensions.

Read vs. spontaneous speech. Systems can be evaluated on speech that is either read from prepared scripts, or speech that is uttered spontaneously. Spontaneous speech is vastly more difficult, because it tends to be peppered with disfluencies like "uh" and "um", false starts, incomplete sentences, stuttering, coughing, and laughter; moreover, the vocabulary is essentially unlimited, so the system must be able to deal intelligently with unknown words.
Adverse conditions. A system's performance can also be degraded by a range of adverse conditions (Furui). These include environmental noise (e.g., echoes, room acoustics) and the use of different microphones.
In order to evaluate and compare different systems under well-defined conditions, a number of standardized databases have been created. For example, one database that has been widely used is the DARPA Resource Management database: a large vocabulary, speaker-independent, continuous speech database, consisting of training sentences in the domain of naval resource management, read from a script and recorded under benign environmental conditions; testing is usually performed using a standard grammar. We used this database, as well as two smaller ones, in our own research (see Chapter 5).
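To make the perplexity measure mentioned above concrete, here is a small worked sketch (Python, purely illustrative; the probabilities are made up and nothing here comes from the thesis). Perplexity is the geometric-mean branching factor of a grammar or language model: a grammar that always allows B equally likely next words has perplexity B.

```python
import math

def perplexity(word_probs):
    """Perplexity of a word sequence under a grammar or language model.

    word_probs: the probability the model assigns to each successive word,
                i.e. P(w_i | preceding words).
    """
    # Average negative log2 probability per word (the sequence's entropy rate),
    # then exponentiate to get the effective branching factor.
    H = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** H

# A grammar that always allows 10 equally likely next words has perplexity 10:
print(perplexity([0.1] * 20))                 # 10.0
# Tighter constraints (more predictable words) mean lower perplexity:
print(perplexity([0.5, 0.25, 0.125, 0.5]))    # about 3.36
```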
The central issue in speech recognition is dealing with variability. Currently, speech recognition systems distinguish between two kinds of variability: acoustic and temporal. Acoustic variability covers different accents, pronunciations, pitches, volumes, and so on, while temporal variability covers different speaking rates.
These two dimensions are not completely independent (when a person speaks quickly, his acoustical patterns become distorted as well), but it is a useful simplification to treat them independently. Of these two dimensions, temporal variability is easier to handle. An early approach to temporal variability was to linearly stretch or shrink ("warp") an unknown utterance to the duration of a known template.
Linear warping proved inadequate, however, because utterances can accelerate or decelerate at any time; instead, nonlinear warping was obviously required. Soon an efficient algorithm known as Dynamic Time Warping was proposed as a solution to this problem. This algorithm, in some form, is now used in virtually every speech recognition system, and the problem of temporal variability is considered to be largely solved.
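As a rough illustration of the idea (a generic textbook-style sketch, not code from the thesis; the feature vectors below are made up), Dynamic Time Warping fills a cumulative-cost table so that two utterances of different lengths can stretch or compress nonlinearly against each other:

```python
import numpy as np

def dtw_distance(a, b):
    """Cost of the best nonlinear alignment between two frame sequences.

    a: array of shape (T1, d); b: array of shape (T2, d); each row is an
    acoustic feature vector for one frame.
    """
    T1, T2 = len(a), len(b)
    D = np.full((T1 + 1, T2 + 1), np.inf)  # cumulative alignment cost
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # Best predecessor: diagonal (match), vertical or horizontal (warp).
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[T1, T2]

# A template and a slowed-down version of it align with zero cost,
# even though their lengths differ:
template  = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
utterance = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]])
print(dtw_distance(template, utterance))  # 0.0
```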
