Text-to-Phoneme Mapping Using Neural Networks
Abstract:
In this thesis, the problem of text-to-phoneme mapping using neural networks is studied. One of the main goals of the thesis is to provide a comprehensive analysis of different neural network structures that can be implemented to convert a written text into its corresponding phonetic transcription. Another important goal of this work is to provide new solutions that improve the performance of existing algorithms in terms of convergence speed and phoneme accuracy. Three main neural network classes are studied in this thesis: the multilayer perceptron (MLP) neural network, the recurrent neural network (RNN) and the bidirectional recurrent neural network (BRNN).
Due to their capacity for self-adaptation, neural networks have been shown to be a viable solution in applications that require modeling abilities. One such application is text-to-phoneme mapping, where the correspondence between the letters of a written text and their phonetic transcription must be modeled.
One of the main concerns in all practical implementations where neural networks are used is to develop algorithms that provide fast convergence of the synaptic weights and, at the same time, good mapping performance. When a neural network is trained for text-to-phoneme mapping, at every iteration a letter-phoneme pair is presented to the network, so that the number of letters and the number of training iterations are equal. As a result, fast convergence of the neural network translates into a smaller training dictionary, since fast convergence effectively means that fewer training letters are needed. Fast convergence is therefore important in applications where only a small linguistic database is available. Of course, one solution could be to use a small dictionary (with very few words) that is presented at the input of the neural network many times until the synaptic weights converge, but in that case the training time becomes the dominant cost. Taking into account these two sides of convergence speed (the size of the training dictionary and the processing time during training), one can understand the importance of having algorithms that ensure fast convergence of the neural network.
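The one-pair-per-iteration scheme above can be sketched as follows. This is a minimal illustration, not the thesis's exact setup: the sliding context window, its size, and the padding symbol are assumptions borrowed from common neural text-to-phoneme practice, and a one-to-one letter-phoneme alignment is assumed.

```python
# Hypothetical sketch: one letter-phoneme pair per training iteration,
# each letter presented together with a small context window.
# Window size and padding symbol are illustrative assumptions.

def make_training_pairs(word, phonemes, context=2, pad="_"):
    """Pair each letter (with its context window) with its phoneme."""
    assert len(word) == len(phonemes), "assumes 1:1 letter-phoneme alignment"
    padded = pad * context + word + pad * context
    pairs = []
    for i, target in enumerate(phonemes):
        window = padded[i : i + 2 * context + 1]  # letter plus neighbours
        pairs.append((window, target))
    return pairs

pairs = make_training_pairs("cat", ["k", "ae", "t"])
# one pair per letter, so the number of training iterations per word
# equals the number of letters in it
```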
It is well known that the error backpropagation algorithm used to train the MLP neural network sometimes exhibits quite slow convergence (a very large number of iterations is required to reach the stability point). In order to increase the convergence speed, two novel alternative solutions are proposed in this thesis: one using an adaptive learning rate in the training process, and another that is a transform-domain implementation of the multilayer perceptron neural network. The computational complexity of the two proposed training algorithms is slightly higher than that of the error backpropagation algorithm, but the number of training iterations is greatly reduced. Consequently, although the three algorithms might have the same training time, the novel algorithms require a smaller training dictionary.
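The general idea of an adaptive learning rate in backpropagation can be illustrated with a small sketch. The "bold driver" style rule used here (grow the rate while the error falls, halve it when the error rises) is a generic heuristic assumed purely for illustration; the abstract does not specify the thesis's actual adaptive-rate algorithm, and the network, data, and constants below are toy choices.

```python
import numpy as np

# Illustrative sketch only: backpropagation with an adaptive learning
# rate on a tiny one-hidden-layer MLP and a toy binary target.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = rng.standard_normal((8, 4))                 # 8 samples, 4 inputs
Y = (X.sum(axis=1, keepdims=True) > 0) * 1.0    # toy binary target

W1 = 0.5 * rng.standard_normal((4, 6))          # input -> hidden weights
W2 = 0.5 * rng.standard_normal((6, 1))          # hidden -> output weights
lr, prev_err, errs = 0.5, float("inf"), []

for _ in range(200):
    H = sigmoid(X @ W1)                         # forward pass
    O = sigmoid(H @ W2)
    err = float(np.mean((O - Y) ** 2))
    errs.append(err)

    # adaptive learning rate: reward a falling error, punish a rising one
    lr = min(lr * 1.05, 1.0) if err < prev_err else lr * 0.5
    prev_err = err

    dO = (O - Y) * O * (1 - O)                  # backward pass
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= lr * (H.T @ dO) / len(X)
    W1 -= lr * (X.T @ dH) / len(X)
```

The per-iteration cost is slightly higher than plain backpropagation (one extra comparison and rate update), which mirrors the trade-off described above: marginally more work per iteration in exchange for fewer iterations overall.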
Due to the limited processing power usually encountered in real devices, another very important requirement for a text-to-phoneme mapping system is low computational and memory cost. In the case of text-to-phoneme mapping systems based on neural networks, the computational complexity is mainly determined by the mathematical complexity of the training algorithm and by the number of synaptic weights of the neural network. The memory load is due to the number of synaptic weights that must be stored.
Taking all these limitations and implementation requirements into account, several neural network structures with different numbers of synaptic weights, trained with various training algorithms, are studied in this thesis. The modeling capability of the neural networks is also addressed, which in the text-to-phoneme mapping case translates into phoneme accuracy. Different neural network structures, training algorithms and network complexities are analyzed from this point of view as well. As a remark, input letter encoding plays a very important role in the phoneme accuracy of a grapheme-to-phoneme conversion system. This is why special attention has been paid to the comparative analysis of the performance (in terms of phoneme accuracy) obtained with several orthogonal and non-orthogonal encodings of the input letters.
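The distinction between orthogonal and non-orthogonal letter encodings can be sketched briefly. The one-hot code below is the standard orthogonal encoding; the 5-bit binary code is one illustrative non-orthogonal alternative, not necessarily the specific encodings compared in the thesis.

```python
import string

# Sketch of two input-letter encodings: an orthogonal (one-hot) code,
# where every letter vector is orthogonal to every other, and a denser
# non-orthogonal binary code whose patterns overlap between letters.
# The 5-bit code is an illustrative assumption, not the thesis's choice.

LETTERS = string.ascii_lowercase  # 26-letter alphabet

def one_hot(letter):
    """Orthogonal encoding: 26 components, a single 1."""
    v = [0] * len(LETTERS)
    v[LETTERS.index(letter)] = 1
    return v

def binary_code(letter, bits=5):
    """Non-orthogonal encoding: the letter's index as a bit pattern."""
    idx = LETTERS.index(letter)
    return [(idx >> b) & 1 for b in reversed(range(bits))]

one_hot("c")      # 26 inputs; zero inner product with every other letter
binary_code("c")  # only 5 inputs, but codes of different letters overlap
```

The trade-off this exposes is exactly the one the abstract alludes to: the orthogonal code keeps letters maximally distinguishable at the cost of 26 input units, while the compact code needs only 5 inputs (and thus fewer synaptic weights) but lets unrelated letters share input activations.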
The thesis is structured into four main parts. Chapter 1 introduces the reader to the world of text-to-phoneme mapping. In Chapter 2, several neural network structures and their corresponding training algorithms are described, and two new training algorithms are introduced and analyzed. In Chapter 3, experimental results for the problem of monolingual text-to-phoneme mapping, obtained with the neural networks described in Chapter 2, are presented. Chapter 4 is dedicated to the problem of bilingual grapheme-to-phoneme conversion, and Chapter 5 concludes the thesis.