Abstract:
This thesis is concerned with the analysis of visual images, a field in which computational methods for extracting high-level representations of visual objects are developed. The human visual system is currently orders of magnitude better than any automatic system developed so far. Humans are especially adept at recognizing natural objects, particularly human faces. The recognition of human faces is also important for many applications of image analysis, such as user interfaces, surveillance and face verification systems. The purpose of this thesis is to develop methods for tracking face features. Specifically, Gabor wavelets are used as the low-level visual representation, which is somewhat analogous to what is known of the early stages of human visual processing. Subsequently, different search and matching algorithms are applied to detect the most salient face features, i.e. the eyes, nose and lips. The work presented here has been carried out as part of the European Union Research Training Network MUHCI (Multimodal Human-Computer Interaction). In addition to the Gabor wavelet representation, dynamic programming, clustering and face geometry constraints are applied. Dynamic programming is used for tracking features over several frames of a video sequence. The computational demands of face feature detection are very high, and for this reason several techniques are used to speed up the calculations, among them clustering, frequency domain filtering and subsampled image representations. In addition to knowledge of the local properties of the face features, a global constraint on the face geometry is used. An audiovisual database comprising several people recorded over several weeks was specified, collected and annotated as part of this thesis. In addition, an audiovisual database from the Helsinki University of Technology was used.
In total, these databases consist of approximately 6000 face images, and they are used as both training and testing material for the parameters of the face feature detectors. The results of this thesis indicate the difficulty of the face feature recognition task even with relatively constant illumination and frontal face images. Detectors based on simple low-level matching can find the eyes, which are perhaps the most prominent features of the face, in approximately 70% of the cases. The lips and nose are correctly detected in only approximately 30% of the cases. With postprocessing based on dynamic programming, these percentages increase to 90% and 60%, respectively. However, these results still need to be improved to make face feature detection useful in most applications.
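As an illustration of how dynamic programming can link per-frame detections into a track, the following is a minimal sketch and not the formulation used in the thesis. It assumes each frame supplies a list of candidate positions with a matching score, and it trades the score off against the distance moved between consecutive frames. The function name `track_feature`, the `(x, y, score)` candidate format and the `motion_weight` parameter are hypothetical choices made for this example.

```python
from math import hypot


def track_feature(candidates, motion_weight=1.0):
    """Pick one candidate per frame by dynamic programming (Viterbi-style).

    candidates: list over frames; each frame is a non-empty list of
                (x, y, score) tuples, where a higher score is a better match.
    motion_weight: assumed penalty per pixel of movement between frames.
    Returns the chosen (x, y) position for every frame.
    """
    # best[i] = best total score of any path ending at candidate i
    # of the frame currently being processed.
    best = [score for _, _, score in candidates[0]]
    backpointers = []

    for prev_frame, frame in zip(candidates, candidates[1:]):
        new_best, pointers = [], []
        for x, y, score in frame:
            # Total for each predecessor: its path score minus a jump penalty.
            totals = [
                best[j] - motion_weight * hypot(x - px, y - py)
                for j, (px, py, _) in enumerate(prev_frame)
            ]
            j_best = max(range(len(totals)), key=totals.__getitem__)
            new_best.append(totals[j_best] + score)
            pointers.append(j_best)
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best candidate in the final frame.
    i = max(range(len(best)), key=best.__getitem__)
    path = [i]
    for pointers in reversed(backpointers):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return [candidates[t][i][:2] for t, i in enumerate(path)]
```

With a positive `motion_weight`, a candidate that scores well in a single frame but lies far from its neighbours in time is rejected in favour of a smoother trajectory. With `motion_weight=0` the candidates with the highest scores are chosen, with no preference for smooth motion. This shows, under the stated assumptions, why temporal postprocessing can correct isolated per-frame errors.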