Unsupervised Anomaly Detection in Unstructured Log-Data for Root-Cause-Analysis
Anomaly detection has attracted researchers from a variety of backgrounds because it has numerous industrial applications. As a subfield, fault detection plays a crucial role in growing telecommunications networks, since failures lead to dissatisfaction and hence financial losses. It aims at identifying unusual events in system log files. System logs are messages that the elements of the network emit to report their status. The main challenge is to cope with the rate at which the data volume grows. Traditional methods such as expert systems are no longer practical, which makes machine learning approaches more valuable. In this thesis, unsupervised anomaly (fault) detection in unstructured system logs is investigated. The effect of various feature extraction methods is investigated in terms of the gain they provide. The baseline dimensionality reduction method, Principal Component Analysis (PCA), and its effects are also presented. Additionally, autoencoders are studied as an alternative dimensionality reduction technique. Four different methods based on statistics and clustering, as well as a framework for cleaning anomalies from datasets, are discussed. A high detection (classification) rate, with 99.69% precision and a 0.07% false alarm rate, is achieved on one of the datasets, while similar results, with variations in recall, are achieved on the other dataset. The studies show that dimensionality reduction can greatly improve the performance of the classifiers used and reduce the computational complexity of anomaly detection.
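For readers less familiar with the reported metrics, the following minimal sketch shows how precision, recall, and false alarm rate are conventionally computed from binary labels. It assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), false alarm rate = FP / (FP + TN)); the function name `detection_metrics` and the toy labels are illustrative and do not come from the thesis.

```python
def detection_metrics(y_true, y_pred):
    """Compute precision, recall, and false alarm rate.

    Labels are binary: 1 marks an anomalous log event, 0 a normal one.
    This uses the conventional definitions, assumed here rather than
    taken from the thesis itself.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals passed

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm_rate": false_alarm_rate,
    }


# Toy example (hypothetical labels, not thesis data).
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0]
metrics = detection_metrics(y_true, y_pred)
```

In this toy example there are two true positives, one false positive, one false negative, and four true negatives, so precision and recall are both 2/3 and the false alarm rate is 1/5. Note that precision and false alarm rate share the false positive count but use different denominators, which is why a detector can score high precision and a low false alarm rate while its recall still varies between datasets.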