Supervised fault detection using unstructured server-log data to support root cause analysis
Abbaszadeh, Zahra Jr
Fault detection is one of the most important aspects of telecommunication networks. Given the growing scale and complexity of communication networks, maintenance and debugging have become extremely complicated and expensive. In complex systems, the large number of components leads to a higher failure rate, which increases the importance of both fault detection and root cause analysis. Fault detection in communication networks is based on analyzing system logs from servers or other network components to determine whether there is any unusual activity. However, detecting and diagnosing problems in such large systems is challenging for humans, since the amount of information that needs to be processed far exceeds what can be handled manually. There is therefore a strong demand for automatic processing of datasets to extract the data relevant for detecting anomalies. In a Big Data world, using machine learning techniques to analyze log data automatically has become increasingly popular. Machine-learning-based fault detection does not require any prior knowledge about the types of problems and does not rely on explicit programming (such as hand-written rules); instead, it can improve its performance automatically by learning from experience. In this thesis, we investigate supervised machine learning approaches as a fast and efficient way to detect known faults from unstructured log data. Since the aim is to distinguish abnormal cases from normal ones, anomaly detection is treated as a binary classification problem. To extract numerical features from event logs, a primary step in any classification task, we used windowing together with bag-of-words approaches, taking into account the textual characteristics of the data (high dimensionality and sparsity).
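The abstract does not state the window size, step, tokenizer, or labelling rule, so the following is only a minimal Python sketch of windowing combined with bag-of-words feature extraction under assumed choices. The function names `tokenize`, `windows`, `build_vocab`, and `bag_of_words`, the regex tokenizer that drops purely numeric tokens, and the rule that a window is faulty if any of its lines is faulty are all hypothetical illustrations, not the thesis's implementation.

```python
import re
from collections import Counter

def tokenize(line):
    # Hypothetical tokenizer: lowercase, split on non-alphanumeric characters,
    # and drop purely numeric tokens (IDs, ports) to limit feature dimension.
    return [t for t in re.split(r"[^a-z0-9]+", line.lower()) if t and not t.isdigit()]

def windows(lines, labels, size, step):
    # Slide a fixed-size window over the log. Assumed labelling rule:
    # a window is faulty (1) if any line inside it is labelled faulty.
    out = []
    for start in range(0, len(lines) - size + 1, step):
        chunk = lines[start:start + size]
        out.append((chunk, int(any(labels[start:start + size]))))
    return out

def build_vocab(window_list):
    # Map each distinct token to a feature index, in order of first appearance.
    vocab = {}
    for chunk, _ in window_list:
        for line in chunk:
            for tok in tokenize(line):
                vocab.setdefault(tok, len(vocab))
    return vocab

def bag_of_words(chunk, vocab):
    # Sparse feature vector {feature index: count}; absent tokens are omitted
    # because most of the vocabulary does not occur in any single window.
    counts = Counter(t for line in chunk for t in tokenize(line) if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}

if __name__ == "__main__":
    logs = ["INFO session 12 opened", "INFO session 12 closed",
            "ERROR link down on port 3", "INFO session 13 opened"]
    ws = windows(logs, [0, 0, 1, 0], size=2, step=1)
    vocab = build_vocab(ws)
    print([label for _, label in ws])      # [0, 1, 1]
    print(bag_of_words(ws[0][0], vocab))   # {0: 2, 1: 2, 2: 1, 3: 1}
```

Storing each window as a dictionary of non-zero counts, rather than a dense vector over the whole vocabulary, reflects the high-dimensional, sparse nature of log text mentioned above.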
We focus on linear classification methods, such as the single-layer perceptron and Support Vector Machines (SVMs), as promising candidates for supervised fault detection, given the textual characteristics of network-based server-log data. To develop an approach that generalizes well for detecting known faults, two important factors are investigated: the size of the datasets and the duration of the faults. Based on the experimental results concerning these two factors, a two-layer classification scheme is proposed to overcome the windowing and feature extraction challenges posed by long-lasting faults. The thesis proposes a novel approach for collecting feature vectors for both layers of this scheme. In the first layer, we attempt to detect the starting line of each fault repetition as well as the fault duration. The models obtained from the first layer are then used to create feature vectors for the second layer. To evaluate the learning algorithms and select the best detection model, cross validation and F-scores are used, because traditional metrics such as accuracy and error rate are not well suited to imbalanced datasets (when faults are rare, a classifier that always predicts "normal" still achieves high accuracy). The experimental results show that the proposed SVM classifier provides the best performance independent of fault duration, while factors such as the labelling rule and reduction of the feature space have no significant effect on performance. In addition, the results show that the two-layer classification system can improve fault detection performance; however, a better-suited approach for collecting feature vectors with a smaller time span requires further investigation.
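As a hedged illustration of the linear-classification and evaluation steps, the sketch below trains a single-layer perceptron on sparse bag-of-words vectors (dictionaries mapping feature index to count) and scores it with the F1 measure under k-fold cross validation. It implements neither an SVM nor the proposed two-layer scheme, whose details the abstract does not give. The function names, the contiguous-fold split, and the use of F1 (the F-score with beta = 1) are assumptions made for illustration.

```python
def dot(w, x):
    # Sparse dot product between weight dict w and feature dict x.
    return sum(w.get(i, 0.0) * v for i, v in x.items())

def train_perceptron(X, y, epochs=10, lr=1.0):
    # Classic perceptron rule; labels 0/1 are mapped to -1/+1.
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            target = 1 if label else -1
            if target * (dot(w, x) + b) <= 0:  # misclassified or on the boundary
                for i, v in x.items():
                    w[i] = w.get(i, 0.0) + lr * target * v
                b += lr * target
    return w, b

def predict(model, x):
    w, b = model
    return 1 if dot(w, x) + b > 0 else 0

def f1_score(y_true, y_pred):
    # F1 with the faulty class (1) as positive; returns 0.0 when there are
    # no true positives (a convention, since F1 can be undefined then).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cross_val_f1(X, y, k=5, epochs=10):
    # k-fold cross validation with contiguous folds (assumed, so that
    # neighbouring log windows tend to stay in the same fold); mean F1.
    n = len(X)
    fold = [i * k // n for i in range(n)]
    scores = []
    for f in range(k):
        train = [i for i in range(n) if fold[i] != f]
        test = [i for i in range(n) if fold[i] == f]
        model = train_perceptron([X[i] for i in train], [y[i] for i in train], epochs)
        preds = [predict(model, X[i]) for i in test]
        scores.append(f1_score([y[i] for i in test], preds))
    return sum(scores) / k
```

The imbalance argument can be checked directly: with nine normal windows and one faulty window, a model that always predicts "normal" reaches 90% accuracy, yet `f1_score([0] * 9 + [1], [0] * 10)` returns 0.0.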