Clustering Algorithms for High-Dimensional Data
More and more data are produced every day. Clustering techniques have been developed to process such data automatically; however, when the data are high-dimensional, conventional algorithms do not perform well. This thesis discusses problems related to the curse of dimensionality, along with several algorithms designed to address them. Finally, empirical tests are run to examine how these approaches behave. Most algorithms do not cope well with high-dimensional data. DBSCAN, some of its variants, and, surprisingly, k-means seem to be the best approaches.