Novas técnicas de amostragem tendenciosa para os algoritmos de análise de agrupamento k-médias e DBSCAN [New biased sampling techniques for the k-means and DBSCAN cluster analysis algorithms]

Date
2019-03-28
Authors
Luchi, Diego
Publisher
Universidade Federal do Espírito Santo
Abstract
Cluster analysis is a set of techniques designed to identify groups of similar elements in a dataset. Such techniques are used in many different applications, including image segmentation, signal processing, data compression, unsupervised learning, feature selection, and sampling. Despite their importance across this wide range of applications, applying these techniques to high-cardinality datasets is problematic because several traditional algorithms scale poorly. One way to circumvent this problem is sampling, since reducing the cardinality of a dataset greatly reduces the computational effort the methods require. This thesis presents three new sampling methods specifically designed to be used in conjunction with the cluster analysis algorithms k-means and DBSCAN. The experimental results show that the methods designed for DBSCAN obtained better results than competing methods. However, the sampling approach proposed for k-means returned lower-quality results than DENDIS, a recently proposed method.
Keywords
Sampling, Unsupervised learning, Cluster analysis