Advanced search
Start date

Investigation of fast parallel KNNG for multiple density estimation

Grant number: 23/00993-1
Support Opportunities:Scholarships abroad - Research Internship - Scientific Initiation
Effective date (Start): March 30, 2023
Effective date (End): July 29, 2023
Field of knowledge:Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator:Murilo Coelho Naldi
Grantee:Gabriel Meirelles Carvalho Orlando
Supervisor: Joerg Sander
Host Institution: Centro de Ciências Exatas e de Tecnologia (CCET). Universidade Federal de São Carlos (UFSCAR). São Carlos , SP, Brazil
Research place: University of Alberta, Canada  
Associated to the scholarship:22/04934-7 - Scalable distributed algorithms for approximating the kNNG, BP.IC


Clustering is an unsupervised machine-learning task essential to several applications. In this way, some algorithms need the k Nearest Neighbors Graph (k NNG) as part of clustering. However, finding the k NNG may be a computationally costly operation since calculating all k neighbors of all points in the set is needed, resulting in a quadratic complexity concerning the size of a data set. Moreover, due to the physical limitation in the execution of these algorithms, if large data sets are used, the algorithm tends to become unfeasible. Therefore, finding the approximate k NNG in a less computationally costly way can speed up clustering through parallelism and data distribution, making the process more efficient. In this con-text, a famous algorithm in clustering is Hierarchical Density-Based Spatial Clustering of Applications with Noise - (HDBSCAN) builds the k NNG over the space of mutual reach-ability density. However, for massive datasets, HDBSCAN becomes more computationally inefficient. Therefore, it is essential to build the computationally efficient k NNG needed to perform the HDBSCAN in a distributed and parallel manner, even if it needs to be approximated. This project aims at enhancing HDBSCAN computational performance through approximated k NNG methods using parallelism and data distribution, although securing a minimal quality loss. (AU)

News published in Agência FAPESP Newsletter about the scholarship:
Articles published in other media outlets (0 total):
More itemsLess items

Please report errors in scientific publications list by writing to: