In machine learning, representing a dataset as a graph has been studied in the literature, especially in semi-supervised learning. The defining feature of graph-based (network-based) techniques is how the data is represented: vertices represent the examples, and edges represent the distances or similarities (relations) between them. The main advantages of these techniques include representing the topological structure of the data (classes of arbitrary shape), handling relational data, and representing multiple classes, among others. When building graphs to represent semi-supervised learning problems, various similarity (or distance) functions are used, such as the Euclidean, Mahalanobis, and Hausdorff distances, all of which were designed manually by humans. Grammatical Evolution (GE), a subarea of evolutionary algorithms, has emerged as a suitable technique for evolving mathematical functions. An automatically evolved function can not only reproduce the solution a human would devise for a particular problem, but can also produce something entirely new and possibly better. In this context, the goal of this project is to use Grammatical Evolution to automatically build similarity measures for constructing graphs that represent datasets in semi-supervised learning.
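To make the proposed pipeline concrete, here is a minimal sketch, assuming a toy grammar of our own invention (not the project's grammar). It shows the GE genotype-to-phenotype mapping (each integer codon, taken modulo the number of alternatives, picks a production for the leftmost non-terminal, with genome wrapping) and how the resulting expression could be used as a distance to build a k-nearest-neighbour graph. The names `GRAMMAR`, `ge_map`, `compile_distance`, and `knn_graph` are hypothetical, and the evolutionary search itself (selection, crossover, mutation, and a fitness function such as semi-supervised accuracy on the resulting graph) is omitted.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]

# Hypothetical toy grammar for a per-feature distance term (assumption, not the project's):
#   <expr> ::= <expr> + <expr> | <expr> * <expr> | abs(<diff>) | (<diff>)**2
#   <diff> ::= a[i]-b[i]
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "+", "<expr>"], ["<expr>", "*", "<expr>"],
               ["abs(", "<diff>", ")"], ["(", "<diff>", ")**2"]],
    "<diff>": [["a[i]-b[i]"]],
}


def ge_map(genome: List[int], grammar: Dict[str, List[List[str]]],
           start: str = "<expr>", max_wraps: int = 2) -> Optional[str]:
    """Standard GE mapping: expand the leftmost non-terminal using codon % n_choices."""
    symbols = [start]
    pos = 0
    budget = len(genome) * (max_wraps + 1)  # codons available, counting wraps
    while True:
        idx = next((k for k, s in enumerate(symbols) if s in grammar), None)
        if idx is None:
            return "".join(symbols)  # only terminals left: valid phenotype
        choices = grammar[symbols[idx]]
        if len(choices) == 1:
            choice = choices[0]  # one common convention: no codon consumed
        else:
            if pos >= budget:
                return None  # codons exhausted after wrapping: invalid individual
            choice = choices[genome[pos % len(genome)] % len(choices)]
            pos += 1
        symbols[idx:idx + 1] = choice


def compile_distance(term: str) -> Callable[[Vector, Vector], float]:
    """Turn a per-feature term into d(a, b) = sum_i term(a_i, b_i)."""
    env = {"__builtins__": {}, "abs": abs, "sum": sum, "range": range, "len": len}
    return eval(f"lambda a, b: sum({term} for i in range(len(a)))", env)


def knn_graph(data: List[Vector], dist: Callable[[Vector, Vector], float],
              k: int) -> Set[Tuple[int, int]]:
    """Vertices are examples; connect each one to its k nearest neighbours (undirected)."""
    edges: Set[Tuple[int, int]] = set()
    for i, x in enumerate(data):
        neighbours = sorted((dist(x, y), j) for j, y in enumerate(data) if j != i)
        for _, j in neighbours[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    term = ge_map([0, 3, 2], GRAMMAR)  # (a[i]-b[i])**2+abs(a[i]-b[i])
    print(term, knn_graph(points, compile_distance(term), k=1))
```

Here the genome `[3]` happens to reproduce the squared Euclidean distance and `[2]` the Manhattan distance, while other genomes such as `[0, 3, 2]` yield combinations no one wrote by hand, which is the kind of search space the project proposes to explore. Using `eval` keeps the sketch short; a real system would more likely build an expression tree.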