Machine learning comprises concepts and techniques that enable computers to improve their performance with experience, i.e., to "learn" from data. Semi-supervised learning is one of the main categories of machine learning and addresses data classification in partially labeled datasets. Although it has been well studied, semi-supervised learning remains a challenging field with many open problems. Collective dynamical systems, in turn, are systems composed of many individuals, each a dynamical system in its own right, that act collectively, i.e., the action of each individual is influenced by the actions of its neighbors. A remarkable property of these systems is that global patterns may arise spontaneously from the local interactions among individuals, a phenomenon known as "emergence". Their challenges and relevance have motivated research on them in many branches of science and engineering. At the same time, techniques based on collective dynamical systems have been applied to machine learning tasks with promising results. The objective of this research project is to develop and analyze collective dynamical models for semi-supervised learning. In particular, it is proposed to work on models in which the movement of each object is determined by both the location and the velocity of its neighboring objects. While location captures the geometry of the dataset, velocity allows several patterns to form as the unlabeled data are absorbed by the labeled data. Thus, it is proposed to combine the location model with the velocity model, exploiting the advantages of each. The collective dynamical system modeled in this way is expected to converge to an equilibrium state in which the pattern formed by the data corresponds to the label propagation result (the task of semi-supervised learning).
As far as we know, the combination of these two models (location and velocity) is novel in machine learning. The project also aims to perform theoretical analysis and numerical simulations of the computational models to be developed. Because of their dynamical nature, these models are expected to be robust and able to describe not only the label propagation result but also the propagation process itself. The information generated during this process (the values of the system variables) is valuable: beyond propagating labels, it may reveal features that support soft labeling, the identification of overlapping classes, and even the prevention of the spread of wrong labels.
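To make the idea of combining location and velocity concrete, the following is a minimal illustrative sketch and not the model proposed in the project. It assumes a second-order consensus dynamics in which each object carries a class-membership state and a velocity, labeled objects are held fixed, and each unlabeled object is accelerated toward the average state of its neighbors (the location term) and toward their average velocity (the velocity term). The function name `propagate_labels`, the Gaussian affinity, and the parameters `sigma`, `a`, `b`, `dt`, and `steps` are all assumptions made for illustration.

```python
import numpy as np

def propagate_labels(X, y, n_classes, sigma=0.3, a=1.0, b=2.0, dt=0.1, steps=2000):
    """Hypothetical sketch: y holds class indices for labeled points, -1 otherwise."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0

    # Location information: Gaussian affinities from pairwise squared distances,
    # row-normalized so each object averages over its neighbors.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)

    # State: class-membership vector per object; labeled objects start one-hot.
    S = np.full((len(X), n_classes), 1.0 / n_classes)
    S[labeled] = np.eye(n_classes)[y[labeled]]
    V = np.zeros_like(S)  # velocity of each state vector

    for _ in range(steps):
        # Attraction to neighbors' states plus alignment with neighbors' velocities.
        acc = a * (P @ S - S) + b * (P @ V - V)
        V = V + dt * acc
        V[labeled] = 0.0   # labeled objects never move
        S = S + dt * V

    # At equilibrium each unlabeled state equals the weighted mean of its neighbors.
    return S.argmax(axis=1), S
```

With two well-separated groups of points and one labeled object in each group, the unlabeled objects settle on the label of their own group. The intermediate values of `S` and `V` are the kind of process information the abstract refers to: a state vector that stays far from one-hot, for instance, would be a candidate for soft labeling or for flagging overlapping classes.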