Large amounts of unlabeled data are available on the Web, such as documents, images, and videos, and more is captured by sensors and satellites and generated by the growing use of Internet of Things (IoT) devices. However, labeling such data can be costly and requires human resources, such as domain specialists. In this context, semi-supervised learning has attracted the interest of researchers, as it combines a small amount of labeled data with a large amount of unlabeled data. However, existing algorithms do not optimize the selection of the data to be labeled. Usually, the examples are selected at random and may not be representative of the underlying data distribution. In this work, our objective is to analyze and propose new methods for selecting the data to be labeled in the semi-supervised context through active learning.