Chemical space exploration via semi-supervised learning for design of new materials


The discovery of new materials is directly linked to the evolution of society. These materials enable advances ranging from the generation of new drugs to the development of electronic components for clean energy generation. In addition to the various materials already available in nature, a multitude of compounds can theoretically be generated by combining ordinary chemical elements. However, this space of possibilities, called chemical space, is practically infinite, which makes a thorough scrutiny of all the possibilities unfeasible. To facilitate the search for new materials, scientists have used various machine learning (ML) techniques. In the direct process, ML techniques are trained and used to predict specific properties of new materials. In the inverse design process, by contrast, the model learns to generate new compounds from desired properties. Among the ML techniques available in the literature, generative models based on autoencoders have shown promising results. Recently, we proposed a generative model called Supervised Grammatical Variational Autoencoder (SGVAE). This model can perform both tasks described above: property prediction and molecule design. However, this model, like others in the literature, has limitations and use restrictions: a) most models are intrinsically supervised; b) there is no broad study of molecular representations; c) the generated latent spaces have low navigability (sampling) and interpretability; d) there is no methodology for continuously adapting the model in scenarios in which new data are constantly added to the database; and e) validation of models in real scenarios is lacking. To address some of these limitations, new models based on Variational Autoencoders (VAEs) will be studied and developed to generate materials considering multiple representations.
The models will be trained with a semi-supervised approach, in which the data are only partially labeled. Active learning techniques will also be considered to make better use of labeled data and to support the continuous exploration of the chemical space. To improve the chemical and physical interpretation of the learned latent representation, qualitative and quantitative analyses of the VAEs will be performed. The models will be evaluated using public datasets and data generated in the context of CINE (Center for Innovation on New Energies). Finally, this project is part of CINE's computational division (4), where the proponent is one of the principal researchers (Proc. 2017/11631-2). (AU)
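
As a rough illustration of what training on partially labeled data can mean for a VAE, the sketch below combines an unsupervised term (reconstruction error plus KL divergence) computed for every molecule with a property-prediction term computed only for molecules that have a label. It is not the SGVAE objective or the method proposed in this project. The function names, the squared-error property loss, and the `property_weight` parameter are all assumptions made for illustration.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I) for one latent vector."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def semi_supervised_loss(recon_errors, mus, log_vars, predictions, labels,
                         property_weight=1.0):
    """Hypothetical semi-supervised VAE loss (illustrative assumption only).

    recon_errors[i]: reconstruction error of molecule i (already computed).
    mus[i], log_vars[i]: encoder outputs for molecule i.
    predictions[i]: predicted property of molecule i.
    labels[i]: measured property, or None when molecule i is unlabeled.
    """
    n = len(recon_errors)
    # Unsupervised part: every molecule contributes, labeled or not.
    unsupervised = sum(
        r + kl_to_standard_normal(m, lv)
        for r, m, lv in zip(recon_errors, mus, log_vars)
    ) / n
    # Supervised part: squared error over the labeled subset only.
    labeled = [(p, y) for p, y in zip(predictions, labels) if y is not None]
    if labeled:
        supervised = sum((p - y) ** 2 for p, y in labeled) / len(labeled)
    else:
        supervised = 0.0
    return unsupervised + property_weight * supervised


# Two molecules, only the first has a measured property.
loss = semi_supervised_loss(
    recon_errors=[0.5, 1.5],
    mus=[[0.0, 0.0], [1.0, 0.0]],
    log_vars=[[0.0, 0.0], [0.0, 0.0]],
    predictions=[2.0, 3.0],
    labels=[1.0, None],
)
```

In this toy call the unlabeled molecule still shapes the latent space through the unsupervised term, while only the labeled one contributes to the property term.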
