Molecular property prediction with high accuracy: a semi-supervised learning approach

Grant number: 21/08852-2
Support Opportunities: Scholarships in Brazil - Doctorate
Effective date (Start): December 01, 2021
Effective date (End): February 29, 2024
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Marcos Gonçalves Quiles
Grantee: Gabriel Augusto Lins Leal Pinheiro
Host Institution: Instituto de Ciência e Tecnologia (ICT). Universidade Federal de São Paulo (UNIFESP). Campus São José dos Campos. São José dos Campos, SP, Brazil
Host Company: Universidade de São Paulo (USP). Instituto de Química de São Carlos (IQSC)
Associated research grant:17/11631-2 - Computational material science and chemistry, AP.PCPE


Materials discovery seeks to generate tailored materials with specific physicochemical properties. This study relies on the field of Materials Science, which uses computer simulation methods and laboratory experiments to describe how a material behaves in different environments and in contact with other materials. However, the simulation methods used to investigate such properties are computationally costly, and laboratory experiments require suitable environmental conditions. At the same time, the volume of simulations and laboratory experiments performed in the last decade has motivated the application of Machine Learning (ML) algorithms as an alternative approach to computing molecular properties, given the short time a trained ML model needs to predict a new instance. In the supervised setting, ML models trained on Density Functional Theory (DFT) data have shown errors with respect to DFT that are smaller than the errors of DFT with respect to laboratory experiments, and therefore prediction performance within the threshold established as chemical accuracy. In this setting, state-of-the-art models use the nuclear coordinates and graph convolutional neural networks applied to the molecular graph. However, unlabeled data has been explored in only a few ML problems, even though unlabeled data sets are much larger than labeled ones. Recently, contrastive models have shown state-of-the-art results in semi-supervised applications, as well as the capability to learn useful representations. Accordingly, this work aims to propose a contrastive model for application in Materials Science. To this end, we will investigate several neural network encoders and data augmentation techniques, and propose a loss function that considers both the property prediction error and the contrastive loss. (AU)
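
The abstract does not specify the form of the proposed loss function. As a minimal sketch of how a property prediction error and a contrastive loss might be combined, the example below adds a mean squared error over the labeled molecules to an NT-Xent contrastive term (the loss used in SimCLR-style models) computed on the embeddings of two augmented views of each molecule. The choice of NT-Xent, the names nt_xent and combined_loss, the weight parameter, and the temperature value are all assumptions made for illustration, not details taken from the project.

```python
import numpy as np


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for two views of the same batch, each of shape (N, D).

    Assumes every embedding has a non-zero norm.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize rows
    sim = (z @ z.T) / temperature                     # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own candidate
    # The positive for row i is the other view of the same molecule.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_norm - sim[np.arange(2 * n), pos]))


def combined_loss(pred, target, labeled_mask, z1, z2, weight=1.0, temperature=0.5):
    """Hypothetical semi-supervised objective: MSE on labeled samples + weight * NT-Xent."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    labeled_mask = np.asarray(labeled_mask, dtype=bool)
    if labeled_mask.any():
        # Unlabeled targets (e.g. NaN placeholders) are masked out here.
        supervised = float(np.mean((pred[labeled_mask] - target[labeled_mask]) ** 2))
    else:
        supervised = 0.0
    return supervised + weight * nt_xent(z1, z2, temperature)
```

In this sketch, unlabeled molecules contribute only through the contrastive term, while labeled molecules contribute to both terms; the weight parameter sets the balance between them and would need to be tuned.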
