Molecular property prediction with high accuracy: a semi-supervised learning approach

Grant number: 21/08852-2
Support Opportunities: Scholarships in Brazil - Doctorate
Effective date (Start): December 01, 2021
Effective date (End): February 28, 2025
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Marcos Gonçalves Quiles
Grantee: Gabriel Augusto Lins Leal Pinheiro
Host Institution: Instituto de Ciência e Tecnologia (ICT). Universidade Federal de São Paulo (UNIFESP). Campus São José dos Campos. São José dos Campos, SP, Brazil
Host Company: Universidade de São Paulo (USP). Instituto de Química de São Carlos (IQSC)
Associated research grant:17/11631-2 - CINE: computational materials design based on atomistic simulations, meso-scale, multi-physics, and artificial intelligence for energy applications, AP.PCPE
Associated scholarship(s):22/13536-5 - Polymer featurization strategies for machine learning tasks in the small-data regime, BE.EP.DR


Materials discovery seeks to design tailored materials with specific physicochemical properties. It draws on Materials Science, which uses computer simulations and laboratory experiments to describe how a material behaves in different environments and in contact with other materials. However, the simulation methods used to investigate such properties are computationally costly, and laboratory experiments require suitable, controlled environmental conditions. At the same time, the volume of simulation and experimental data produced over the last decade has motivated the use of Machine Learning (ML) algorithms as an alternative way to compute molecular properties, since a trained ML model needs little time to predict a new instance. In the supervised setting, ML models trained on Density Functional Theory (DFT) data have shown errors with respect to DFT that are smaller than the errors of DFT with respect to laboratory experiments, and therefore reach prediction performance within the threshold known as chemical accuracy. State-of-the-art models in this setting apply graph convolutional neural networks to the molecular graph together with the nuclear coordinates. In these ML applications, however, unlabeled data has been explored in only a few problems, even though unlabeled data sets are much larger than labeled ones. Recently, contrastive models have shown state-of-the-art results in semi-supervised applications, as well as the ability to learn useful representations. This work therefore aims to propose a contrastive model for applications in Materials Science. To this end, we will investigate several neural network encoders and data augmentation techniques, and propose a loss function that accounts for both the property prediction error and the contrastive loss. (AU)
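
The abstract does not specify the form of the proposed loss function. As a purely illustrative sketch, one common way to combine a property prediction error with a contrastive loss is a weighted sum of a mean absolute error on the labeled molecules and an NT-Xent contrastive term on all molecules. Everything below is an assumption made for illustration and is not taken from the project: the choice of NT-Xent and of mean absolute error, the function names nt_xent_loss and combined_loss, the weight alpha, and the temperature.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for two batches of paired embeddings.

    z1[i] and z2[i] are embeddings of two views (augmentations) of the
    same molecule; every other embedding in the batch is a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # shape (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm -> cosine similarity
    sim = z @ z.T / temperature                        # shape (2N, 2N)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # Index of the positive partner for each of the 2N rows.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return -log_prob[np.arange(2 * n), positives].mean()


def combined_loss(pred, target, labeled_mask, z1, z2, alpha=1.0, temperature=0.5):
    """Contrastive term on all molecules plus MAE on the labeled ones only.

    alpha is a hypothetical weight balancing the two terms.
    """
    contrastive = nt_xent_loss(z1, z2, temperature)
    if labeled_mask.any():
        supervised = np.abs(pred[labeled_mask] - target[labeled_mask]).mean()
    else:
        supervised = 0.0  # purely unlabeled batch: contrastive term only
    return contrastive + alpha * supervised


# Toy batch: 4 molecules, only the first 2 carry a property label.
z1 = np.eye(4)                       # embeddings of the first view
z2 = np.eye(4)                       # embeddings of the second view
pred = np.array([1.0, 2.0, 3.0, 4.0])
target = np.array([1.0, 0.0, 3.0, 0.0])
mask = np.array([True, True, False, False])
print(combined_loss(pred, target, mask, z1, z2, alpha=0.5))
```

In this sketch the unlabeled molecules contribute only through the contrastive term, which is how a semi-supervised setup can use data sets that are much larger than the labeled one.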

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
PINHEIRO, GABRIEL A.; SILVA, JUAREZ L. F.; QUILES, MARCOS G. SMICLR: Contrastive Learning on Multiple Molecular Representations for Semisupervised and Unsupervised Representation Learning. JOURNAL OF CHEMICAL INFORMATION AND MODELING, v. 62, n. 17, 13 pp. (18/21401-7, 17/11631-2, 21/08852-2)
PINHEIRO, GABRIEL A.; CALDERAN, FELIPE V.; DA SILVA, JUAREZ L. F.; QUILES, MARCOS G.; WANI, MA; KANTARDZIC, M; PALADE, V; NEAGU, D; YANG, L; CHAN, KY. The impact of low-cost molecular geometry optimization in property prediction via graph neural network. 2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 6 pp. (21/08852-2, 17/11631-2, 18/21401-7)
