
Recurrent neural networks for classification of proteins into families and comparison with variable memory Markov chains

Grant number: 22/10583-2
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Effective date (Start): September 01, 2022
Effective date (End): August 31, 2023
Field of knowledge: Physical Sciences and Mathematics - Probability and Statistics - Applied Probability and Statistics
Principal Investigator: Florencia Graciela Leonardi
Grantee: Alexandre Felix da Silva
Host Institution: Instituto de Matemática e Estatística (IME). Universidade de São Paulo (USP). São Paulo, SP, Brazil
Associated research grant: 17/10555-0 - Stochastic modeling of interacting systems, AP.TEM

Abstract

Neural networks have become one of the most promising classes of statistical models for the analysis of complex data. They can be applied to classification or regression problems, depending on the nature of the data. Standard neural network models assume that the observations are independent. Recurrent neural networks, in contrast, can handle dependent data, as in text analysis or in the analysis of genomic and amino acid sequences. Classifying proteins into families is a classic problem in Bioinformatics. Proteins are made up of one or more sequences of amino acids, of which there are 20 different types. The structure and function of each protein are determined by the amino acids that make it up. Understanding the relationship between amino acid sequence and protein function is a long-standing problem in molecular biology with far-reaching scientific implications. Some of the most commonly used methods for classifying amino acid sequences into families are Markov chains, hidden Markov models, and variable memory Markov chains. Very recently, neural network models have also been used for this task. In this scientific initiation work plan, we propose to study recurrent neural networks as a possible model for classifying proteins into families. To this end, the grantee will study the fundamental literature of the area as well as the variable memory Markov chain model. Algorithms will be implemented in the R language to classify protein sequences from the Pfam 34.0 database. These data are publicly available at http://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/. The results will be compared with those obtained with variable memory Markov chains.
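
As a rough illustration of the Markov chain approach to family classification mentioned in the abstract, the sketch below fits one first-order Markov chain per family and assigns a new sequence to the family under which it has the highest log-likelihood. It is only a simplified baseline under stated assumptions, not the project's method: the chain has fixed order one (it is neither a variable memory Markov chain nor a recurrent neural network), it is written in Python for illustration although the project plans to use R, and the function names, the add-one smoothing, and the toy sequences are all hypothetical rather than taken from Pfam.

```python
import math

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_markov_chain(sequences, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Estimate log transition probabilities of a first-order Markov chain.

    A pseudocount (add-one smoothing by default, an assumption of this
    sketch) keeps unseen transitions from having zero probability.
    """
    counts = {a: {b: pseudocount for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            # Ignore pairs that contain symbols outside the alphabet.
            if prev in counts and curr in counts[prev]:
                counts[prev][curr] += 1.0
    log_probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        log_probs[a] = {b: math.log(c / total) for b, c in row.items()}
    return log_probs


def log_likelihood(seq, log_probs):
    """Sum the log transition probabilities along the sequence.

    The distribution of the first symbol is left out for simplicity.
    """
    return sum(
        log_probs[prev][curr]
        for prev, curr in zip(seq, seq[1:])
        if prev in log_probs and curr in log_probs[prev]
    )


def classify(seq, models):
    """Return the family whose chain gives the sequence the highest log-likelihood."""
    return max(models, key=lambda family: log_likelihood(seq, models[family]))


# Toy training data (hypothetical; real input would be Pfam family sequences).
families = {
    "family_A": ["ACACACAC", "CACACACA", "ACACAC"],
    "family_B": ["GGGTTTGG", "TTGGGTTT", "GGTTGG"],
}
models = {name: train_markov_chain(seqs) for name, seqs in families.items()}

print(classify("ACACA", models))
print(classify("GGTTT", models))
```

A variable memory Markov chain would replace the fixed one-symbol context used here with contexts whose length depends on the preceding symbols, while the classification rule (pick the family with the highest likelihood) could stay the same.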
