
Machine learning tools for bioinformatics problems

Grant number: 19/21300-9
Support Opportunities: Scholarships in Brazil - Doctorate
Effective date (Start): November 01, 2019
Effective date (End): September 30, 2020
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: André Carlos Ponce de Leon Ferreira de Carvalho
Grantee: Victor Alexandre Padilha
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated research grant: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry, AP.CEPID

Abstract

In recent years, machine learning techniques have been used extensively in bioinformatics because of their capacity to solve hard problems: from a set of known examples, they learn a function that can make predictions for new, unseen data. Motivated by these results, this project will use machine learning techniques to tackle three bioinformatics problems: (i) the classification of CRISPR-associated (Cas) proteins using features extracted from sequences of different genomes. We will include the resulting tool in a CRISPR system classification pipeline that we have already developed and compare it with Hidden Markov Models, the technique currently used to label Cas proteins in that pipeline; (ii) the development of a new tool for identifying translation initiation sites from ribosome-profiling data. Based on a set of labeled data, we will extract the peaks that characterize such sites and build a model to predict peaks in new data; and (iii) the identification of long non-coding RNAs in plants using features extracted from whole-genome alignments, making it possible to predict conserved protein regions with conserved secondary structure. (AU)
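As a rough illustration of the general idea behind item (i), learning a class label from features extracted from protein sequences, the sketch below turns each sequence into a fixed-length vector of normalised amino-acid k-mer frequencies and assigns a query to the class whose mean vector is nearest. This is a hypothetical example, not the project's tool: the k-mer feature set, the nearest-centroid rule, the toy sequences and the labels `family_A` and `family_B` are all assumptions chosen for brevity, and the sketch does not implement the Hidden Markov Model approach the pipeline currently uses.

```python
import math
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted; k-mers with other letters are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(seq: str, k: int = 2) -> list[float]:
    """Return the normalised frequency of every amino-acid k-mer (20**k features)."""
    seq = seq.upper()
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in vocab) or 1  # avoid division by zero for short sequences
    return [counts[w] / total for w in vocab]


def train_centroids(examples: dict[str, list[str]], k: int = 2) -> dict[str, list[float]]:
    """Hypothetical training step: average the feature vectors of each labelled class."""
    centroids = {}
    for label, seqs in examples.items():
        vectors = [kmer_features(s, k) for s in seqs]
        centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return centroids


def predict(seq: str, centroids: dict[str, list[float]], k: int = 2) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    x = kmer_features(seq, k)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


if __name__ == "__main__":
    # Toy, made-up sequences; they are not real Cas proteins.
    training = {
        "family_A": ["MKKLLKKLL", "MKKLKKLLK"],
        "family_B": ["MDEEDDEEW", "MEDEDWEED"],
    }
    model = train_centroids(training)
    print(predict("MKLLKKLKK", model))  # prints family_A
```

A nearest-centroid rule keeps the example short and dependency-free; a realistic classifier would use richer features and a stronger learning algorithm, evaluated on curated, labelled Cas protein families.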


Scientific publications (4)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
PADILHA, VICTOR A.; ALKHNBASHI, OMER S.; SHAH, SHIRAZ A.; DE CARVALHO, ANDRE C. P. L. F.; BACKOFEN, ROLF. CRISPRcasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems. GigaScience, v. 9, n. 6, 12 pp. (13/07375-0, 19/21300-9, 16/18615-0)
PADILHA, VICTOR A.; ALKHNBASHI, OMER S.; TRAN, VAN DINH; SHAH, SHIRAZ A.; CARVALHO, ANDRE C. P. L. F.; BACKOFEN, ROLF. Casboundary: automated definition of integral Cas cassettes. Bioinformatics, v. 37, n. 10, p. 1352-1359. (13/07375-0, 19/21300-9)
ALKHNBASHI, OMER S.; MITROFANOV, ALEXANDER; BONIDIA, ROBSON; RADEN, MARTIN; TRAN, VAN DINH; EGGENHOFER, FLORIAN; SHAH, SHIRAZ A.; OEZTUERK, EKREM; PADILHA, VICTOR A.; SANCHES, DANILO S.; et al. CRISPRloci: comprehensive and accurate annotation of CRISPR-Cas systems. Nucleic Acids Research, v. 49, n. W1, p. W125-W130. (13/07375-0, 19/21300-9)
Academic Publications
(References retrieved automatically from State of São Paulo Research Institutions)
PADILHA, Victor Alexandre. Machine Learning Tools for Bioinformatics Problems. 2020. Doctoral Thesis - Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB) São Carlos.
