Prediction of insertion sites of mobile elements by machine learning: a case study on Microviridae phages and casposons

Grant number: 23/12164-0
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Effective date (Start): March 01, 2024
Effective date (End): February 28, 2025
Field of knowledge: Biological Sciences - Microbiology - Biology and Physiology of Microorganisms
Principal Investigator: Arthur Gruber
Grantee: Giuliana Lopes Pola
Host Institution: Instituto de Ciências Biomédicas (ICB). Universidade de São Paulo (USP). São Paulo, SP, Brazil

Abstract

The Microviridae family comprises viruses with icosahedral capsids and circular ssDNA genomes that infect a variety of bacterial hosts. Except for the Alpavirinae subfamily, prophages and their insertion sites in host genomes have not been described. Recently, our group surveyed Microviridae prophages in a collection of more than 550,000 bacterial genomes from the PATRIC database. All prophages found were functionally annotated, and their insertion sites in the host genomes were determined with bioinformatics tools developed by our group. Similarly, we surveyed casposons, self-synthesizing transposable elements present in bacteria and archaea. In this project, we aim to develop a methodology for predicting the insertion sites of mobile genetic elements, initially using Microviridae prophages and later extending it to casposons. The validated Microviridae prophage and casposon datasets will be used for training. Negative sequences will be generated from the positive sequence data by randomly shuffling the nucleotides of each positive sequence. In a second approach, random stretches of the same length as the positive sequences, taken from the same organisms, will also be used as negative sequences. Features will be chosen by a feature selection algorithm and used with different classifiers, including Support Vector Machine (SVM), Random Forest, and Multilayer Perceptron. Classification quality will be assessed by ten-fold cross-validation, using accuracy, precision, recall, and F1 score.
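The two negative-set strategies and the evaluation metrics named in the abstract could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names (`shuffle_sequence`, `random_window`, `k_fold_indices`, `classification_metrics`) and the use of a seeded `random.Random` generator are assumptions made for the example, and the classifiers themselves are left out.

```python
import random


def shuffle_sequence(seq, rng):
    """Negative example, first approach: permute the nucleotides of a
    positive sequence, which keeps its length and base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def random_window(genome, length, rng):
    """Negative example, second approach: a random stretch of the given
    length drawn from the same organism's genome (hypothetical helper)."""
    if length > len(genome):
        raise ValueError("window is longer than the genome")
    start = rng.randrange(len(genome) - length + 1)
    return genome[start:start + length]


def k_fold_indices(n_samples, k, rng):
    """Split sample indices into k disjoint folds after shuffling."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In a ten-fold setting, each fold returned by `k_fold_indices(n, 10, rng)` would serve once as the test set while the other nine are used for training, and the four metrics would be averaged over the ten runs.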
