Advanced search
Start date

Computational biology for high-through put data analysis

Full text
Daniele Yumi Sunaga de Oliveira
Total Authors: 1
Document type: Doctoral Thesis
Press: São Paulo.
Institution: Universidade de São Paulo (USP). Instituto de Biociências (IBIOC/SB)
Defense date:
Examining board members:
Maria Rita dos Santos e Passos Bueno; Ronaldo Fumio Hashimoto; Júlio Cesar Nievola; Eduardo Moraes Rego Reis
Advisor: Maria Rita dos Santos e Passos Bueno; Ronaldo Fumio Hashimoto

The large amount of data generated by modern technologies of biology provides a big challenge for areas such as bioinformatics. In order to analyze these data there are several computer programs available; however these are not always well understood enough to be correctly applied. Moreover, there are problems that require the development of new solutions. In this work, we present the data analysis of two main high-throughput data sources: microarrays and sequencing. Firstly, we evaluated whether the statistic of Rank Products method (RP) is suitable for the identification of differentially expressed genes in studies of complex diseases, which are characterized by the vast genetic heterogeneity among the individuals affected. Secondly, we developed a tool named hunT to search for target genes of T transcription factor - an important mesodermal marker that plays a key role in the vertebrate development -, by identifying binding sites for T in their regulatory sequences. The RP performance was tested using both simulated and real data from three different studies: non-syndromic cleft lip and palate, autism and sleep deprivation effect in Humans. Our results have shown that RP is an effective solution for the identification of consistently deregulated genes in a subgroup of patients, this ability is maintained even with few samples, however its performance is impaired when only few genes are analyzed. We have obtained strong biological of effectiveness of the method in the studies with real data by not only identifying genes and pathways previously associated with diseases but also corroborating the behavior of novel candidate genes with the real-time PCR technique. The hunT program has identified 4,602 mouse genes containing the binding site for the T domain, some of which have already been demonstrated experimentally. We identified 32 of these genes with altered expression in a study which evaluated the transcriptome of in vitro differentiation of mouse embryonic stem cells to mesoderm, suggesting the involvement of these genes in this process regulated by T (AU)

FAPESP's process: 08/10839-0 - Bioinformatics to identify the signaling pathways important to cell pluripotency
Grantee:Daniele Yumi Sunaga de Oliveira
Support type: Scholarships in Brazil - Doctorate