
Purine synthesis pathway and identification of novel genes

Grant number: 00/07439-8
Support Opportunities: Genome Research Grants
Duration: August 01, 2000 - July 31, 2002
Field of knowledge: Biological Sciences - Biochemistry - Molecular Biology
Principal Investigator: Otavio Henrique Thiemann
Grantee: Otavio Henrique Thiemann
Host Institution: Instituto de Física de São Carlos (IFSC). Universidade de São Paulo (USP). São Carlos, SP, Brazil


We are currently involved as a sequencing laboratory in the SUCEST project. Our data mining effort searches the EST database generated by SUCEST along two themes. The first is the identification of genes involved in the purine synthesis and recycling pathways. The second is an exploratory analysis of those EST sequences whose identity could not be determined in the initial database searches, the so-called "no hit sequences".
The purine nucleotide pathways are of central importance to all living organisms and have been investigated as possible chemotherapy targets because of differences between disease agents and their hosts. In human cells, purine nucleotides are synthesized de novo from non-nucleotide precursors such as amino acids, ammonia and carbon dioxide; purines can also be recycled through the salvage pathway. An enzyme important to both the salvage and the de novo pathways is PRPP synthetase (PRS), which synthesizes the PRPP substrate used in all phosphoribosyltransferase (PRTase) reactions. Knowledge of the sugarcane purine synthesis enzymes will open the possibility of using them as drug targets against phytopathogens, as is being done with several targets in parasites.
As a participating sequencing laboratory, we have begun a preliminary data mining effort using the following strategy. Representative enzyme sequences for each step of the purine de novo synthesis and recycling pathways have been chosen from the NCBI database. These peptide sequences are being used to search the entire translated SUCEST database with the available BLAST facility. The statistical significance of each alignment to a retrieved EST clone is then tested with a Monte Carlo shuffling algorithm. To calibrate the Monte Carlo analysis, known protein sequences with different degrees of divergence along the phylogenetic tree have been used.
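The shuffling test mentioned above can be illustrated with a minimal Python sketch. It assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) and a plain Smith-Waterman local alignment in place of the BLAST scores and substitution matrix actually used; the function names and parameters are hypothetical. The subject sequence is shuffled many times, which preserves its residue composition while destroying its order, and the empirical p-value is the fraction of shuffled scores at least as high as the observed one.

```python
import random
from typing import Tuple

# Hypothetical toy scoring scheme; a real analysis would use a substitution
# matrix such as BLOSUM62 with affine gap penalties.
MATCH, MISMATCH, GAP = 2, -1, -2


def local_alignment_score(a: str, b: str) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_p_value(query: str, subject: str, n_shuffles: int = 1000,
                    seed: int = 0) -> Tuple[int, float]:
    """Observed score and empirical p-value against shuffled subjects."""
    rng = random.Random(seed)
    observed = local_alignment_score(query, subject)
    residues = list(subject)
    at_least_as_high = 0
    for _ in range(n_shuffles):
        # Shuffling keeps the residue composition but destroys the order.
        rng.shuffle(residues)
        if local_alignment_score(query, "".join(residues)) >= observed:
            at_least_as_high += 1
    # Add-one correction: the estimate is never exactly zero.
    return observed, (at_least_as_high + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Hypothetical random peptides, not real enzyme or SUCEST sequences.
    rng = random.Random(42)
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    query = "".join(rng.choice(alphabet) for _ in range(40))
    unrelated = "".join(rng.choice(alphabet) for _ in range(40))
    print("self:", shuffle_p_value(query, query, n_shuffles=200))
    print("unrelated:", shuffle_p_value(query, unrelated, n_shuffles=200))
```

Comparing a sequence with itself should give the smallest attainable p-value, 1/(n_shuffles + 1), whereas two unrelated sequences should typically give a much larger one. Running the same test over pairs of reference proteins of known divergence is one way such an analysis could be calibrated.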
The calibration sequences are compared with one another and with the EST clones. The resulting table of p-values indicates the degree of divergence of each enzyme among the calibration sequences, which diverge at different rates, and relative to the sugarcane EST clones. Preliminary results with this strategy have allowed us to identify at least one potential case of polymorphism in sugarcane, in PRPP synthetase, a key enzyme of the purine synthesis pathway. Interestingly, two important genes of the de novo synthesis pathway, glutamine-PRPP-amidotransferase and GAR transformylase, have not been found in the SUCEST database so far. These missing genes raise questions that may be investigated further: the genes may be expressed at such low abundance that they go undetected in the current libraries, or they may be so divergent that our search strategy fails to detect them. That sugarcane employs an alternative purine metabolism is unlikely, since all the other enzymes of the pathway have been identified with a high degree of similarity to known sequences.
In every genome project undertaken to date, a variable number of unidentified sequences has been encountered. These genes are of great interest, since they may be responsible for important and as yet unknown pathways of the organism studied, and the sugarcane EST sequencing effort is no exception. Many EST sequences are accumulating as "no hit sequences" that the standard search method does not initially identify. Our purpose is to analyze as many of these sequences as possible in an attempt to find those with marginal similarity scores. Although this undertaking is laborious and will not permit a detailed examination of every unknown sequence, it may reveal whether potentially valuable information is being lost among the "no hit sequences".
If so, it would justify further efforts to develop automatic search methods to explore these sequences. The strategy will be to collect individual sequences and search them against the public databases, using the translated peptide sequence as the query. This approach is known to increase search sensitivity and return more reliable results. Marginal identities will be analyzed further with statistical methods, such as the Monte Carlo algorithm, and with phylogenetic reconstruction where sequence length permits. The identified sequences will be made available for further study by the data mining laboratory working on the corresponding pathway, or catalogued for later analysis. (AU)
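Using the translated peptide sequence as the query requires translating each EST, whose strand and reading frame are unknown, in all six frames. BLAST programs such as blastx and tblastn perform this translation internally; the following is only a minimal Python sketch of the idea, assuming the standard genetic code (NCBI translation table 1). The helper names are hypothetical, and picking the frame with the longest stop-free stretch is a simple illustrative heuristic, not the project's stated method.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons are enumerated in TCAG order
# for the first, second and third positions (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; N and other symbols are kept."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate codons from the first base; '*' is a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(est: str) -> Dict[str, str]:
    """Peptides for frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    seq = est.upper().replace("U", "T")
    rc = reverse_complement(seq)
    frames: Dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_stop_free_frame(est: str) -> str:
    """Label of the frame with the longest run of residues between stops."""
    frames = six_frame_translation(est)
    return max(frames, key=lambda f: max(len(p) for p in frames[f].split("*")))


if __name__ == "__main__":
    # Hypothetical toy EST, not a SUCEST read.
    for frame, peptide in six_frame_translation("ATGGCCTAA").items():
        print(frame, peptide)
```

The peptide from the chosen frame (or all six) would then serve as the query against the public protein databases.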
