Population differentiation at genes under strong balancing selection: a case study on the HLA genes

Débora Yoshihara Caldeira Brandt
Total Authors: 1
Document type: Master's Dissertation
Press: São Paulo.
Institution: Universidade de São Paulo (USP). Instituto de Biociências (IBIOC/SB)
Defense date:
Examining board members:
Diogo Meyer; Carlos Eduardo Guerra Schrago; Maria Dulcetti Vibranovski
Advisor: Diogo Meyer

Balancing selection is defined as any kind of selective regime that increases genetic variability in populations relative to what is expected under neutrality. Theory predicts that balancing selection reduces population differentiation. However, balancing selection regimes in which different sets of alleles are maintained in different populations could increase population differentiation. To better understand the effects of balancing selection on the distribution of genetic variation among populations, we investigated population differentiation at the Human Leukocyte Antigen (HLA) genes, which are the most polymorphic genes in the human genome and constitute the most striking example of balancing selection in humans. The HLA molecules are responsible for the presentation of peptides to T cells, thus mediating a critical step of the immune response. The advantage of maintaining variation in those genes through balancing selection is possibly related to the increased ability of the immune system to respond to a wider variety of pathogens. In this study, we analysed the public dataset of the 1000 Genomes project (1000G), which sequenced 1092 individuals from different populations using Next Generation Sequencing (NGS) technologies. These sequencing techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the HLA genes. Therefore, we evaluated the reliability of genotype calls and allele frequency estimates of the single nucleotide polymorphisms (SNPs) reported by 1000G at HLA genes, using Sanger sequencing data of 930 of the 1092 1000G samples as a gold standard. We found a bias towards overestimation of reference allele frequency for some SNPs, indicating that mapping bias is an important cause of error in frequency estimation in the 1000G data. These results provide insights into the challenges of using NGS data at other genomic regions of high diversity.
Using the results of this analysis, we selected a list of sites that have reliable allele frequency estimates in the 1000G data to be used in our population differentiation study. In another methodological control, we demonstrate the effect of using a dataset rich in rare variants in population differentiation studies. Controlling for this effect, and using only the sites we had shown to be reliable, we found that population differentiation of SNPs at the HLA genes is lower than that of SNPs in other genomic regions. This suggests a predominant role of global selective pressures in shaping the distribution of variation at the HLA genes among populations. However, we also show evidence that population differentiation of HLA haplotypes may be higher than what we observe at the SNP level, suggesting that local selective pressures may influence the distribution of haplotypes among populations. Altogether, our results indicate that it is possible to reconcile low population differentiation at the SNP level, as predicted by theory, with higher differentiation at haplotypes, which are possibly under local selective pressures. (AU)
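
Illustrative note (not part of the dissertation): the abstract does not state which differentiation statistic or estimator was used. The sketch below assumes the textbook Wright's F_ST for a single biallelic SNP, F_ST = (H_T - H_S) / H_T, computed from per-population allele frequencies. The function name `fst_biallelic` and all input frequencies are hypothetical and chosen only to show how the measure behaves, including why a dataset rich in rare variants needs to be controlled for.

```python
import math


def fst_biallelic(freqs, sizes=None):
    """Wright's F_ST for one biallelic SNP (illustrative assumption, not the
    estimator used in the dissertation).

    freqs: allele frequency of the same allele in each population.
    sizes: optional relative population weights; equal weights if omitted.
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = sum(sizes)
    weights = [s / total for s in sizes]

    # Weighted mean allele frequency across populations.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    # Expected heterozygosity in the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within populations.
    h_s = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, freqs))

    if h_t == 0.0:
        return math.nan  # site is monomorphic overall; F_ST is undefined
    return (h_t - h_s) / h_t


# Hypothetical frequencies, for illustration only.
common_snp = fst_biallelic([0.2, 0.8])   # common variant, frequencies differ
rare_snp = fst_biallelic([0.0, 0.02])    # variant private to one population
```

With these made-up inputs the common SNP yields a much larger value than the rare one, even though the rare allele is found in only one of the two populations. A variant that is rare overall can only reach a small F_ST under this definition, so comparing sets of SNPs with different proportions of rare variants can by itself shift the observed differentiation.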

FAPESP's process: 12/22796-9 - Population differentiation on genes under strong balancing selection: a case study on the HLA genes
Grantee: Débora Yoshihara Caldeira Brandt
Support Opportunities: Scholarships in Brazil - Master