Genomic genetic value prediction using machine learning and SNP subset

Grant number: 24/09391-7
Support Opportunities: Scholarships in Brazil - Post-Doctoral
Effective date (Start): August 01, 2024
Effective date (End): June 30, 2025
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: João Paulo Papa
Grantee: Thomaz Marques Sena
Host Institution: Faculdade de Ciências (FC). Universidade Estadual Paulista (UNESP). Campus de Bauru. Bauru, SP, Brazil
Associated research grant: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry, AP.CEPID

Abstract

Genetic evaluations of 305-day milk yield (MY305) in Gir cattle use the animal model to estimate the genomic genetic value (genomic estimated breeding value, GEBV). The model is solved with the ssGBLUP method, which combines genomic relationships, calculated from single nucleotide polymorphisms (SNP), with additive genetic relationships in the matrix H, and uses the productive performance of the entire population. Because genetic sequencing is costly and many SNP in high-density panels have little influence on the trait, researchers suggest creating low-density genotyping panels. To this end, machine learning (ML) methods are investigated to classify SNP according to their relevance to the trait. To better understand the genetic architecture of MY305, this proposal will employ the Random Forest, XGBoost and Neural Network ML algorithms, using the ranger, xgboost and h2o R packages, respectively. The animals will be divided into training and validation groups in two ways: by generation, simulating genetic improvement programs, and at random, maintaining the same proportion of animals. In the generation-based scenario, the first ten generations comprise training, and the remaining five comprise validation. In both scenarios, environmental fixed effects that influence the trait will be tested. The response variable is MY305, and the predictors are the SNP and the fixed effects. The 4000 most important SNP for MY305 from each scenario will be used for GEBV prediction with the ssGBLUP method, including fixed effects and using the method's default parameters. The presence of genes within a 400 Mb window of the 5 most important SNP can also be checked, using the biomaRt R package with version 110 of Ensembl Genomes. The clusterProfiler package will be used for gene functional enrichment analyses, together with the org.Bt.eg.db R package, which contains the Bos taurus taurus annotation database.
These analyses are expected to identify few SNP with a positive or negative effect on the trait, with the majority showing no effect. The small subset of SNP used to predict GEBV should not affect model accuracy, as the effective population size is relatively small. (AU)
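
The two splitting scenarios and the SNP ranking step described in the abstract can be illustrated with a minimal sketch. It is not the project's pipeline: the project names the R packages ranger, xgboost and h2o, whereas this sketch is plain Python and swaps in the absolute correlation between each SNP and MY305 as a simple stand-in for a model-based importance score. The toy data, the record fields (`generation`, `genotype`, `my305`) and all function names are hypothetical.

```python
import random
from statistics import mean


def split_by_generation(animals, n_train_generations=10):
    """Generation scenario: early generations train, later ones validate."""
    train = [a for a in animals if a["generation"] <= n_train_generations]
    valid = [a for a in animals if a["generation"] > n_train_generations]
    return train, valid


def split_random(animals, n_train, seed=0):
    """Random scenario: same number of training animals, drawn at random."""
    shuffled = animals[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant input."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_snps(train, k):
    """Rank SNP by the stand-in importance score and keep the k best."""
    n_snps = len(train[0]["genotype"])
    y = [a["my305"] for a in train]
    scores = [
        abs_correlation([a["genotype"][j] for a in train], y)
        for j in range(n_snps)
    ]
    ranked = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy population: 15 generations of 20 animals, 50 SNP coded
# 0/1/2, with only SNP 3 and SNP 7 given an effect on the simulated trait.
rng = random.Random(42)
N_SNPS = 50
animals = []
for generation in range(1, 16):
    for _ in range(20):
        genotype = [rng.choice((0, 1, 2)) for _ in range(N_SNPS)]
        my305 = 3000 + 300 * genotype[3] - 300 * genotype[7] + rng.gauss(0, 50)
        animals.append(
            {"generation": generation, "genotype": genotype, "my305": my305}
        )

train, valid = split_by_generation(animals)
rand_train, rand_valid = split_random(animals, n_train=len(train))
selected = top_snps(train, k=2)  # the proposal keeps 4000 SNP; 2 suits the toy data
print(len(train), len(valid), sorted(selected))
```

In the proposal, the ranking would come from the importance measures of the fitted Random Forest, XGBoost or neural network models rather than from a correlation, and the selected SNP would then feed the ssGBLUP analysis.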
