Sorghum (Sorghum bicolor (L.) Moench) is a bioenergy crop with several features that breeding can exploit to increase the efficiency of bioenergy production. In genomic selection, high-throughput phenotyping (HTP) data collected over time have a temporal correlation structure that current genetic models do not yet address. Dynamic Bayesian networks (DBNs) are a modeling framework suitable for learning temporal correlation structures. A DBN represents variables as nodes and uses arrows to relate them within and across time points. In this proposal we aim to develop a DBN that exploits the temporal genetic correlation arising from the expression of common genes over time, in order to predict biomass before harvest. Phenotypic data on several traits will be collected by visual, thermal, multi-spectral reflectance, and other electronic sensors integrated into a robotic platform under development called TERRA-MAP. The main target trait will be biomass, and its measurement will be refined using other correlated traits. We expect to have ~180 million phenotypic data points after harvest. Genomic data will be obtained by genotyping-by-sequencing in a panel of 2,500 inbred lines designed to represent the global sorghum gene pool. Missing genotypic data will be imputed using sequencing data from a genetic core representative of the full sorghum panel. Functional single nucleotide polymorphisms (SNPs) will be selected using the results of a Differential Nuclease Sensitivity chromatin profiling analysis. After imputation and filtering, we expect to obtain ~300,000 functional SNPs. These steps will be carried out by our collaborator, Dr. Michael Gore of Cornell University. We will develop a two-stage genomic selection approach for predicting sorghum biomass before crop maturity.
In the first stage, multivariate linear mixed models will be used to obtain adjusted entry means free of experimental effects, by modeling field variability within and between plots through spatial variables and by correcting for weather variability. We expect to reduce the phenotypic Big Data set to 2,500 adjusted entry means for each time point. The matrix of functional SNPs will be compacted using an artificial bin approach, and the resulting bin matrix will be coded using Cockerham's additive model. The DBN will then be designed by modeling each bin marker effect as conditionally dependent on its own effect at the previous time point. The results of the Bayesian Linear Regression (BLR) model will serve as the benchmark for comparison. In tests of our method on public data, we have already observed ~4-fold improvements in predictive accuracy in some cases. We expect our methodology to become a standard technique for phenotypic prediction before crop maturity using phenotypic and genotypic Big Data.
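As an illustration only, the marker side of this pipeline could be sketched in Python as below. This is not the proposed DBN: it swaps in a much simpler sequential ridge regression, in which the bin effects at each time point are shrunk toward the estimates from the previous time point, to show what "conditional on its previous effect in time" can mean in practice. The function names (additive_code, bin_markers, sequential_ridge), the fixed-width averaging rule for bins, the -1/0/+1 additive coding, the penalty lam, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def additive_code(allele_counts):
    """Map allele counts 0/1/2 to additive codes -1/0/+1 (assumed coding)."""
    return np.asarray(allele_counts, dtype=float) - 1.0


def bin_markers(coded, bin_size):
    """Average consecutive SNP columns into fixed-width bins.

    The fixed-width rule is a hypothetical stand-in for the artificial
    bin approach; the last bin may hold fewer SNPs than the others.
    """
    n_lines, n_snps = coded.shape
    n_bins = -(-n_snps // bin_size)  # ceiling division
    bins = np.empty((n_lines, n_bins))
    for b in range(n_bins):
        bins[:, b] = coded[:, b * bin_size:(b + 1) * bin_size].mean(axis=1)
    return bins


def sequential_ridge(X, Y, lam):
    """Estimate bin effects per time point, shrunk toward the previous ones.

    For each column y_t of Y, solves
        min_b ||y_t - X b||^2 + lam * ||b - b_prev||^2,
    starting from b_prev = 0. Returns an (n_bins, n_times) array.
    """
    n_bins = X.shape[1]
    lhs = X.T @ X + lam * np.eye(n_bins)
    prev = np.zeros(n_bins)
    effects = []
    for t in range(Y.shape[1]):
        prev = np.linalg.solve(lhs, X.T @ Y[:, t] + lam * prev)
        effects.append(prev)
    return np.column_stack(effects)


# Simulated toy data: 50 lines, 12 SNPs, 3 time points (not real sorghum data).
rng = np.random.default_rng(0)
counts = rng.integers(0, 3, size=(50, 12))   # allele counts in {0, 1, 2}
X = bin_markers(additive_code(counts), bin_size=4)
Y = rng.normal(size=(50, 3))                 # stand-in for adjusted entry means
B = sequential_ridge(X, Y, lam=1.0)
print(B.shape)
```

Larger values of lam tie the effects at consecutive time points more closely together, while lam = 0 reduces to an independent least-squares fit at each time point. A full DBN would instead place a probabilistic model on these transitions and infer the effects jointly.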