Reference, pan- and population genome sequencing of the eucalypts for woody biomass and adaptive traits.

Lötter, A.1, Simelane, N.2, Downing, A.1, Sathekge, K. H.1, Christie, N.1, Duong, T. A.1, Mostert-O’Neill, M.3, Gakenou, O.4, Drew, D.4, Grimwood, J.5, Bruna, T.6, Barry, K.6, Talag, J.7, Healey, A.5, Jenkins, J. W.5, Lovell, J. T.5,6, Schmutz, J.5,6, Borevitz, J.8, Wegrzyn, J. L.9, Myburg, A. A.*1

1 Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, South Africa
2 Department of Computer Science, University of Pretoria, Pretoria, South Africa
3 Research & Development, Forestry Division, Mondi South Africa, Hilton, South Africa
4 Department of Forest and Wood Science, Stellenbosch University, Stellenbosch, South Africa
5 HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
6 Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
7 Arizona Genomics Institute, University of Arizona, Tucson, AZ, USA
8 Research School of Biology and Centre for Biodiversity Analysis, ARC Centre of Excellence in Plant Energy Biology, Australian National University, Canberra, Australia
9 Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA

We are performing high-throughput sequencing of the genomes of the eucalypts, a group of over 900 members of the sister genera Eucalyptus, Corymbia and Angophora that includes some of the most productive and widely adapted plantation forestry species on earth. This effort includes a new phased, telomere-to-telomere reference for a clonal genotype of Eucalyptus grandis (TAG0014), which replaces the previous diploid reference assembly for the genus (BRASUZ1; Myburg et al., Nature 2014). Preliminary analysis of the two TAG0014 haplogenomes revealed smaller individual genome sizes and annotated gene numbers than the previous reference: 570.8 Mbp and 35,929 genes (Hap1), and 552.4 Mbp and 35,583 genes (Hap2). Reference sequencing (HiFi, Omni-C, Iso-Seq and RNA-seq) and assembly have been extended to 21 diverse E. grandis individuals representing the natural range in Australia and three breeding individuals in South Africa, to generate a new pan-genome reference and to uncover the extent of haplotype and structural diversity and its functional consequences for gene expression and regulation in the species. Phased assemblies have been completed for all 24 pan-genome individuals, yielding 50 haplogenomes when the two TAG0014 haplogenomes are included. Preliminary analysis revealed that 358.8 to 396.3 Mbp (62.9 to 69.4%) of the haplogenomes are syntenic with the TAG0014 Hap1 reference, while 58.9 to 90.1 Mbp (10.3 to 15.8%) of the 50 haplogenomes are not shared with it. To understand genome diversity associated with woody biomass formation and adaptive traits across the natural range, we performed diversity sequencing (40-50X Illumina coverage) of 190 family representatives and population resequencing (10-20X Illumina coverage) of over 2,000 individuals from a subset of over 100 families planted in replicated common garden trials in South Africa. These trials and the resulting collection of more than 2,200 genomes serve as resources for landscape genomics and genome-wide association studies (GWAS) aimed at dissecting the genetic basis of woody biomass formation and environmental adaptation in E. grandis.
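The abstract does not state the denominator for its synteny percentages. All four reported values, however, are reproduced when the Mbp spans are divided by the 570.8 Mbp TAG0014 Hap1 length. The short Python sketch below checks that arithmetic and the haplogenome count (24 diploid individuals plus the two TAG0014 haplotypes). Treating Hap1 as the denominator is an assumption, and the function names are hypothetical and only illustrate the calculation.

```python
# Consistency check of figures reported in the abstract.
# ASSUMPTION: percentages are relative to the TAG0014 Hap1 length (570.8 Mbp);
# the abstract does not state the denominator explicitly.

HAP1_MBP = 570.8  # TAG0014 Hap1 assembly size reported in the abstract


def percent_of_reference(span_mbp: float, reference_mbp: float = HAP1_MBP) -> float:
    """Hypothetical helper: span as a percentage of the reference, one decimal."""
    return round(100.0 * span_mbp / reference_mbp, 1)


def haplogenome_count(diploid_individuals: int, reference_haplotypes: int = 2) -> int:
    """Hypothetical helper: two phased haplogenomes per diploid individual,
    plus the haplotypes of the reference genotype."""
    return 2 * diploid_individuals + reference_haplotypes


syntenic = [percent_of_reference(x) for x in (358.8, 396.3)]
non_shared = [percent_of_reference(x) for x in (58.9, 90.1)]
n_hap = haplogenome_count(21 + 3)  # 21 natural-range + 3 breeding individuals

print(f"Syntenic with Hap1: {syntenic[0]}-{syntenic[1]}%")    # 62.9-69.4%
print(f"Not shared with Hap1: {non_shared[0]}-{non_shared[1]}%")  # 10.3-15.8%
print(f"Haplogenomes: {n_hap}")  # 50
```

Because the computed values match the reported ranges exactly, Hap1 is the most likely denominator. The abstract itself only names Hap1 as the reference for the non-shared fraction.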
We are also using the common garden trials as open-air laboratories to test and implement remote phenotyping technologies, including drone-based and terrestrial LiDAR scanning, to better characterize traits such as growth, canopy structure and biomass production.