The pan-genome of Eucalyptus grandis.

Lötter, A.*1, Duong, T. A.1, Grimwood, J.2, Bruna, T.3, Barry, K.3, Talag, J.4, Jenkins, J. W.2, Lovell, J. T.2,3, Schmutz, J.2,3, Borevitz, J.5, Wegrzyn, J. L.6, Myburg, A. A.1

1 Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, South Africa
2 HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
3 Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
4 Arizona Genomics Institute, University of Arizona, Tucson, AZ, USA
5 Research School of Biology and Centre for Biodiversity Analysis, ARC Centre of Excellence in Plant Energy Biology, Australian National University, Canberra, ACT 0200, Australia
6 Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA

Pan-genomic variation, including structural variants (SVs), transposable elements (TEs) and gene presence/absence variants (PAVs), has been shown to contribute to environmental adaptation and production traits in several crop species. The contribution of pan-genomic variation to growth, development and resilience in long-lived forest tree species is not well understood because of their high levels of diversity, heterozygosity and genome complexity. We aim to characterise the pan-genome diversity of Eucalyptus grandis that could contribute to its ability to inhabit a natural range spanning the entire east coast of Australia. Using a combination of PacBio HiFi, Omni-C, Illumina RNA-Seq and PacBio IsoSeq sequencing data, we have assembled and annotated haplotype-phased genomes for 24 diverse E. grandis genotypes and for one clonal genotype (TAG0014), which will serve as the new reference for the species. The new TAG0014 reference genome is 570.8/552.4 Mbp (HAP1/HAP2) in size and has 35,929/35,583 gene models. Haplotype-phased assemblies have been completed for all 24 additional individuals, yielding, together with TAG0014, 50 haplotype-phased telomere-to-telomere genomes of 530.9 to 574.5 Mbp in size (average scaffold N50 54.2 Mbp, BUSCO completeness > 96.9%). Genome-wide synteny comparisons have revealed that 359.3 to 396.3 Mbp (62.9 to 69.4%) of the haplotype-phased genomes are syntenic to the TAG0014 HAP1 genome. This is the first step towards understanding the extent and contribution of haplotype and structural diversity (pan-genome variation) to phenotypic diversity and adaptation in E. grandis.

Keywords: Eucalyptus, pan-genome, Pacific Biosciences, phased assembly, Omni-C