Sequencing the gene space of tropical pine trees

Maartens, M. H.*1, Ranketse, M.1, Lötter, A.1, Kleinhans, N.1, Duong, T. A.1, Christie, N.1, Wegrzyn, J. L.2, Myburg, A. A.1

1 Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, South Africa
2 Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA

Genome-assisted breeding in forest trees has become feasible due to rapid advances in DNA sequencing and genome-wide genotyping technologies. Despite their importance to South African forestry, the genomes of tropical pines have not yet been sequenced. This is because their genomes are extremely large, at approximately 22 Gb, which makes sequencing both expensive and computationally challenging. This project aims to target and sequence the "gene space" of tropical pines by combining targeted, gene-centric resequencing with trio-binning, using Pinus patula and Pinus tecunumanii as anchor species. Trio-binning separates the long-read data of an F1 hybrid into its parental haplotypes without relying on a reference genome, so that each haplotype can be assembled individually with reduced complexity. To reduce complexity further, RNA sequencing data can be used to selectively retain DNA reads that contain gene coding regions before the gene space is assembled. Towards this goal, we extracted high molecular weight (HMW) DNA with a PacBio Nanobind DNA Kit from pine variety TG190, an F1 hybrid of P. patula and P. tecunumanii, and sequenced it with long reads on the Oxford Nanopore PromethION platform. We also sequenced parental RNA using Illumina short-read sequencing. These data support both trio-binning, to separate the F1 DNA long reads into the individual parental haplotypes, and filtering of the DNA long reads to selectively retain the gene space. The assembled haplo-gene space will cover approximately 10% of the genome, and assembling the individual haplotypes separately will improve our understanding of the genome diversity of each species. These assemblies hold promise for developing high-throughput, cost-effective molecular breeding tools that allow tracking and selection of genetic variants at these gene loci.
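The two read-level steps described above, binning F1 long reads by parent and retaining reads that overlap expressed genes, can both be illustrated with k-mer matching. The sketch below is a minimal, hypothetical illustration and not the pipeline used in this project: it assumes each parent is represented by a collection of sequences (for example its Illumina reads), derives k-mers found in only one parent, and assigns each long read to the parent whose specific k-mers it hits more often, with counts scaled by marker-set size (similar in spirit to published trio-binning tools, though their details differ). The function names, the k-mer size and the `min_hits` threshold are illustrative assumptions.

```python
"""Hypothetical sketch of k-mer-based trio binning and gene-space read filtering.

Illustrative only: k, the scoring rule and min_hits are assumptions,
not parameters from the TG190 project.
"""
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield each k-mer of seq as the lexicographically smaller of itself and
    its reverse complement, so reads from either strand give the same k-mers."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, rc)


def kmer_set(seqs: Iterable[str], k: int) -> set[str]:
    """Collect the canonical k-mers of all sequences into one set."""
    return {km for s in seqs for km in canonical_kmers(s, k)}


def parent_specific(parent_a: Iterable[str], parent_b: Iterable[str],
                    k: int) -> tuple[set[str], set[str]]:
    """Return the k-mers present in only one parent (the binning markers)."""
    a, b = kmer_set(parent_a, k), kmer_set(parent_b, k)
    return a - b, b - a


def bin_read(read: str, only_a: set[str], only_b: set[str], k: int) -> str:
    """Assign an F1 long read to parent "A", parent "B" or "unassigned"."""
    hits_a = hits_b = 0
    for km in canonical_kmers(read, k):
        if km in only_a:
            hits_a += 1
        elif km in only_b:  # the two marker sets are disjoint by construction
            hits_b += 1
    # Scale by marker-set size so the parent with more markers is not favoured.
    score_a = hits_a / max(len(only_a), 1)
    score_b = hits_b / max(len(only_b), 1)
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "unassigned"  # no markers hit, or an exact tie


def is_gene_space(read: str, transcript_kmers: set[str], k: int,
                  min_hits: int = 3) -> bool:
    """Keep a DNA read if it shares at least min_hits k-mers with the transcripts."""
    hits = sum(1 for km in canonical_kmers(read, k) if km in transcript_kmers)
    return hits >= min_hits
```

For example, with toy parental sequences and `k=5`, `parent_specific(["ACGTTGCAAGGCTA"], ["GATTACAGATTACA"], 5)` yields disjoint marker sets, and a read copied from either parent (on either strand) is binned to that parent. Real data would need a much larger k (tools commonly use values around 21 or more) and tolerance for sequencing errors in nanopore reads, both of which this sketch ignores.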

Keywords: tropical pines, gene space, trio-binning, complexity reduction, molecular breeding, haplo-gene space