Isoform diversity in Eucalyptus grandis.

Downing, A.*, Lötter, A., Christie, N., Myburg, A. A.

Department of Biochemistry, Genetics and Microbiology & Department of Computer Science, Forest Molecular Genetics (FMG) Programme, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, South Africa

Eucalyptus species are the world's most widely grown hardwood trees, cultivated for biomaterials and bioenergy products. Advances in third-generation single-molecule sequencing technologies, such as PacBio and Oxford Nanopore, have made it possible to capture full-length RNA transcripts, opening the door to isoform and haplotype studies. Isoforms are alternative mature mRNA molecules produced from the same gene through alternative splicing of its pre-mRNA. Single nucleotide variants (SNVs) within a gene give rise to different alleles, which may affect the isoforms produced and the proteins they encode. RNA sequencing (RNA-seq) allows these isoforms to be quantified, making it possible to test for differential expression of alleles and their isoforms across tissue types and developmental stages. This project aims to characterize the isoform repertoire of the E. grandis genotype TAG0014, whose genome will serve as the new reference genome for E. grandis. We collected 15 samples spanning different developmental stages of green, woody, and floral tissues. We then generated PacBio Iso-Seq data from three tissue pools (green, woody, and floral) and RNA-seq data from each of the 15 samples. The Iso-Seq data were mapped to the two available haplotype-phased assemblies of TAG0014, and the resulting isoforms were classified as known or novel. From these data, we identified all detected isoforms, including haplotype-specific isoforms. We assessed isoform diversity per gene, examined the types of alternative splicing involved, and identified tissue-specific isoforms. The RNA-seq data will be used to quantify isoform abundance and to determine whether there is preferential allele-specific expression. This project will improve our understanding of isoform diversity in E. grandis and lay the foundation for studying how biotic and abiotic stress, tissue type, and developmental stage affect isoform production.
Linking isoforms to specific haplotypes will clarify how sequence variation affects isoform and protein production, thereby facilitating genome-based breeding in the future.

Keywords: Eucalyptus, isoforms, Iso-Seq, RNA-seq, allele-specific expression