Transcriptome analysis and genome annotation of diploid Aegilops species

Citation

Khadka B, Chawla HS, Edwards T, Lévesque-Lemay M, Zheng C, You FM, Pozniak CJ, Cloutier S (2024) Transcriptome analysis and genome annotation of diploid Aegilops species. Proc 31st Plant and Animal Genome Conference, San Diego, CA, Jan 13-17, PE0419 (poster)

Abstract

Aegilops species are valuable genetic resources for wheat improvement, particularly for improving resistance to biotic and abiotic stresses. However, the limited availability of high-quality reference genome assemblies and annotations remains a formidable barrier to fully realizing the potential of wheat wild relatives in breeding programs. Of the ten diploid Aegilops species, a reference genome of Ae. tauschii (D genome) and chromosome-level genome assemblies of five diploid Aegilops species from the Sitopsis section have recently been released. The Genome Canada ‘4DWheat Project’ is dedicated to developing high-quality reference genome assemblies and annotations for the remaining diploid Aegilops. Using PacBio CCS HiFi assemblies anchored to BioNano physical maps, we have generated high-quality genome assemblies of Ae. comosa (M), Ae. markgrafii (C), Ae. speltoides (S), Ae. umbellulata (U) and Ae. uniaristata (N), with Ae. mutica (T) in progress. Transposable elements make up 86-89% of each genome. Class I long terminal repeat (LTR) retrotransposons of the Ty3/Gypsy and Ty1/Copia superfamilies constituted 30-36% and 18-22.5% of the repeat sequences, respectively. For gene annotation, we extracted RNA from ten tissues of each species and performed PacBio long-read isoform sequencing (IsoSeq) and RNASeq. Full-length cDNA sequences generated by IsoSeq provided accurate and comprehensive information on putative gene functions, and on the type and abundance of alternative splice variants. Using the genome assembly of Ae. speltoides, we developed an annotation pipeline that integrates ab initio, transcriptomic and evidence-based approaches. RNASeq data provided tissue specificity for precise evaluation of the Aegilops transcriptome, while IsoSeq data improved the accuracy of annotated gene models by providing full-length gene information.
High-confidence (HC) protein-coding gene models were identified using BLASTP searches against a trusted set of reference proteins from the UniProt and PTREP databases. Overall, 59,172-72,520 HC genes were annotated, covering 93.7-94.5% of the complete BUSCOs. Structural and functional annotation and repeat analysis of the Ae. mutica assembly will be carried out using the same integrative approach. An improved understanding of genome composition and gene content is paramount to deciphering the origin, phylogeny and genome evolution of Triticum and Aegilops species and to broadening the genetic base of wheat. This research will complete the production of annotated reference genomes of all 11 diploid Aegilops species. These resources are expected to facilitate the characterization of the more complex tetraploid and hexaploid Aegilops species and the extraction of useful genetic diversity for wheat improvement.

Publication date

2024-01-13