Transcriptome analysis and genome annotation of diploid Aegilops species

Citation

Khadka B, Chawla HS, Edwards T, Lévesque-Lemay M, Zheng C, You FM, Pozniak CJ, Cloutier S (2023) Transcriptome analysis and genome annotation of diploid Aegilops species. Proc 5th Canadian Wheat Symposium, Vancouver, Nov 13-16, P17

Abstract

Aegilops species constitute a repertoire of genetic diversity harboring agronomic, end-use quality, and abiotic
and biotic stress resistance traits for wheat improvement. However, one significant barrier to utilizing
useful genes from wild relatives in breeding is the dearth of high-quality genomic assemblies and
annotations for these species. A reference genome of Ae. tauschii (D) has been published, and
chromosome-level sequences of five Aegilops genomes belonging to the Sitopsis section have
been recently released. In Canada, the “4DWheat” project is developing annotated reference genome
assemblies of the remaining diploid Aegilops species. Using PacBio CCS HiFi assemblies anchored to
BioNano physical maps, we have generated high-quality genome assemblies of Ae. comosa (M), Ae.
markgrafii (C), Ae. speltoides (S) and Ae. umbellulata (U), with Ae. uniaristata (N) and Ae. mutica (T) in
progress. Genome repeats were annotated, and transposable elements (TEs) accounted for 86.02-88.18%
of the genomes. Of these TEs, class I long terminal repeat (LTR) retrotransposons were
predominant, with Gypsy and Copia constituting 30.01-36.12% and 18.45-22.46%, respectively. To
annotate protein-coding regions, we extracted RNA from ten tissues of each Aegilops species and used it
for PacBio long-read isoform sequencing (IsoSeq) and RNASeq. Full-length cDNA sequences
generated by IsoSeq provided accurate and comprehensive information on putative gene functions and
type and abundance of alternative splice variants. Using the genome assembly of Ae. speltoides, an
annotation pipeline combining ab initio, transcriptomic and evidence-based approaches was developed.
High-confidence (HC) protein-coding gene models were identified by performing BLASTP searches
against a trusted set of reference proteins from the UniProt and PTREP databases. RNASeq data
provided tissue-specificity for a precise evaluation of the Aegilops transcriptome, while IsoSeq improved
the accuracy of annotated gene models by providing full-length gene information. In the Ae. speltoides
genome, we identified 66,512 HC protein-coding genes, with 94.3% complete BUSCOs in the
embryophyta_odb10 lineage dataset. Structural and functional annotation, and repeat analysis of the remaining
Aegilops assemblies are being carried out using the same integrative approach. An improved
understanding of genome composition and gene content is paramount to deciphering the origin,
phylogeny and genome evolution of Triticum and Aegilops species and to broadening the genetic base of
wheat. As a result of this work, all 11 annotated diploid Aegilops reference genomes will
be completed. These results, coupled with the genomic resources developed here, will also facilitate the
characterization of the more complex tetraploid and hexaploid Aegilops species.

Publication date

2023-11-13