Reference-level genome assemblies and comparative analysis of five representative Canadian barley cultivars

Citation

You FM, Xu W, Tucker JR, Khanal R, Beattie A, Brar G, Fu Y-B, Zheng C, Malti S, Shah K, Liton U, Holden S, Yao Z, Singh J, Boyle B, Belzile F, Mascher M, Tinker N, Bekele W, and Badea A. Reference-level genome assemblies and comparative analysis of five representative Canadian barley cultivars. 2024 Barley Symposium – From the Ground Up. February 26-27, Saskatoon, Canada (Oral talk)

Abstract

To address the challenges that climate change poses for Canadian barley breeding and to uncover the genetic basis of Canada's world-class cultivars, we built reference assemblies for five representative two-row Canadian barley cultivars: two general-purpose varieties (CDC Austenson and Morrison) and three malting varieties (AAC Synergy, AAC Connect, and CDC Fraser).
Our sequencing strategy combined Illumina paired-end reads, long mate-pair reads, PacBio reads, 10x Chromium linked-read libraries, and chromosome conformation capture sequencing (Hi-C) to construct chromosome-scale pseudomolecules. The TRITEX pipeline was used to assemble the short reads, while Minimap2 and Miniasm were used for the backbone assembly of the PacBio reads. The quickmerge program was then used to merge the contigs generated by TRITEX and Miniasm. For CDC Fraser, the assembly was completed by NRGene Technologies.
The resulting genome sequences of the five cultivars ranged from 3.5 to 4.1 Gb in size, including unmapped scaffolds ranging from 102.9 to 536.87 Mb. In a benchmarking universal single-copy orthologs (BUSCO) analysis against the plant database (embryophyta_odb10), the assemblies contained 89.3% (Morrison) to 98.32% (CDC Fraser) complete, single-copy genes. Dotplot comparisons showed a high level of chromosomal collinearity between the five assemblies and two published barley reference genomes, Morex V3 (six-row) and Golden Promise V1 (two-row).
Repeat analysis with RepeatMasker and the TREP repeat database showed that transposable elements (TEs) made up 78.17% to 81.92% of each genome, similar to the TE content of Golden Promise V1 (80.61%) and Morex V3 (81.72%). Most TEs were LTR retrotransposons: on average, the Ty1/Copia superfamily accounted for 21.38% of the genome and the Ty3/Gypsy superfamily for 48.21%.
To support accurate annotation of protein-coding genes, we isolated RNA from five tissues of each cultivar, with three biological replicates, and generated 1087.23 to 1336.71 million 100-bp paired-end reads per cultivar, totaling 107.49 to 132.49 Gb. These high-coverage mRNA sequences provided comprehensive coverage of each genome's gene content.
These reference-level assemblies, together with the genes identified in them, offer significant potential to Canadian and international barley breeding programs by supporting a more efficient breeding process, and to the broader barley research community by advancing barley cultivation and genetic research.
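The repeat and RNA-seq figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the authors' analysis: the function names are hypothetical, the LTR share is computed from the reported averages (so the share-of-TE values are only approximate bounds), and the read yield is a nominal value that assumes every read is exactly 100 bp and counts 1 Gb as 10^9 bases.

```python
# Hypothetical helper names; all input values are taken from the abstract.

def ltr_share_of_genome(copia_pct: float, gypsy_pct: float) -> float:
    """Combined Ty1/Copia + Ty3/Gypsy share of the genome, in percent."""
    return copia_pct + gypsy_pct


def share_of_te_content(ltr_pct: float, te_pct: float) -> float:
    """LTR retrotransposons as a percentage of total TE content."""
    return 100.0 * ltr_pct / te_pct


def nominal_yield_gb(reads_million: float, read_len_bp: int = 100) -> float:
    """Nominal sequencing yield in Gb (assumes 1 Gb = 1e9 bases)."""
    return reads_million * 1e6 * read_len_bp / 1e9


ltr = ltr_share_of_genome(21.38, 48.21)   # average Copia + Gypsy
low = share_of_te_content(ltr, 81.92)     # against the highest TE content
high = share_of_te_content(ltr, 78.17)    # against the lowest TE content

print(f"LTR retrotransposons: {ltr:.2f}% of the genome")
print(f"Approximate share of TE content: {low:.1f}% to {high:.1f}%")
print(f"Nominal yield for 1087.23 M reads: {nominal_yield_gb(1087.23):.2f} Gb")
print(f"Nominal yield for 1336.71 M reads: {nominal_yield_gb(1336.71):.2f} Gb")
```

Under these assumptions, the two LTR superfamilies together cover roughly 70% of the genome. The nominal yields come out slightly above the reported totals of 107.49 and 132.49 Gb; the abstract does not state how those totals were calculated.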