Comprehensive Description of Genome-Wide Nucleotide and Structural Variation in Short-Season Soybean
Davoud Torkamaneh, Jérôme Laroche, Aurélie Tardivel, Louise O’Donoughue, Elroy Cober, Istvan Rajcan and François Belzile. Comprehensive Description of Genome-Wide Nucleotide and Structural Variation in Short-Season Soybean. International Plant and Animal Genome XXV. January 14-18, 2017. San Diego, CA, USA.
Next-generation sequencing (NGS) and bioinformatics tools have greatly facilitated the characterization of nucleotide variation; nonetheless, an exhaustive description of both SNP haplotype diversity and structural variation (SV) remains elusive. In this study, we sequenced a representative set of 102 short-season soybeans and achieved extensive coverage of both nucleotide diversity and SV. We called close to 5M sequence variants (SNPs, MNPs, and Indels) and noticed that the number of unique haplotypes had plateaued within this set of germplasm (1.7M tag SNPs). This dataset proved highly accurate (98.6%) based on a comparison of called genotypes at loci shared with a SNP array. We used this catalogue of SNPs as a reference panel to impute missing genotypes at untyped loci in datasets derived from lower-density genotyping tools (150K GBS-derived SNPs); 96.4% of the genotypes imputed in this fashion proved to be accurate. Using a combination of three bioinformatics pipelines, we uncovered ~92K SVs (deletions, insertions, inversions, duplications, CNVs, and translocations) and estimate that over 90% of these are accurate. Finally, we noticed that duplication of certain genomic regions is the main source of residual heterozygosity in highly inbred soybean accessions. This is the first time that a comprehensive description of both SNP haplotype diversity and SV has been achieved within a regionally relevant subset of a major crop.
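The abstract does not state how the accuracy figures were computed. As an illustration only, the sketch below assumes a simple per-genotype concordance: the fraction of shared, non-missing loci at which two genotype sets agree. The function name, the locus labels, and the 0/1/2 genotype encoding are hypothetical and are not taken from the study; the same comparison could in principle be applied to imputed versus known genotypes.

```python
def genotype_concordance(called, reference):
    """Return the fraction of shared, non-missing loci with identical genotypes.

    Both arguments map a locus label to a genotype, encoded here (as an
    assumption) as the alternate-allele count 0, 1 or 2, with None for missing.
    """
    compared = 0
    matches = 0
    for locus, called_gt in called.items():
        if locus not in reference:
            continue  # locus not shared between the two datasets
        ref_gt = reference[locus]
        if called_gt is None or ref_gt is None:
            continue  # skip missing genotypes on either side
        compared += 1
        if called_gt == ref_gt:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable loci between the two datasets")
    return matches / compared


# Toy data (hypothetical loci and genotypes, for illustration only).
sequenced = {"Gm01:100": 0, "Gm01:200": 2, "Gm02:50": 1, "Gm03:10": 2, "Gm04:5": None}
array = {"Gm01:100": 0, "Gm01:200": 2, "Gm02:50": 0, "Gm03:10": 2, "Gm04:5": 1, "Gm05:1": 0}

concordance = genotype_concordance(sequenced, array)
print(f"{concordance:.2%}")
```

In this toy example four loci are comparable and three agree, so the concordance is 75%.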