Low-depth genotyping-by-sequencing (GBS) in a bovine population: Strategies to maximize the selection of high quality genotypes and the accuracy of imputation

Citation

Brouard, J.S., Boyle, B., Ibeagha-Awemu, E.M., Bissonnette, N. (2017). Low-depth genotyping-by-sequencing (GBS) in a bovine population: Strategies to maximize the selection of high quality genotypes and the accuracy of imputation. BMC Genetics, [online] 18(1), http://dx.doi.org/10.1186/s12863-017-0501-y

Plain language summary

Technical advances in sequencing methods and in genomics have had a major impact on the genetic evaluation and selection of livestock such as dairy cattle. In the years that followed the sequencing of the bovine genome in 2004, genotyping chips such as the Illumina SNP50 chip were brought to market. This chip makes it possible to test for the presence of 50,000 potential genetic variations in the tested animals. More recently, a new technique called genotyping-by-sequencing (GBS) has been developed. This technique uses enzymes that fragment the genome reproducibly, and the resulting fragments are then sequenced. Using this approach, one can verify the presence of genetic variations on tens of thousands of fragments of the genome. In this study, we compared the performance of two versions of this technique: the conventional method and a more selective method. The latter aims to reduce the number of fragments generated, so that a higher level of sequencing can be expected for each fragment. Using bioinformatics tools, we also describe different selection procedures aimed at obtaining genetic information of the best possible quality.
The results showed that the conventional GBS method allowed the identification of more than 272,000 variations, compared with about 123,000 for the selective method. The accuracy of the information produced by these two methods was evaluated by comparing the results with those produced by the SNP50 chip. Surprisingly, the results indicate that the genetic information produced by the conventional method is more accurate than that from the selective method. The conventional method remains more efficient even though it produces, on average, lower sequencing levels than the selective method. Our results also show that a GBS data analysis with judicious quality control criteria provides high quality genetic information with relatively low levels of sequencing. We also identify factors that improve the accuracy of the genetic information obtained when the missing data associated with the GBS method are inferred.
Overall, the results revealed that the conventional version of the GBS method has the potential to test a considerable number of variations on the bovine genome and that the genetic information obtained by this method can be of high quality, provided that selection criteria are applied. The results presented in this article provide a practical framework for the analysis of GBS data and strategies for maximizing the yield of this technique in animal populations, particularly in cattle. We demonstrate that the GBS technique is a credible alternative to the SNP50 genotyping chip for identifying genetic variations associated with our populations and the problem studied. Indeed, GBS will make it possible to test genetic variations at low cost in dairy cattle with paratuberculosis.

Abstract

Background: Genotyping-by-sequencing (GBS) has emerged as a powerful and cost-effective approach for discovering and genotyping single-nucleotide polymorphisms. The GBS technique has been used largely in crop species, where its low sequence coverage is not a drawback for calling genotypes because inbred lines are almost homozygous. In contrast, only a few studies have used the GBS technique in animal populations (with sizeable heterozygosity rates), and many of those that have been published did not consider the quality of the genotypes produced by the bioinformatic pipelines. To improve the sequence coverage of the fragments, an alternative GBS preparation protocol that includes selective primers during the PCR amplification step has recently been proposed. In this study, we compared this modified protocol with the conventional two-enzyme GBS protocol. We also describe various procedures to maximize the selection of high quality genotypes and to increase the accuracy of imputation.
Results: The in silico digestions of the bovine genome showed that the combination of PstI and MspI is more suitable for sequencing bovine GBS libraries than the use of single digestions with PstI or ApeKI. The sequencing output of the GBS libraries generated a total of 123,666 variants with the selective-primer approach and 272,103 variants with the conventional approach. Validating our data with genotypes obtained from mass spectrometry and Illumina's bovine SNP50 array, we found that the genotypes produced by the conventional GBS method were concordant with those produced by these alternative genotyping methods, whereas the selective-primer method failed to call heterozygotes with confidence. Our results indicate that high accuracy in genotype calling (>97%) can be obtained using low read-depth thresholds (3 to 5 reads), provided that markers are simultaneously filtered for genotype quality scores.
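The joint filtering on read depth and genotype quality described above, followed by a concordance check against array genotypes, can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The `Call` record, the function names, the toy data and the genotype quality threshold of 20 are assumptions made for illustration; only the read-depth threshold of 3 falls within the 3 to 5 reads range reported in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Call:
    """One GBS genotype call (hypothetical record, not a real file format)."""
    genotype: Optional[str]  # "0/0", "0/1", "1/1", or None when missing
    depth: int               # number of reads covering the site
    gq: int                  # genotype quality score


def filter_call(call: Call, min_depth: int = 3, min_gq: int = 20) -> Optional[str]:
    """Keep a genotype only if it passes BOTH thresholds; otherwise treat it as missing.

    min_gq=20 is an illustrative value, not one reported in the abstract.
    """
    if call.genotype is None:
        return None
    if call.depth < min_depth or call.gq < min_gq:
        return None
    return call.genotype


def concordance(gbs_calls: List[Call],
                array_genotypes: List[Optional[str]],
                min_depth: int = 3,
                min_gq: int = 20) -> Tuple[Optional[float], int]:
    """Return (fraction of matching genotypes, number of genotypes compared)."""
    compared = 0
    matches = 0
    for call, reference in zip(gbs_calls, array_genotypes):
        kept = filter_call(call, min_depth, min_gq)
        if kept is None or reference is None:
            continue  # nothing to compare at this position
        compared += 1
        if kept == reference:
            matches += 1
    rate = matches / compared if compared else None
    return rate, compared


# Toy data for one animal: GBS calls and the array genotypes at the same markers.
gbs = [Call("0/1", 4, 35), Call("0/0", 2, 40), Call("1/1", 5, 10),
       Call("0/1", 6, 50), Call(None, 0, 0)]
array = ["0/1", "0/0", "1/1", "0/0", "0/1"]

rate, n = concordance(gbs, array)
print(f"concordance = {rate} over {n} compared genotypes")
```

In this toy example the second call is dropped for low depth and the third for low genotype quality, so only two genotypes are compared. Raising either threshold trades the number of retained genotypes against their expected accuracy, which is the balance the abstract refers to.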
We also show that factors such as the minimum call rate and the minor allele frequency positively influence the accuracy of imputation of missing GBS data. The highest accuracies (around 85%) of imputed GBS markers were obtained with the FIMPUTE program when GBS and SNP50 array genotypes were combined (80,190 to 100,297 markers) before imputation.
Conclusions: We discovered that the conventional two-enzyme GBS protocol could produce a large number of high-quality genotypes, provided that appropriate filtering criteria were used. In contrast, the selective-primer approach resulted in a substantial proportion of miscalled genotypes and should be avoided for livestock genotyping studies. Overall, our study demonstrates that carefully adjusting the different filtering parameters applied to the GBS data is critical to maximize the selection of high quality genotypes and to increase the accuracy of imputation of missing data. The strategies and results presented here provide a framework to maximize the output of the GBS technique in animal populations and qualify the PstI/MspI GBS assay as a low-cost, high-density genotyping platform. The conclusions reported here regarding read-depth and genotype quality filtering could benefit many GBS applications, notably genome-wide association studies, where there is a need to increase the density of markers genotyped across the target population while preserving the quality of genotypes.
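The marker-level criteria mentioned in the results (minimum call rate and minor allele frequency) can be sketched as a simple pre-imputation filter. The Python example below is a hedged illustration, not the procedure used with FIMPUTE in the study. The function names, the genotype coding (0, 1 or 2 copies of the alternate allele, `None` for missing), the toy data and the thresholds of 0.5 and 0.05 are all assumptions.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0, 1 or 2 copies of the alternate allele; None = missing


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of animals with a non-missing genotype at one marker."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: List[Genotype]) -> Optional[float]:
    """Frequency of the rarer allele among called genotypes (None if nothing is called)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid animal
    return min(alt_freq, 1.0 - alt_freq)


def select_markers(markers: Dict[str, List[Genotype]],
                   min_call_rate: float = 0.5,
                   min_maf: float = 0.05) -> List[str]:
    """Names of markers that pass both thresholds (illustrative values only)."""
    kept = []
    for name, genotypes in markers.items():
        maf = minor_allele_frequency(genotypes)
        if maf is None:
            continue  # no called genotype at this marker
        if call_rate(genotypes) >= min_call_rate and maf >= min_maf:
            kept.append(name)
    return kept


# Toy genotype table: marker name -> genotypes of four animals.
toy = {
    "m1": [0, 1, 2, None],        # call rate 0.75, MAF 0.5
    "m2": [0, 0, 0, 0],           # monomorphic, MAF 0
    "m3": [None, None, None, 1],  # call rate 0.25
    "m4": [0, 1, 1, 2],           # call rate 1.0, MAF 0.5
}
print(select_markers(toy))
```

Only markers that pass such a filter would then be passed on, alone or combined with array genotypes, to the imputation step. The appropriate thresholds depend on the data set and are not specified in the abstract.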