SnpRecode: A Versatile and Fast Genotype Recoding and Correlation Function

Citation

2021 Marete, A. and N. Bissonnette. SnpRecode: A Versatile and Fast Genotype Recoding and Correlation Function. Research Square. doi.org/10.21203/rs.3.rs-95704/v3.

Plain language summary

Genotype imputation is an essential tool used in genomic selection in plants and animals. A popular imputation tool in animal genomics is FImpute. However, FImpute accepts genotypes in a specific format, and its output is not straightforward to convert into the formats that other software expects. The extra conversion steps add significant processing time. We have developed SnpRecode as a helper tool that bridges the gap between regular genotype files and the FImpute imputation software. SnpRecode's runtime is modest, at about 10 seconds per 1,000 samples. The software is written in the Python programming language, which gives users great flexibility in combining it with other software packages in a pipeline.

Abstract

Genotype imputation is an essential tool used in genomic selection in plants and animals. A popular imputation tool used in animal genomics is FImpute. FImpute, however, accepts a specific genotype format and produces dosages whose conversion to VCF or Plink format requires multiple software packages in a pipeline and a large amount of processing time. We have developed SnpRecode as a helper tool that bridges the gap between regular genotype files and the FImpute imputation software by allowing fast and seamless conversion of genotypes to and from FImpute format. SnpRecode also implements a fast genotype correlation function to estimate and plot imputation accuracy. We ran tests on up to 6,000 samples, in steps of 1,000, to determine the performance of SnpRecode at various sample sizes, using runtime and memory usage as performance measures. The performance of SnpRecode was modest at 10 s per 1,000 samples. Written in the Python programming language, SnpRecode gives users great flexibility in integrating it with other software packages in a pipeline.
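
The genotype correlation step mentioned in the abstract can be pictured with a minimal sketch. This is not SnpRecode's implementation: it assumes genotypes coded as allele counts (0, 1, 2), uses a hypothetical `genotype_correlation` helper with `None` as a hypothetical missing-value marker, and computes a plain Pearson correlation between true and imputed calls.

```python
from math import sqrt

MISSING = None  # hypothetical marker for a missing genotype call


def genotype_correlation(true_calls, imputed_calls):
    """Pearson correlation between two equal-length genotype vectors.

    Genotypes are assumed to be allele counts (0, 1, 2). Pairs with a
    missing value on either side are skipped.
    """
    pairs = [(t, i) for t, i in zip(true_calls, imputed_calls)
             if t is not MISSING and i is not MISSING]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_t = sum(t for t, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    if var_t == 0 or var_i == 0:
        # Correlation is undefined when either vector has no variation.
        return float("nan")
    return cov / sqrt(var_t * var_i)


# Illustrative data only: one marker, six samples, one missing imputed call.
true_calls = [0, 1, 2, 1, 0, 2]
imputed_calls = [0, 1, 2, 2, 0, None]
r = genotype_correlation(true_calls, imputed_calls)
print(f"imputation accuracy (r) = {r:.3f}")
```

Applied per marker or per sample, a correlation of this kind gives one accuracy value for each, which can then be plotted.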

Publication date

2021-07-08

Author profiles