PIPE (Protein-protein Interaction Prediction Engine): A computational approach for comprehensive soybean functional genomics.

Citation

Samanfar B, Schoenrock A, Dehne F, Golshani A, Cober E, Charette M, Molnar S: PIPE (Protein-protein Interaction Prediction Engine): A computational approach for comprehensive soybean functional genomics. Great Lakes Bioinformatics and the Canadian Computational Biology Conference (GLBIO/CCBC 2016), Toronto, Canada, 2016/05/16 - 2016/05/19.

Plain language summary

Soybean is one of Canada's major grain crops, and its production is expanding, with most of the increase occurring in short-season areas (Western Canada and northern regions). So far, eleven maturity loci have been reported in soybean; however, the molecular basis of almost half of them is not yet clear. The list of novel factors affecting flowering and maturity pathways in soybean, and in model plants like Arabidopsis, continues to grow, suggesting that other players remain to be discovered. To this end, we have used three different approaches to identify novel genes involved in flowering and maturity pathways in soybean: bioinformatics (functional genomics), classical plant breeding, and molecular biology (analysis of SSR and SNP haplotypes).

Abstract

Protein-protein interactions (PPIs) are essential molecular interactions that define the biology of a cell, its development, and its responses to various stimuli. Under the “guilt by association” principle, if a gene interacts with a group of genes involved in one specific pathway, that gene might also be involved in that pathway. Our knowledge of global PPI networks in complex organisms such as humans and plants is restricted by the technical limitations of current methods. The Protein-protein Interaction Prediction Engine (PIPE) is a computational tool used to predict PPIs. PIPE has been used to produce proteome-wide, all-to-all predicted interactomes in a variety of organisms, including yeast (Saccharomyces cerevisiae), human (Homo sapiens), Arabidopsis, and others. PIPE can produce an individual PPI prediction in a fraction of a second and is typically tuned, for a given organism, to achieve a specificity of 99.95%. PIPE has been independently evaluated against other PPI prediction methods and has been shown to significantly outperform them in terms of recall-precision across all of the datasets tested. It has also been shown that PIPE can produce cross-species predictions (i.e., use interaction data from one organism to make high-quality PPI predictions in another). Briefly, PIPE searches for recurring short polypeptide sequences between known interacting protein pairs; put simply, it predicts interactions from protein sequence information and a database of known interacting pairs. To make its predictions, PIPE requires a set of known interacting protein pairs together with their primary (amino acid) sequences. PIPE is currently being redesigned to handle the computational demands of the large soybean proteome (75,778 confirmed soybean protein sequences), and we are using it to predict the first comprehensive protein-protein interaction network for soybean.
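The idea of recurring short sequences between known interacting pairs can be illustrated with a toy sketch. This is not the PIPE implementation: the function names (`windows`, `support`), the window length `w=4`, the use of exact substring matching, and the example sequences are all assumptions chosen for illustration. The sketch counts how many known interacting pairs "support" a query pair, meaning that one query protein shares a short window with one partner of the known pair and the other query protein shares a window with the other partner.

```python
def windows(seq, w):
    """Return the set of all length-w substrings (windows) of a sequence."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def support(query_a, query_b, sequences, known_pairs, w=4):
    """Count known interacting pairs that support the query pair.

    A known pair (p, q) supports the query if query_a shares a window with
    one partner and query_b shares a window with the other partner.
    Exact matching is a simplification made for this sketch.
    """
    wa, wb = windows(query_a, w), windows(query_b, w)
    count = 0
    for p, q in known_pairs:
        wp, wq = windows(sequences[p], w), windows(sequences[q], w)
        # Check both orientations, since interactions are unordered.
        if (wa & wp and wb & wq) or (wa & wq and wb & wp):
            count += 1
    return count


# Hypothetical toy data: three made-up proteins and one known interaction.
sequences = {
    "P1": "MKTAYIAKQR",
    "P2": "GSHMLEDPVT",
    "P3": "AAAAWWWWCC",
}
known_pairs = [("P1", "P2")]

query_a = "TTMKTAGG"  # shares the window "MKTA" with P1
query_b = "QQLEDPNN"  # shares the window "LEDP" with P2
query_c = "HHHHHHHH"  # shares no window with any known protein

print(support(query_a, query_b, sequences, known_pairs))  # 1
print(support(query_a, query_c, sequences, known_pairs))  # 0
```

In a sketch like this, a higher support count would be read as stronger evidence for an interaction, and a decision threshold on the score would be chosen to reach a target specificity on known non-interacting pairs.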
Soybean is one of Canada's major grain crops, and its production is expanding, with most of the increase occurring in short-season areas (Western Canada and northern regions). So far, eleven maturity loci have been reported in soybean; however, the molecular basis of almost half of them is not yet clear. The list of novel factors affecting flowering and maturity pathways in soybean, and in model plants like Arabidopsis, continues to grow, suggesting that other players remain to be discovered. To this end, we have used three different approaches to identify novel genes involved in flowering and maturity pathways in soybean: bioinformatics (functional genomics), classical plant breeding, and molecular biology (analysis of SSR and SNP haplotypes). Identifying molecular markers that tag the PIPE-identified genes controlling flowering and maturity will allow soybean breeders to develop varieties efficiently through molecular marker-assisted breeding. Allele-specific markers will allow early maturity alleles to be stacked to develop even earlier-maturing cultivars. This bioinformatics approach will also help to bridge the knowledge gap in the soybean flowering and maturity pathway, and it can be applied to other important traits such as seed protein content, oil quality, and host-pathogen interactions.