PIPE4: Fast PPI Predictor for Comprehensive Inter- and Cross-Species Interactomes

Citation

Dick, K., Samanfar, B., Barnes, B., Cober, E.R., Mimee, B., Tan, L.H., Molnar, S.J., Biggar, K.K., Golshani, A., Dehne, F., Green, J.R. (2020). PIPE4: Fast PPI Predictor for Comprehensive Inter- and Cross-Species Interactomes. Scientific Reports, [online] 10(1), http://dx.doi.org/10.1038/s41598-019-56895-w

Plain language summary

Soybean is one of the world's largest sources of vegetable oil and protein, and an important legume crop for the Canadian economy. To expand soybean cultivation further north and west in Canada, it is crucial to identify and characterize genes that confer resistance to the pest soybean cyst nematode. However, the genetic mechanisms of resistance to the soybean cyst nematode are not well understood. In particular, little is known about how proteins interact, both within a species and between the proteins produced by soybean and those produced by its pests. Protein-protein interactions are essential molecular interactions that define the biology of a cell, its development, and its responses to various stimuli. Here, the authors develop a computational tool capable of predicting genome-wide protein-protein interactions in soybean. They used this tool to predict protein-protein interactions relevant to the effects of the soybean cyst nematode on soybean and to identify candidate genes for resistance to it, which may have far-reaching agricultural and economic impact by facilitating the expansion of soybean crops in Canada. In addition, the tools developed to predict protein-protein interactions can be broadly applied to other species and interactions.

Abstract

The need for larger-scale and increasingly complex protein-protein interaction (PPI) prediction tasks demands that state-of-the-art predictors be highly efficient and adapted to inter- and cross-species predictions. Furthermore, the ability to generate comprehensive interactomes has enabled the appraisal of each PPI in the context of all predictions, which, using the Reciprocal Perspective (RP) framework, has led to further improvements in classification performance in the face of extreme class imbalance. Here we describe the PIPE4 algorithm. Adapting the PIPE3/MP-PIPE sequence preprocessing step led to a speedup of upwards of 50x, and the new Similarity Weighted Score appropriately normalizes for window frequency when applied to any inter- or cross-species prediction schema. Comprehensive interactomes are generated for three prediction schemas: (1) cross-species predictions, where Arabidopsis thaliana is used as a proxy to predict the comprehensive Glycine max interactome; (2) inter-species predictions between Homo sapiens and HIV-1; and (3) a combined schema involving both cross- and inter-species predictions, where both Arabidopsis thaliana and Caenorhabditis elegans are used as proxy species to predict the interactome between Glycine max (the soybean legume) and Heterodera glycines (the soybean cyst nematode). Compared with state-of-the-art methods, PIPE4 achieved improved performance, indicating that it should be the method of choice for complex PPI prediction schemas.
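The idea of appraising each PPI "in the context of all predictions" can be illustrated with a minimal sketch. It assumes a comprehensive interactome is stored as a score matrix, with rows for the proteins of one species and columns for the proteins of the other (for example, soybean versus nematode proteins). For every pair it computes the rank of each partner within the other protein's full list of predictions. The function names (`rank_desc`, `reciprocal_ranks`, `mutual_top_k`) and the mutual top-k rule are hypothetical illustrations. This sketch does not reproduce the published RP framework, which is more elaborate.

```python
from bisect import bisect_right
from typing import List, Tuple

Matrix = List[List[float]]


def rank_desc(values: List[float]) -> List[int]:
    """Competition rank of each value (1 = highest); tied scores share the best rank."""
    ordered = sorted(values)
    n = len(values)
    # Rank = 1 + number of values strictly greater than v.
    return [1 + n - bisect_right(ordered, v) for v in values]


def reciprocal_ranks(scores: Matrix) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (row_rank, col_rank) for every cell of a rows x cols score matrix.

    row_rank[i][j]: rank of column protein j among all predictions for row protein i.
    col_rank[i][j]: rank of row protein i among all predictions for column protein j.
    """
    n_rows = len(scores)
    n_cols = len(scores[0]) if scores else 0
    row_rank = [rank_desc(row) for row in scores]
    col_rank = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [scores[i][j] for i in range(n_rows)]
        for i, r in enumerate(rank_desc(column)):
            col_rank[i][j] = r
    return row_rank, col_rank


def mutual_top_k(scores: Matrix, k: int) -> List[Tuple[int, int]]:
    """Hypothetical rule: keep pairs ranked within the top k from BOTH perspectives."""
    row_rank, col_rank = reciprocal_ranks(scores)
    return [
        (i, j)
        for i in range(len(scores))
        for j in range(len(scores[i]))
        if row_rank[i][j] <= k and col_rank[i][j] <= k
    ]


if __name__ == "__main__":
    # Toy scores: 2 proteins of one species x 3 proteins of the other.
    scores = [
        [0.9, 0.2, 0.4],
        [0.8, 0.7, 0.1],
    ]
    print(reciprocal_ranks(scores))
    print(mutual_top_k(scores, 1))
```

In the toy matrix, row protein 1 scores highest with column protein 0 (score 0.8), but column protein 0 ranks row protein 0 higher (0.9). The pair (1, 0) is therefore top-ranked from only one perspective. Contrasting the two perspectives in this way is what a single raw score cannot do. It becomes possible only once the comprehensive interactome has been predicted.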