This software release is associated with the work "Missing gene identification using functional coherence scores" by Meghana Chitale, Ishita K Khan & Daisuke Kihara, Scientific Reports (In Submission)

-------------- GO-MEP: Gene Ontology-based Missing Enzyme Predictor ------------
-------------------------- Release Date : 7th July, 2016 -----------------------
------ File created by : Ishita K Khan, Computer Science, Purdue University -----

This software release contains 3 files-

1. Logit_all_features.R: Source code written in R that computes the rank of the missing enzyme and its prediction probability using feature combinations of Profile, CAS, PAS, funsim, EXPR and Phylogenetic, as described in the paper. The input files are hardcoded in the code. There are 688 positions. For each position, we train on the data from the other 687 positions (1000 negative candidates and 1 correct enzyme per position) and test on the 1000 candidates and the correct enzyme at the held-out position.

   To run this code, please follow these steps-
   1A. Install the R package "LiblineaR" (https://cran.r-project.org/web/packages/LiblineaR/) by opening an R terminal and typing: install.packages("LiblineaR")
   1B. Run the R code with the command: Rscript Logit_all_features.R

   The code writes two output files-
   1C. all_logit_profile_funsim_expr_phyl_fam_pam.rank : rank of the correct enzyme at each of the 688 positions
   1D. all_logit_profile_funsim_expr_phyl_fam_pam.prob : predicted probability of the correct enzyme at each of the 688 positions

2. sce_750_all.txt: First input file required by the code described in #1 above. The file holds the feature scores we use to predict the missing enzymes: scores for each of 5199 candidates + 1 enzyme at each of the 688 positions. Loading the feature data from this file needs significant memory (roughly 10 GB).

3. testindex.txt: Second input file required by the code described in #1 above. This is an index file listing the 1000 candidates used to generate the data in the paper. It is read by the R code mentioned in #1 so that a new random set of 1000 candidates is not chosen on each run.
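
The leave-one-position-out scheme described above for Logit_all_features.R can be sketched as follows. This is only a minimal illustration, not the released code. It makes several assumptions: the data are synthetic, base R glm() stands in for LiblineaR, the sizes (8 positions, 20 negatives, 2 features) are hypothetical stand-ins for the 688 positions and 1000 negatives of the release, and all variable and function names are made up for the example.

```r
# Illustrative sketch only: synthetic data, base-R glm() instead of LiblineaR.
set.seed(1)

n_pos <- 8    # hypothetical; the release uses 688 positions
n_neg <- 20   # hypothetical; the release uses 1000 negatives per position

# Build one position: n_neg negative candidates plus 1 correct enzyme.
# Feature scores f1 and f2 are random stand-ins for the real feature scores.
make_position <- function(pos) {
  x <- rbind(matrix(rnorm(n_neg * 2), ncol = 2),           # negatives
             matrix(rnorm(2, mean = 1.5), ncol = 2))       # correct enzyme
  data.frame(position = pos,
             label = c(rep(0, n_neg), 1),
             f1 = x[, 1],
             f2 = x[, 2])
}
dat <- do.call(rbind, lapply(seq_len(n_pos), make_position))

ranks <- integer(n_pos)
probs <- numeric(n_pos)

for (p in seq_len(n_pos)) {
  train <- dat[dat$position != p, ]   # all other positions
  test  <- dat[dat$position == p, ]   # held-out position

  # Logistic regression on the training positions.
  fit <- glm(label ~ f1 + f2, data = train, family = binomial())

  # Predicted probability of being the correct enzyme, for every test row.
  pr <- predict(fit, newdata = test, type = "response")

  correct  <- which(test$label == 1)
  ranks[p] <- sum(pr > pr[correct]) + 1L   # rank 1 = highest probability
  probs[p] <- pr[correct]
}

result <- data.frame(position = seq_len(n_pos), rank = ranks, prob = probs)
print(result)
```

In this sketch the two columns of result play the role of the .rank and .prob output files: one rank and one predicted probability for the correct enzyme at each held-out position.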