Software for Predicting Protein-Protein Interaction Sites using Phylogenetic Substitution Models


BindML (Binding site prediction by Maximum Likelihood) is a method for predicting the protein-protein interface residues of a given protein structure using information from the multiple sequence alignment (MSA) of its protein family [1]. BindML scores surface residues of a PDB structure and identifies those that exhibit mutation patterns generally observed at protein interfaces in known protein complexes. Residue positions along the MSA with the strongest-scoring mutation patterns are predicted as protein interface residues.





PDB Structure and MSA submission:

The main BindML submission page requests four input fields:

1. Email (A link to the results on our server will be sent to this address once your submission has been processed.)
2. PDB File (Standard PDB format.)
3. PDB Chain ID (For example, chain 'A' or chain 'B'. If your PDB file has no chain IDs, use the underscore "_" instead.)
4. Multiple Sequence Alignment File (This must be in FASTA format, and the sequence of your PDB structure must be included in the submitted MSA.) If the MSA field is left empty, the server will try to retrieve family sequences automatically by searching the PFAM-A, PFAM-B and HMMER databases [3], in that order, and will then generate the MSA (including the sequence of your input PDB file) with MUSCLE [4,5].
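
Because the MSA must contain the sequence of the submitted PDB chain, it can help to check this locally before submitting. The following is a minimal Python sketch of such a check, not part of BindML itself. The function names are hypothetical, and the sketch assumes that the chain sequence can be read from the CA atoms of the ATOM records and that the matching MSA row, once its gap characters ("-" or ".") are removed, is identical to that sequence. How strictly the server itself matches the two sequences is not stated here.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(pdb_text, chain_id):
    """Return the one-letter sequence of a chain, read from its CA atoms."""
    wanted = " " if chain_id == "_" else chain_id  # "_" stands for a blank chain ID
    residues = []
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only the first model is used
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA" or line[21] != wanted:
            continue
        key = line[22:27]  # residue number plus insertion code
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)


def read_fasta(fasta_text):
    """Parse FASTA text into a dict of header -> aligned sequence (gaps kept)."""
    records = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def find_pdb_sequence(pdb_seq, msa):
    """Return the headers of MSA rows whose ungapped sequence equals pdb_seq."""
    matches = []
    for header, aligned in msa.items():
        ungapped = aligned.replace("-", "").replace(".", "").upper()
        if ungapped == pdb_seq:
            matches.append(header)
    return matches
```

An empty list from find_pdb_sequence suggests that the PDB sequence is missing from the MSA, or that it differs from the aligned copy (for example because of unresolved residues in the structure).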

A typical submission is queued on our server until it is processed. How long your job waits depends on the number of submissions queued ahead of it. Once processing starts, the run time depends on the size of your input protein structure (in number of residues) and the number of sequences in your multiple sequence alignment (MSA). For example, the sample input shown on the main BindML submission page (PDB 7TIM, chain A, 248 residues, with an MSA of 57 sequences) takes approximately three to five minutes to run once processing starts, after which you receive an email with a link to the interactive results on our server.



The interactive BindML output consists of an integrative structural-level view and a residue-level view with associated prediction scores. In the left panel, the structural view lets you visualize the submitted PDB structure together with the BindML predictions. BindML scores range from negative to positive values and are colored on a spectrum from red (negative) to blue (positive). Stronger (high-confidence) predictions have more negative scores and appear red, while more positive scores represent weaker predictions and appear blue.
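
To illustrate the red-to-blue convention, here is a minimal Python sketch that maps a score onto such a gradient. It is only an assumption about how a color scale of this kind could work: the function name score_to_rgb, the linear interpolation, and the choice of range limits are hypothetical and are not taken from the BindML server.

```python
def score_to_rgb(score, lowest, highest):
    """Linearly map a score onto a red (lowest) to blue (highest) gradient."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    # Clamp the score to the range, then rescale to 0.0 (red) .. 1.0 (blue).
    clamped = min(max(score, lowest), highest)
    t = (clamped - lowest) / (highest - lowest)
    return (round(255 * (1 - t)), 0, round(255 * t))
```

With limits of -3 and 3, for instance, a score of -3 gives pure red and a score of 3 gives pure blue, matching the rule that more negative scores are stronger predictions.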

The right panel shows the scores of all surface residue predictions. High-confidence scores are colored red. Clicking on the one-letter residue name highlights the location of that residue prediction (shown as a red sphere) in the left structural panel. The residue dL-scores and Z-scores are listed in a “wrapped” manner: they are displayed in sequence order from left to right, continuing from the top to the bottom of the screen.

Furthermore, the entire output, with the Z-scores mapped to the B-factor column of the PDB file, can be downloaded from the link above the left and right panels.
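
Once downloaded, the scores can be read back from the B-factor column with a few lines of Python. The sketch below is a hedged illustration rather than a BindML tool: the function names are hypothetical, and it assumes that the file follows the standard fixed-column PDB layout (B-factor in columns 61-66) and that every atom of a residue carries the same score. How residues without a prediction are encoded in the file is not described here, so check the values before interpreting them.

```python
def residue_scores_from_bfactor(pdb_text, chain_id):
    """Return (residue_number, residue_name, score) tuples, one per residue."""
    wanted = " " if chain_id == "_" else chain_id  # "_" stands for a blank chain ID
    scores = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[21] != wanted:
            continue
        key = line[22:27]  # residue number plus insertion code
        if key in seen:
            continue  # the first atom of each residue is enough
        seen.add(key)
        scores.append((key.strip(), line[17:20], float(line[60:66])))
    return scores


def strongest_predictions(scores, top_n=10):
    """Sort residues so that the most negative (strongest) scores come first."""
    return sorted(scores, key=lambda item: item[2])[:top_n]
```

Sorting in ascending order puts the strongest predictions first, since more negative scores indicate higher confidence.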


1. La, D., & Kihara, D. (2011). A novel method for protein-protein interaction site prediction using phylogenetic substitution models. Proteins, 80(1), 126-141. doi:10.1002/prot.23169
2. Guindon, S., & Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52(5), 696-704.
3. Finn, R. D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J. E., et al. (2009). The Pfam protein families database. Nucleic Acids Research, 38(Database), D211-D222. doi:10.1093/nar/gkp985
4. Edgar, R. C. (2004a). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113. doi:10.1186/1471-2105-5-113
5. Edgar, R. C. (2004b). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792-1797. doi:10.1093/nar/gkh340


© Purdue University, 2010.  Website created by David La.  All rights reserved.