PPLM: A Paired Sequence Language Model for Protein-Protein Interaction Modeling
Version 1.0, 03/25/2025
(Copyrighted by the Regents of the National University of Singapore, All rights reserved)
PPLM is a protein–pair language model that learns directly from paired sequences through a novel attention architecture, explicitly capturing inter-protein context. Building on PPLM, we developed PPLM-PPI, PPLM-Affinity, and PPLM-Contact for predicting protein–protein interactions, estimating binding affinity, and identifying interface residue contacts, respectively.
Authors: Jun Liu, Hungyu Chen, and Yang Zhang
Contact: junl_sg@nus.edu.sg
License: PolyForm Noncommercial License
Citation:
Jun Liu, Hungyu Chen, Yang Zhang. A Paired Sequence Language Model for Protein-Protein Interaction Modeling. Nature Communications (2026). https://doi.org/10.1038/s41467-026-70457-5
System Requirements
- x86_64 machine
- Linux operating system
Software & Dataset Requirements (for PPLM-Contact)
- HH-suite3 for MSA Search: Install HH-suite3 and update the "hhsuite_dir" parameter in the "pplm_contact/config.py" file.
- Uniclust Database: Download the Uniclust30 database, unzip it on your machine, and update the "UniRef_database" parameter in the "pplm_contact/config.py" file.
- CCMpred for DCA: Install CCMpred, or use the pre-packaged version in the "pplm_contact/external_tools" directory. Set the "ccmpred" parameter in the "pplm_contact/config.py" file. If the pre-packaged binary is not executable, grant execute permission by running 'chmod +x pplm_contact/external_tools/ccmpred'.
- LoadHHM for PSSM Calculation: Download LoadHHM.py and place the file in the "pplm_contact" directory of the PPLM package, or use the pre-packaged version within the "pplm_contact" directory.
- ESM-MSA for Feature Generation: Install the ESM package, or use the pre-packaged version in the "pplm_contact/external_tools" directory. Download the pre-trained ESM-MSA model and set the "esm_msa_model" parameter in the "pplm_contact/config.py" file.
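Before running PPLM-Contact, it can help to confirm that every path set in "pplm_contact/config.py" actually exists. The sketch below is not part of the PPLM package: the helper names and the placeholder paths are assumptions for illustration, and only the four parameter names come from the list above. Because HH-suite databases are often given as a file prefix rather than a real file, the check also accepts a prefix that matches at least one file.

```python
from pathlib import Path


def path_or_prefix_exists(value):
    """True if `value` exists, or is a prefix of at least one file in its directory."""
    p = Path(value).expanduser()
    if p.exists():
        return True
    # Databases may be configured as a prefix (e.g. ".../db" for ".../db_a3m.ffdata").
    return p.parent.is_dir() and any(p.parent.glob(p.name + "*"))


def find_missing(paths):
    """Return the parameter names whose configured path cannot be found."""
    return [name for name, value in paths.items() if not path_or_prefix_exists(value)]


# Placeholder values (hypothetical): replace with what you set in pplm_contact/config.py.
configured = {
    "hhsuite_dir": "/path/to/hh-suite",
    "UniRef_database": "/path/to/uniclust30_prefix",
    "ccmpred": "pplm_contact/external_tools/ccmpred",
    "esm_msa_model": "/path/to/esm_msa_model.pt",
}

for name in find_missing(configured):
    print(f"{name}: path not found -> {configured[name]}")
```

Any name printed by the loop points at a parameter that still needs to be updated.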
Download Parameters
You can download the pre-trained weights for PPLM and its downstream models from the links below and place them in the weights/ directory:
- PPLM: https://zhanggroup.org/PPLM/bin/weights/pplm_t33_650M.pt or google_drive
- PPLM-PPI: https://zhanggroup.org/PPLM/bin/weights/ppi_models.pkl or pplm-ppi_weights
- PPLM-Affinity: https://zhanggroup.org/PPLM/bin/weights/affinity_models.pkl or pplm-affinity_weights
- PPLM-Contact: https://zhanggroup.org/PPLM/bin/weights/pplm_contact_models.pkl or pplm-contact_weights
- PPLM-Contact2: https://zhanggroup.org/PPLM/bin/weights/pplm_contact_models2.pkl or pplm-contact2_weights
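The downloads above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name and the skip-if-present behaviour are assumptions of this example, while the file names and base URL are taken from the zhanggroup.org links listed above.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://zhanggroup.org/PPLM/bin/weights/"
WEIGHT_FILES = [
    "pplm_t33_650M.pt",
    "ppi_models.pkl",
    "affinity_models.pkl",
    "pplm_contact_models.pkl",
    "pplm_contact_models2.pkl",
]


def download_weights(dest="weights", files=WEIGHT_FILES):
    """Fetch each weight file into `dest`, skipping files that are already there."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    targets = []
    for name in files:
        target = dest_dir / name
        if not target.exists():
            urllib.request.urlretrieve(BASE_URL + name, str(target))
        targets.append(target)
    return targets


# To fetch everything into weights/, call:
# download_weights()
```

The weight files may be large, so an interrupted download can leave a partial file behind; delete it before retrying, since this sketch skips any file that already exists.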
Usage
1. Install environment
conda env create -f environment.yml
2. Activate environment
conda activate PPLM
3. Run PPLM-PPI
python run_pplm-ppi.py example/seq1.fasta example/seq2.fasta
You can also run PPLM-PPI for two individual sequences:
python pplm_ppi/predict.py example/seq1.fasta example/seq2.fasta
4. Run PPLM-Affinity
python run_pplm-affinity.py example/receptor.fasta example/ligand.fasta
5. Run PPLM-Contact
For a homodimer:
python run_pplm-contact.py example/protein.pdb example/protein.pdb example/homo_example
For a heterodimer:
python run_pplm-contact.py example/protein1.pdb example/protein2.pdb example/hetero_example
6. Run PPLM-Contact2
For a homodimer:
python run_pplm-contact2.py example/homodimer.afm.pdb example/homodimer.af3.pdb example/homodimer.dmf.pdb example/homo_example2
For a heterodimer:
python run_pplm-contact2.py example/heterodimer.afm.pdb example/heterodimer.af3.pdb example/heterodimer.dmf.pdb example/hetero_example2
7. Generate embeddings and attention matrices for other applications
python run_pplm.py example/seq1.fasta example/seq2.fasta example/seq1-seq2.pplm.pkl
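The layout of the saved .pkl file is not documented here, so a reasonable first step is to inspect it before using it in another application. The sketch below makes no assumption about the contents: it unpickles the file and reports dictionary keys, sequence lengths, and the shape of anything array-like. The helper names are hypothetical. Unpickling needs the libraries that produced the objects (for example PyTorch or NumPy) to be importable, and should only be done on files you trust.

```python
import pickle


def describe(obj, depth=0, max_depth=2):
    """Return indented lines summarising a nested object without assuming its layout."""
    indent = "  " * depth
    shape = getattr(obj, "shape", None)
    if shape is not None:
        # Arrays and tensors expose .shape; report it instead of the values.
        return [f"{indent}{type(obj).__name__} shape={tuple(shape)}"]
    if isinstance(obj, dict) and depth < max_depth:
        lines = []
        for key, value in obj.items():
            lines.append(f"{indent}{key!r}:")
            lines.extend(describe(value, depth + 1, max_depth))
        return lines
    if isinstance(obj, (list, tuple)):
        return [f"{indent}{type(obj).__name__} len={len(obj)}"]
    return [f"{indent}{type(obj).__name__}"]


def describe_pickle(path):
    """Load a pickle file and summarise its top-level structure."""
    with open(path, "rb") as handle:
        return describe(pickle.load(handle))


# Example call on the file produced by the command above:
# print("\n".join(describe_pickle("example/seq1-seq2.pplm.pkl")))
```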
Example Outputs
PPLM-PPI
- Command:
python run_pplm-ppi.py example/seq1.fasta example/seq2.fasta
- Output: Predicted interaction probability printed to the command line:
Predicted interaction score: 0.9431089
PPLM-Affinity
- Command:
python run_pplm-affinity.py example/receptor.fasta example/ligand.fasta
- Output: Predicted binding affinity printed to the command line:
Predicted binding affinity: -7.6090136
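Both PPLM-PPI and PPLM-Affinity print a single labelled value, which makes them easy to call from another script. The sketch below assumes the output lines look exactly like the two examples shown above; the function names are hypothetical, and the regular expression will need adjusting if the real output differs.

```python
import re
import subprocess
import sys

# Matches the two labelled output lines shown in the examples above.
_VALUE = re.compile(
    r"Predicted (?:interaction score|binding affinity):\s*"
    r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def parse_prediction(stdout):
    """Extract the predicted value from the text printed by a PPLM script."""
    match = _VALUE.search(stdout)
    if match is None:
        raise ValueError("no prediction line found in output")
    return float(match.group(1))


def run_and_parse(script, fasta_a, fasta_b):
    """Run a PPLM script on two FASTA files and return the predicted value."""
    result = subprocess.run(
        [sys.executable, script, fasta_a, fasta_b],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_prediction(result.stdout)


# Example call, run from the PPLM directory inside the activated environment:
# score = run_and_parse("run_pplm-ppi.py", "example/seq1.fasta", "example/seq2.fasta")
```

Looping run_and_parse over a list of sequence pairs gives a simple batch mode, at the cost of starting a new Python process for every pair.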
PPLM-Contact
- Command:
python run_pplm-contact.py example/protein.pdb example/protein.pdb example/homo_example
- Output: The predicted contacts are saved in example/homo_example/homo_example.pred_contact.txt:
Format:
Rank ResIdx1 ResType1 ResIdx2 ResType2 Contact_Probability
1 23:A MET 26:B CYS 0.976151
2 26:A CYS 23:B MET 0.974481
3 22:A ILE 26:B CYS 0.971633
4 23:A MET 30:B GLN 0.971191
5 30:A GLN 22:B ILE 0.970514
6 27:A GLY 23:B MET 0.970334
7 22:A ILE 30:B GLN 0.970124
8 30:A GLN 23:B MET 0.96919
9 23:A MET 27:B GLY 0.966725
10 23:A MET 23:B MET 0.966512
...
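The contact file is whitespace-separated and straightforward to load for downstream filtering. The parser below is a sketch that assumes every data row has the six columns shown above, with residue identifiers written as "index:chain" and integer residue indices; the Contact type and function name are hypothetical. Rows that do not start with an integer rank, such as a header row, are skipped.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    rank: int
    res1: int
    chain1: str
    type1: str
    res2: int
    chain2: str
    type2: str
    prob: float


def parse_contacts(lines):
    """Parse rows of the form 'Rank ResIdx1 ResType1 ResIdx2 ResType2 Contact_Probability'."""
    contacts = []
    for line in lines:
        fields = line.split()
        # Skip header, blank, and truncated rows.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        idx1, chain1 = fields[1].split(":", 1)
        idx2, chain2 = fields[3].split(":", 1)
        contacts.append(
            Contact(int(fields[0]), int(idx1), chain1, fields[2],
                    int(idx2), chain2, fields[4], float(fields[5]))
        )
    return contacts


# Example: keep only contacts at or above a chosen probability cutoff.
# with open("example/homo_example/homo_example.pred_contact.txt") as handle:
#     confident = [c for c in parse_contacts(handle) if c.prob >= 0.97]
```

The 0.97 cutoff in the usage comment is only an illustration, not a recommended threshold.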
Troubleshooting (MKL and libperl.so)
On some systems, users may encounter MKL- or libperl.so-related errors due to local library and environment differences. We recommend installing PPLM in a fresh conda environment using the provided environment.yml:
conda create -n pplm python=3.10
conda activate pplm
conda env update -n pplm -f environment.yml
If MKL errors persist (e.g., import errors for NumPy or PyTorch), try reinstalling MKL or recreating the environment:
conda install mkl
If PPLM-Contact or PPLM-Contact2 fails with a "libperl.so not found" error, first make sure Perl is installed and that its shared library is on the library path:
conda install -c conda-forge perl
ls $CONDA_PREFIX/lib | grep libperl
cd $CONDA_PREFIX/lib
ln -s libperl.so.5.xx libperl.so # replace with the actual version
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
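If the ls command above finds nothing, the library may sit in a subdirectory of the environment rather than directly under lib. The sketch below searches recursively for any libperl.so file so that you know which path to link or add to LD_LIBRARY_PATH. The function name is hypothetical, and the recursive search is an addition of this example, not a documented PPLM step.

```python
from pathlib import Path


def find_libperl(lib_dir):
    """Return paths (relative to lib_dir) of every libperl.so* file found below it."""
    root = Path(lib_dir)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("libperl.so*"))


# Example call inside an activated conda environment:
# import os
# print(find_libperl(os.path.join(os.environ["CONDA_PREFIX"], "lib")))
```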
