Update dataset links and formatting in readme

Fixed formatting issues and improved clarity in dataset descriptions.
Jun Liu
2025-12-05 17:17:39 +08:00
committed by GitHub
parent a1be0b9eff
commit c6ef49022b

@@ -2,5 +2,5 @@
1. You can download the sequence pair dataset used to train **PPLM** through [pplm_dataset](https://drive.google.com/file/d/1dt9omH2A-1-Uucz93lflgNbonRpAe2Mo/view?usp=sharing)<br>
2. You can access the original **protein-protein interaction** dataset from [D-SCRIPT](https://github.com/samsledje/D-SCRIPT/tree/main/data). The corrected pair lists by remove duplicate, erroneous, and invalid negative samples are provided in the **ppi** folder.<br>
-3. You can access the original **protein-protein binding affinity** dataset from [PPB-Affinity](https://github.com/ChenPy00/PPB-Affinity). To prevent potential data leakage, we resplited the five-fold cross-validation list by considering the structure similarity, and the list of PDB IDs for each fold is provided in the **affinity** folder<br>.
-4. You can download the **inter-protein contact prediction** dataset through [contact_dataset](https://drive.google.com/file/d/1TtOmrqFKA_wVGk563QQTEoDVV46rE_-t/view?usp=sharing)<br>.
+3. You can access the original **protein-protein binding affinity** dataset from [PPB-Affinity](https://github.com/ChenPy00/PPB-Affinity). To prevent potential data leakage, we resplited the five-fold cross-validation list by considering the structure similarity, and the list of PDB IDs for each fold is provided in the **affinity** folder.<br>
+4. You can download the **inter-protein contact prediction** dataset through [contact_dataset](https://drive.google.com/file/d/1TtOmrqFKA_wVGk563QQTEoDVV46rE_-t/view?usp=sharing).<br>