mirror of
https://github.com/Saoge123/PocketFlow.git
synced 2026-06-04 12:44:22 +08:00
d6aec982b62263cc309ea6641517b7e13aef4ccd
PocketFlow: an autoregressive flow model incorporating chemical knowledge for generating drug-like molecules inside protein pockets
Requirements:
- Python 3.8
- PyTorch 1.12
- PyTorch Geometric 2.1.0
- RDKit
- Open Babel
- PyMOL
Molecular generation
Molecules can be generated by running the following command. The pocket PDB file (-pkt) and the model parameter file (--ckpt) are required; all other parameters are optional.
python main_generate.py -pkt test_samples/test_pocket10/1bvr_C_rec_pocket10-surf.pdb --ckpt ckpt/ZINC-pretrained-255000.pt -n 100 -d cuda:0 --root_path gen_results --name 1bvr -at 1.0 -bt 1.0 --max_atom_num 35 -ft 0.5 -cm True --with_print True
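When the same command has to be run for several pockets, it can help to assemble the argument list in Python. The following is a minimal sketch, not part of PocketFlow: `build_generate_command` is a hypothetical helper, and it uses only flags that appear in the command above.

```python
import shlex


def build_generate_command(pocket_pdb, ckpt, name, num_gen=100,
                           device="cuda:0", root_path="gen_results"):
    """Assemble the main_generate.py argument list for one pocket.

    Hypothetical helper: only the flags shown in the README command are used.
    """
    return [
        "python", "main_generate.py",
        "-pkt", pocket_pdb,        # pocket PDB file (required)
        "--ckpt", ckpt,            # model parameter file (required)
        "-n", str(num_gen),        # number of molecules to generate
        "-d", device,              # cuda:x or cpu
        "--root_path", root_path,  # where results are saved
        "--name", name,            # receptor name
    ]


cmd = build_generate_command(
    "test_samples/test_pocket10/1bvr_C_rec_pocket10-surf.pdb",
    "ckpt/ZINC-pretrained-255000.pt",
    name="1bvr",
)
# Print the command as a shell-safe string for inspection.
print(shlex.join(cmd))
```

To run it, pass `cmd` to `subprocess.run(cmd, check=True)` from the repository root.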
All parameters of generation:
usage: main_generate.py [-h] [-pkt POCKET] [--ckpt CKPT] [-n NUM_GEN] [--name NAME] [-d DEVICE] [-at ATOM_TEMPERATURE] [-bt BOND_TEMPERATURE] [--max_atom_num MAX_ATOM_NUM] [-ft FOCUS_THRESHOLD] [-cm CHOOSE_MAX]
[--min_dist_inter_mol MIN_DIST_INTER_MOL] [--bond_length_range BOND_LENGTH_RANGE] [-mdb MAX_DOUBLE_IN_6RING] [--with_print WITH_PRINT] [--root_path ROOT_PATH] [--readme README]
optional arguments:
-h, --help show this help message and exit
-pkt POCKET, --pocket POCKET
the pdb file of pocket in receptor
--ckpt CKPT the path of saved model
-n NUM_GEN, --num_gen NUM_GEN
the number of molecules to generate
--name NAME receptor name
-d DEVICE, --device DEVICE
cuda:x or cpu
-at ATOM_TEMPERATURE, --atom_temperature ATOM_TEMPERATURE
temperature for atom sampling
-bt BOND_TEMPERATURE, --bond_temperature BOND_TEMPERATURE
temperature for bond sampling
--max_atom_num MAX_ATOM_NUM
the max atom number for generation
-ft FOCUS_THRESHOLD, --focus_threshold FOCUS_THRESHOLD
the probability threshold for selecting a focus atom
-cm CHOOSE_MAX, --choose_max CHOOSE_MAX
whether to choose the atom with the highest probability as the focus atom
--min_dist_inter_mol MIN_DIST_INTER_MOL
inter-molecular dist cutoff between protein and ligand.
--bond_length_range BOND_LENGTH_RANGE
the range of bond length for mol generation.
-mdb MAX_DOUBLE_IN_6RING, --max_double_in_6ring MAX_DOUBLE_IN_6RING
--with_print WITH_PRINT
whether to print SMILES during the generative process
--root_path ROOT_PATH
the root path for saving results
--readme README, -rm README
description of this generative task
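The -ft and -cm options can be read as follows, assuming the model assigns each existing atom a probability of being the next focus atom. This is an illustrative sketch of that reading only, not PocketFlow's actual code: `pick_focus_atom` and its behavior when no atom passes the threshold are assumptions.

```python
import random


def pick_focus_atom(focus_probs, focus_threshold=0.5, choose_max=True, rng=random):
    """Return the index of the focus atom, or None if no atom passes the threshold.

    Assumed semantics: atoms whose probability exceeds focus_threshold are
    candidates; choose_max picks the most probable one, otherwise a candidate
    is drawn uniformly at random.
    """
    candidates = [i for i, p in enumerate(focus_probs) if p > focus_threshold]
    if not candidates:
        return None  # in this sketch, no candidate means nothing to extend
    if choose_max:
        return max(candidates, key=lambda i: focus_probs[i])
    return rng.choice(candidates)


probs = [0.10, 0.72, 0.55, 0.30]
focus = pick_focus_atom(probs, focus_threshold=0.5, choose_max=True)
print(focus)
```

Under this reading, a higher -ft leaves fewer candidate atoms, and -cm True makes the choice among them deterministic.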
Splitting Pocket
Based on the pose of the ligand, the pocket structure can be split from the protein structure:
from pocket_flow import SplitPocket, Protein, Ligand
pro = Protein('/path/to/protein.pdb')
lig = Ligand('/path/to/ligand.sdf')
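dist_cutoff = 10  # distance cutoff around the ligand (assumed to be in angstroms)
pocket_block, _ = SplitPocket._split_pocket_with_surface_atoms(pro, lig, dist_cutoff)
# Write the pocket as a PDB file; the context manager closes the file handle.
with open('/path/to/pocket.pdb', 'w') as f:
    f.write(pocket_block)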
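The idea behind a distance cutoff can be illustrated with plain coordinates. The sketch below is a generic illustration and does not reproduce `SplitPocket._split_pocket_with_surface_atoms`, which, judging by its name, also handles surface atoms. `atoms_within_cutoff` is a hypothetical helper, and the coordinates are made-up values.

```python
import math


def atoms_within_cutoff(protein_coords, ligand_coords, dist_cutoff=10.0):
    """Return indices of protein atoms within dist_cutoff of any ligand atom."""
    return [
        i for i, p in enumerate(protein_coords)
        if any(math.dist(p, l) <= dist_cutoff for l in ligand_coords)
    ]


# Toy coordinates (x, y, z), same length unit as dist_cutoff.
protein = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (25.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]

pocket_idx = atoms_within_cutoff(protein, ligand, dist_cutoff=10.0)
print(pocket_idx)
```

Pocket extraction tools commonly keep whole residues rather than individual atoms; this sketch works per atom only to keep the example short.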
Dataset
The raw CrossDocked2020 dataset is large and needs about 50 GB of disk space. You can download the processed data from Pocket2Mol.
from pocket_flow import CrossDocked2020
# Read the samples to exclude (last whitespace-separated field of each line);
# blank lines are skipped so that a trailing newline does not raise IndexError.
with open('data/unexcept_element_sample_new.csv') as f:
    unexpected_sample = [line.split()[-1] for line in f if line.strip()]
cs2020 = CrossDocked2020(
    './data/crossdocked_pocket10/',
    './data/crossdocked_pocket10/index.pkl',
    unexpected_sample=unexpected_sample
)
cs2020.run(
    dataset_name='crossdocked_pocket10_processed_35Atoms.lmdb',
    max_ligand_atom=35,
    only_backbone=False,
    lmdb_path='./data/'
)
The pretraining dataset of PocketFlow was selected from ZINC 3D. You can download ZINC 3D and then use make_pretrain_data.py to produce the pretraining dataset.