Molecules can be generated with the following command. The pocket PDB file (-pkt) and the model checkpoint file (--ckpt) are required; the remaining parameters are optional.

python main_generate.py -pkt test_samples/test_pocket10/1bvr_C_rec_pocket10-surf.pdb --ckpt ckpt/ZINC-pretrained-255000.pt -n 100 -d cuda:0 --root_path gen_results --name 1bvr -at 1.0 -bt 1.0 --max_atom_num 35 -ft 0.5 -cm True --with_print True
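To run this command for several pockets, it can be wrapped in Python. The sketch below is a convenience under stated assumptions and is not part of PocketFlow:

- `build_generate_cmd` and `generate_for_pockets` are hypothetical helpers.
- They use only flags that appear in the command above.
- `--name` is derived from the pocket file name by a made-up rule (the first underscore-separated token).
- Commands are only built and returned unless `run=True`.

```python
import subprocess
from pathlib import Path


def build_generate_cmd(pocket, ckpt, name, num_gen=100, device="cuda:0",
                       root_path="gen_results", max_atom_num=35):
    """Assemble a main_generate.py argument list from the documented flags."""
    return [
        "python", "main_generate.py",
        "-pkt", str(pocket),
        "--ckpt", str(ckpt),
        "-n", str(num_gen),
        "-d", device,
        "--root_path", root_path,
        "--name", name,
        "--max_atom_num", str(max_atom_num),
    ]


def generate_for_pockets(pocket_files, ckpt, run=False):
    """Build one command per pocket file; execute them only if run=True."""
    cmds = []
    for pocket in pocket_files:
        # Hypothetical naming rule: '1bvr_C_rec_pocket10-surf.pdb' -> '1bvr'.
        name = Path(pocket).name.split("_")[0]
        cmd = build_generate_cmd(pocket, ckpt, name)
        cmds.append(cmd)
        if run:
            # check=True raises CalledProcessError if generation fails.
            subprocess.run(cmd, check=True)
    return cmds
```

Calling `generate_for_pockets(pocket_paths, 'ckpt/ZINC-pretrained-255000.pt')` first lets you inspect the commands. Passing `run=True` then executes them sequentially.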

All generation parameters:

usage: main_generate.py [-h] [-pkt POCKET] [--ckpt CKPT] [-n NUM_GEN] [--name NAME] [-d DEVICE] [-at ATOM_TEMPERATURE] [-bt BOND_TEMPERATURE] [--max_atom_num MAX_ATOM_NUM] [-ft FOCUS_THRESHOLD] [-cm CHOOSE_MAX]
                        [--min_dist_inter_mol MIN_DIST_INTER_MOL] [--bond_length_range BOND_LENGTH_RANGE] [-mdb MAX_DOUBLE_IN_6RING] [--with_print WITH_PRINT] [--root_path ROOT_PATH] [--readme README]

optional arguments:
  -h, --help            show this help message and exit
  -pkt POCKET, --pocket POCKET
                        the PDB file of the receptor pocket
  --ckpt CKPT           the path to the saved model
  -n NUM_GEN, --num_gen NUM_GEN
                        the number of molecules to generate
  --name NAME           receptor name
  -d DEVICE, --device DEVICE
                        cuda:x or cpu
  -at ATOM_TEMPERATURE, --atom_temperature ATOM_TEMPERATURE
                        temperature for atom sampling
  -bt BOND_TEMPERATURE, --bond_temperature BOND_TEMPERATURE
                        temperature for bond sampling
  --max_atom_num MAX_ATOM_NUM
                        the max atom number for generation
  -ft FOCUS_THRESHOLD, --focus_threshold FOCUS_THRESHOLD
                        the probability threshold for the focus atom
  -cm CHOOSE_MAX, --choose_max CHOOSE_MAX
                        whether to choose the atom with the highest probability as the focus atom
  --min_dist_inter_mol MIN_DIST_INTER_MOL
                        inter-molecular dist cutoff between protein and ligand.
  --bond_length_range BOND_LENGTH_RANGE
                        the range of bond length for mol generation.
  -mdb MAX_DOUBLE_IN_6RING, --max_double_in_6ring MAX_DOUBLE_IN_6RING
  --with_print WITH_PRINT
                        whether to print SMILES during generation
  --root_path ROOT_PATH
                        the root path for saving results
  --readme README, -rm README
                        description of this generative task
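Temperature scaling is the standard way to control how sharp a sampling distribution is. Logits are divided by the temperature before the softmax. Values below 1.0 favor the most likely atom or bond type, and values above 1.0 flatten the choice.

The sketch below shows that generic technique. It also shows one hypothetical reading of how `--focus_threshold` and `--choose_max` could interact: first keep the atoms whose focus probability exceeds the threshold, then take either the most probable one or a random one. This is not PocketFlow's code; the function names and the focus-selection logic are assumptions.

```python
import numpy as np


def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Draw one index from softmax(logits / temperature); also return the probs."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                 # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs


def pick_focus(focus_probs, threshold=0.5, choose_max=True, rng=None):
    """Hypothetical focus selection: return an atom index, or None if none qualify."""
    focus_probs = np.asarray(focus_probs, dtype=float)
    candidates = np.flatnonzero(focus_probs > threshold)
    if candidates.size == 0:
        return None                        # no atom passes the threshold
    if choose_max:
        # Most probable candidate, mirroring the "-cm True" description.
        return int(candidates[np.argmax(focus_probs[candidates])])
    rng = np.random.default_rng() if rng is None else rng
    return int(rng.choice(candidates))     # uniform pick among candidates
```

With the same logits, a temperature of 0.5 puts more mass on the top choice than a temperature of 2.0 does. This is why lowering `-at`/`-bt` would be expected to make sampling more deterministic.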

Splitting the Pocket

Given the pose of a bound ligand, the pocket structure can be split from the protein structure:

from pocket_flow import SplitPocket, Protein, Ligand

pro = Protein('/path/to/protein.pdb')
lig = Ligand('/path/to/ligand.sdf')
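dist_cutoff = 10  # distance cutoff (in Å) around the ligand
pocket_block, _ = SplitPocket._split_pocket_with_surface_atoms(pro, lig, dist_cutoff)
# Write the pocket as a PDB block; the context manager closes the file.
with open('/path/to/pocket.pdb', 'w') as f:
    f.write(pocket_block)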
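The basic idea of distance-based pocket extraction is to keep every residue that has an atom within the cutoff of any ligand atom. The NumPy sketch below illustrates only that criterion. It does not reproduce `SplitPocket._split_pocket_with_surface_atoms`, which also involves surface atoms. The input format (`(residue_id, (x, y, z))` tuples) and the function name are assumptions chosen for brevity.

```python
import numpy as np


def residues_within_cutoff(protein_atoms, ligand_coords, cutoff=10.0):
    """Return IDs of residues with at least one atom within `cutoff` Å of the ligand.

    protein_atoms: list of (residue_id, (x, y, z)) tuples.
    ligand_coords: array-like of shape (n_ligand_atoms, 3).
    """
    res_ids = [rid for rid, _ in protein_atoms]
    pro_xyz = np.array([xyz for _, xyz in protein_atoms], dtype=float)
    lig_xyz = np.asarray(ligand_coords, dtype=float)
    # Pairwise distances via broadcasting: shape (n_protein_atoms, n_ligand_atoms).
    dists = np.linalg.norm(pro_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)
    # An atom is "close" if its nearest ligand atom is within the cutoff.
    close = dists.min(axis=1) <= cutoff
    selected = [rid for rid, keep in zip(res_ids, close) if keep]
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(selected))
```

Selecting whole residues, rather than single atoms, keeps side chains intact in the extracted pocket.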

Dataset

The raw CrossDocked2020 dataset is large and requires about 50 GB of disk space. You can download the processed data from Pocket2Mol.
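Before downloading or processing the raw data, it can help to confirm that the target disk has enough room. Below is a minimal standard-library check. `has_free_space` is a hypothetical helper, and 50 GiB stands in for the approximate size quoted above.

```python
import shutil


def has_free_space(path=".", required_gb=50):
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3
```

For example, `has_free_space('./data')` can guard a download script before it starts writing files.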

from pocket_flow import CrossDocked2020

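# Samples listed in this file are excluded; blank lines are skipped so that
# a trailing newline does not raise IndexError on line.split()[-1].
with open('data/unexcept_element_sample_new.csv') as f:
    unexpected_sample = [line.split()[-1] for line in f if line.strip()]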
cs2020 = CrossDocked2020(
    './data/crossdocked_pocket10/',
    './data/crossdocked_pocket10/index.pkl',
    unexpected_sample=unexpected_sample
    )
cs2020.run(
    dataset_name='crossdocked_pocket10_processed_35Atoms.lmdb',
    max_ligand_atom=35,
    only_backbone=False,
    lmdb_path='./data/'
    )
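The `max_ligand_atom=35` argument presumably caps ligand size, and `--max_atom_num 35` plays a similar role at generation time. Whether hydrogens are counted is not documented here. The helpers below (hypothetical, not part of `pocket_flow`) therefore count heavy atoms in a V2000 molblock, as one way to pre-check a ligand SDF against such a limit.

```python
def count_heavy_atoms(molblock):
    """Count non-hydrogen atoms in the first molecule of a V2000 molblock."""
    lines = molblock.splitlines()
    # Line 4 is the counts line; its first three columns hold the atom count.
    n_atoms = int(lines[3][0:3])
    atom_lines = lines[4:4 + n_atoms]
    # The element symbol occupies columns 32-34 (1-based) of each atom line;
    # fixed columns stay correct even when wide coordinates touch each other.
    symbols = [line[31:34].strip() for line in atom_lines]
    return sum(1 for s in symbols if s not in ("H", "D", "T"))


def fits_atom_limit(molblock, max_atoms=35):
    """Return True if the molecule has at most `max_atoms` heavy atoms."""
    return count_heavy_atoms(molblock) <= max_atoms
```

Usage: `with open('/path/to/ligand.sdf') as f: print(fits_atom_limit(f.read()))`. Only the first molecule in the file is inspected.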

The pretraining dataset of PocketFlow was selected from ZINC 3D. You can download ZINC 3D and then use make_pretrain_data.py to produce the pretraining dataset.