

CGFlow: Compositional Flows for 3D Molecule and Synthesis Pathway Co-design

This is the official repository of our ICML 2025 paper: "Compositional Flows for 3D Molecule and Synthesis Pathway Co-design".

Overview: CGFlow introduces Compositional Generative Flows, a framework that extends flow matching to generate compositional objects with continuous states. We apply CGFlow to synthesizable drug design by jointly designing a molecule's synthetic pathway and its 3D binding pose. To reproduce the results reported in the paper, please refer to the submission version.

Demo: We have a web app demo available: 3DSynthFlow Demo. This demo illustrates the types of molecules and synthesis trajectories generated by 3DSynthFlow. The underlying model is trained in a pocket-conditional setting and is intended for demo and research purposes only.

⚠️ For practical drug discovery applications, we strongly recommend finetuning the model on your specific protein target.


Table of Contents

  1. Acknowledgements
  2. Installation
  3. Data Preparation
  4. Generation
  5. Pretraining
  6. License
  7. Citation

Acknowledgements

This project builds upon prior work, including RxnFlow and TacoGFN (see Citation).

Installation

# 1. Create and activate a conda environment using mamba
mamba create -n cgflow python=3.11
mamba activate cgflow

# 2. Install PyTorch + PyG via pip
#    (version specifiers are quoted so the shell does not treat ">" as a redirect)
pip install torch==2.6.0 \
    "torch-geometric>=2.4.0" \
    "torch-scatter>=2.1.2" \
    "torch-sparse>=0.6.18" \
    "torch-cluster>=1.6.3" \
    -f https://data.pyg.org/whl/torch-2.6.0+cu124.html

# 3. Install CGFlow (-e for editable mode)
pip install -e .

# 4. Install extra dependencies (optional)
# - AutoDock Vina
pip install -e '.[vina]'
# - UniDock (GPU-accelerated docking)
mamba install unidock
pip install -e '.[unidock]'
# - Extras (e.g., jupyter notebook)
mamba install notebook
pip install -e '.[extra]'
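
After installation, a quick check that the core packages resolve can save time before launching a long run. The snippet below is a minimal sketch and not part of the repository: it only tests whether each module can be found in the active environment, without importing it. The helper name `missing_packages` is illustrative, and the list uses the import names of the packages installed in step 2.

```python
import importlib.util

# Import names of the packages installed in step 2 (pip names use hyphens,
# import names use underscores).
REQUIRED_MODULES = [
    "torch",
    "torch_geometric",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core packages found.")
```

An empty result only means the packages are discoverable; it does not confirm that the CUDA build matches your driver.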

Data Preparation

Download Pretrained Model

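Download the pretrained model weights from Google Drive with gdown (recent gdown versions take the file ID directly, without the deprecated --id flag):

mkdir -p ./weights
gdown 1xGC193o4DtSPzWFjmRIlPjmn7bLfMaCd -O ./weights/cgflow_crossdock.ckpt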
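
An interrupted download can leave a missing or empty file, which may only surface later as a confusing load error. The following is a small sketch for checking the file before use. The function name `checkpoint_status` and the `min_bytes` threshold are assumptions for illustration; the default path is the output path from the command above.

```python
from pathlib import Path


def checkpoint_status(path="./weights/cgflow_crossdock.ckpt", min_bytes=1):
    """Report whether a checkpoint file exists and holds at least `min_bytes`."""
    ckpt = Path(path)
    if not ckpt.is_file():
        return "missing"
    if ckpt.stat().st_size < min_bytes:
        return "too small"
    return "ok"


if __name__ == "__main__":
    print("checkpoint:", checkpoint_status())
```

This checks only presence and size, not whether the file is a valid checkpoint.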

Construct Generative Environment

See Data Preparation for detailed instructions on preparing datasets and environments.

Generation

1. Pocket-specific Optimization

A. GPU-accelerated UniDock

You can modify the config file to use your own protein target.

python scripts/opt/opt_unidock.py --config ./configs/opt/aldh1_unidock.yaml

B. AutoDock Vina (local-opt)

python scripts/opt/opt_vina.py --config ./configs/opt/aldh1_vina.yaml
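
When switching between docking backends or running several targets, it can help to wrap the two commands above in one helper. This is a hedged sketch rather than a repository utility: the script paths and the `--config` flag come from the commands above, while `OPT_SCRIPTS`, `build_opt_command`, and `run_opt` are hypothetical names introduced here. By default it only prints the command (dry run).

```python
import subprocess
import sys

# Script paths are taken from the commands above; the mapping itself is illustrative.
OPT_SCRIPTS = {
    "unidock": "scripts/opt/opt_unidock.py",
    "vina": "scripts/opt/opt_vina.py",
}


def build_opt_command(backend, config_path):
    """Build the argument list for one optimization run."""
    if backend not in OPT_SCRIPTS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(OPT_SCRIPTS)}"
        )
    return [sys.executable, OPT_SCRIPTS[backend], "--config", config_path]


def run_opt(backend, config_path, dry_run=True):
    """Print the command and, unless dry_run is set, execute it.

    Run from the repository root so the relative script paths resolve.
    """
    cmd = build_opt_command(backend, config_path)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    run_opt("unidock", "./configs/opt/aldh1_unidock.yaml")
    run_opt("vina", "./configs/opt/aldh1_vina.yaml")
```

Pass `dry_run=False` to launch the run; `check=True` makes a failed run raise instead of passing silently.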

2. Zero-shot Pocket-conditional Generation

TBA

3. Fine-tuning the pocket-conditional model

TBA

Pretraining Pocket-conditional Generative Model

To train the pocket-conditional generative model:

  • Download the CrossDock2020 pockets by following the instructions in the Data Preparation section.
  • Train the model with the following command:
    python scripts/multi_pocket/tacogfn_proxy.py --name <PREFIX>
    

Pretraining Pose Prediction Model

To train the pose prediction model:

  • Download the preprocessed data by following the instructions in the Data Preparation section.
  • Train the model with the following command:
    python scripts/pretrain/train.py --name <PREFIX>
    

License

This project is licensed under the MIT License.

Citation

If you use this work, please cite:

CGFlow (ICML '25)

@inproceedings{shen2025compositional,
  title     = {Compositional Flows for 3D Molecule and Synthesis Pathway Co-design},
  author    = {Tony Shen and Seonghwan Seo and Ross Irwin and Kieran Didi and Simon Olsson and Woo Youn Kim and Martin Ester},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning (ICML)},
  year      = {2025},
  url       = {https://openreview.net/forum?id=4aXfSLfM0Z}
}

RxnFlow (ICLR '25)

@inproceedings{seo2025generative,
  title={Generative Flows on Synthetic Pathway for Drug Design},
  author={Seonghwan Seo and Minsu Kim and Tony Shen and Martin Ester and Jinkyoo Park and Sungsoo Ahn and Woo Youn Kim},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=pB1XSj2y4X}
}

TacoGFN (TMLR '24)

@article{shen2024tacogfn,
  title={Taco{GFN}: Target-conditioned {GF}lowNet for Structure-based Drug Design},
  author={Tony Shen and Seonghwan Seo and Grayson Lee and Mohit Pandey and Jason R Smith and Artem Cherkasov and Woo Youn Kim and Martin Ester},
  journal={Transactions on Machine Learning Research},
  year={2024},
  url={https://openreview.net/forum?id=N8cPv95zOU}
}