CGFlow: Compositional Flows for 3D Molecule and Synthesis Pathway Co-design
This is the official repository of our ICML 2025 paper: "Compositional Flows for 3D Molecule and Synthesis Pathway Co-design".
Overview: CGFlow introduces Compositional Generative Flows, a framework that extends flow matching to generate compositional objects with continuous states. We apply CGFlow to synthesizable drug design by jointly designing a molecule's synthetic pathway and its 3D binding pose. To reproduce the results reported in the paper, please refer to the submission version.
Demo: We have a web app demo available: 3DSynthFlow Demo. This demo illustrates the types of molecules and synthesis trajectories generated by 3DSynthFlow. The underlying model is trained in a pocket-conditional setting and is intended for demo and research purposes only.
⚠️ For practical drug discovery applications, we strongly recommend finetuning the model on your specific protein target.
Acknowledgements
This project builds upon prior work including:
- GFlowNet repository by Recursion
- RxnFlow for synthesis-based generation
- TacoGFN for target-conditioned reinforcement learning
- SemlaFlow for flow matching-based molecular conformation generation
Installation
# 1. Create and activate the environment using mamba
mamba create -n cgflow python=3.11
mamba activate cgflow
# 2. Install PyTorch + PyG via pip
# (quote the version specifiers so the shell does not treat '>' as output redirection)
pip install torch==2.6.0 \
"torch-geometric>=2.4.0" \
"torch-scatter>=2.1.2" \
"torch-sparse>=0.6.18" \
"torch-cluster>=1.6.3" \
-f https://data.pyg.org/whl/torch-2.6.0+cu124.html
# 3. Install CGFlow (-e for an editable install)
pip install -e .
# 4. Install extra dependencies (optional)
# - AutoDock Vina
pip install -e '.[vina]'
# - UniDock (GPU-accelerated docking)
mamba install unidock
pip install -e '.[unidock]'
# - Extras (e.g., Jupyter Notebook)
mamba install notebook
pip install -e '.[extra]'
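After installation, a quick check can confirm which dependencies are visible in the active environment. The following is a minimal sketch, not part of the repository: it assumes the PyG packages are importable as `torch_geometric`, `torch_scatter`, `torch_sparse`, and `torch_cluster`, that the AutoDock Vina bindings are importable as `vina`, and that UniDock exposes a `unidock` executable on `PATH`.

```python
import importlib.util
import shutil

# Import names are assumptions based on the pip package names in step 2.
REQUIRED_MODULES = [
    "torch",
    "torch_geometric",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
]
# Optional docking backends (assumed import name and executable name).
OPTIONAL_MODULES = ["vina"]
OPTIONAL_BINARIES = ["unidock"]


def module_available(name: str) -> bool:
    """Return True if `name` can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def check_environment() -> dict[str, bool]:
    """Map each dependency to whether it was found in this environment."""
    report = {}
    for name in REQUIRED_MODULES + OPTIONAL_MODULES:
        report[name] = module_available(name)
    for name in OPTIONAL_BINARIES:
        report[f"{name} (binary)"] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for item, found in check_environment().items():
        print(f"{item:<20} {'found' if found else 'MISSING'}")
```

A missing optional entry only matters if you plan to use that docking backend.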
Data Preparation
Download Pretrained Model
Download the pretrained model weights with gdown:
gdown 1xGC193o4DtSPzWFjmRIlPjmn7bLfMaCd -O ./weights/cgflow_crossdock.ckpt
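An interrupted download can leave a missing or truncated file, so it may be worth checking the checkpoint before launching a run. The sketch below is illustrative only: the helper name `checkpoint_status` and the `min_bytes` threshold are assumptions, and the check looks only at file presence and size, not at the contents.

```python
from pathlib import Path


def checkpoint_status(path: str, min_bytes: int = 1024) -> str:
    """Classify a checkpoint file as 'missing', 'too small', or 'ok'.

    `min_bytes` is an arbitrary sanity threshold (an assumption), meant only
    to catch empty or obviously truncated files.
    """
    file = Path(path)
    if not file.is_file():
        return "missing"
    if file.stat().st_size < min_bytes:
        return "too small"
    return "ok"


if __name__ == "__main__":
    # Path taken from the download command above.
    print(checkpoint_status("./weights/cgflow_crossdock.ckpt"))
```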
Construct Generative Environment
See Data Preparation for detailed instructions on preparing datasets and environments.
Generation
1. Pocket-specific Optimization
A. GPU-accelerated UniDock
You can modify the config file to use your own protein target.
python scripts/opt/opt_unidock.py --config ./configs/opt/aldh1_unidock.yaml
B. AutoDock Vina (local-opt)
python scripts/opt/opt_vina.py --config ./configs/opt/aldh1_vina.yaml
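To launch either docking backend from Python, for example when looping over several target configs, the two commands above can be wrapped in a small helper. This is a sketch under stated assumptions: `build_opt_command` and `run_opt` are hypothetical names, only the `--config` flag shown above is used, and the script is expected to be run from the repository root.

```python
import subprocess
import sys

# Script paths as they appear in the commands above.
OPT_SCRIPTS = {
    "unidock": "scripts/opt/opt_unidock.py",
    "vina": "scripts/opt/opt_vina.py",
}


def build_opt_command(backend: str, config_path: str) -> list[str]:
    """Build the argv list for a pocket-specific optimization run."""
    if backend not in OPT_SCRIPTS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(OPT_SCRIPTS)}"
        )
    return [sys.executable, OPT_SCRIPTS[backend], "--config", config_path]


def run_opt(backend: str, config_path: str) -> int:
    """Run the optimization script and return its exit code."""
    return subprocess.run(build_opt_command(backend, config_path)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so this file is safe to execute.
    print(" ".join(build_opt_command("vina", "./configs/opt/aldh1_vina.yaml")))
```

Passing the arguments as a list avoids shell quoting issues with config paths that contain spaces.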
2. Zero-shot Pocket-conditional Generation
TBA
3. Fine-tuning the pocket-conditional model
TBA
Pretraining Pocket-conditional Generative Model
To train the pocket-conditional generative model:
- Download the CrossDock2020 pockets by following the instructions in the Data Preparation section.
- Train the model with the following command:
python scripts/multi_pocket/tacogfn_proxy.py --name <PREFIX>
Pretraining Pose Prediction Model
To train the pose prediction model:
- Download the preprocessed data by following the instructions in the Data Preparation section.
- Train the model with the following command:
python scripts/pretrain/train.py --name <PREFIX>
License
This project is licensed under the MIT License.
Citation
If you use this work, please cite:
CGFlow (ICML '25)
@inproceedings{shen2025compositional,
title = {Compositional Flows for 3D Molecule and Synthesis Pathway Co-design},
author = {Tony Shen and Seonghwan Seo and Ross Irwin and Kieran Didi and Simon Olsson and Woo Youn Kim and Martin Ester},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning (ICML)},
year = {2025},
url = {https://openreview.net/forum?id=4aXfSLfM0Z}
}
RxnFlow (ICLR '25)
@inproceedings{seo2025generative,
title={Generative Flows on Synthetic Pathway for Drug Design},
author={Seonghwan Seo and Minsu Kim and Tony Shen and Martin Ester and Jinkyoo Park and Sungsoo Ahn and Woo Youn Kim},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=pB1XSj2y4X}
}
TacoGFN (TMLR '24)
@article{shen2024tacogfn,
title={Taco{GFN}: Target-conditioned {GF}lowNet for Structure-based Drug Design},
author={Tony Shen and Seonghwan Seo and Grayson Lee and Mohit Pandey and Jason R Smith and Artem Cherkasov and Woo Youn Kim and Martin Ester},
journal={Transactions on Machine Learning Research},
year={2024},
url={https://openreview.net/forum?id=N8cPv95zOU}
}
