mirror of
https://github.com/RosettaCommons/foundry.git
synced 2026-06-04 13:24:22 +08:00
initial progress on documentation cleanup (#722)
* initial progress on documentation cleanup
* formatting input.md
* docs: more documentation for ppi and sm binder design
* fix typo

---------

Co-authored-by: Rohith Krishna <rohith@localhost>
31
README.md
31
README.md
@@ -57,40 +57,13 @@ We include details DNA, Ligands, Protein-Protein Interaction, Symmetry-condition
- **foundry**: Model architectures, training, inference endpoints
- **models/\<model\>:** Released models.

### Debugging

VSCode-native debugging with Apptainers:

1. Add to `.vscode/launch.json`:
```json
{
    "name": "Python: Attach",
    "type": "debugpy",
    "request": "attach",
    "connect": {
        "host": "localhost",
        "port": 2345
    }
}
```

2. Set breakpoints in VSCode

3. Launch with debug port:
```bash
export DEBUG_PORT=2345
./train.py experiment=...
```

4. Attach debugger when prompted (F5)
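
Step 3 relies on the training script opening a debugpy listener when `DEBUG_PORT` is set. We have not checked how foundry wires this up, so the following is only a minimal sketch, assuming the `debugpy` package is installed, of how a Python entry point could honor that variable. `debug_port_from_env` and `maybe_wait_for_debugger` are hypothetical names, not foundry functions.

```python
import os
from typing import Mapping, Optional


def debug_port_from_env(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the port from DEBUG_PORT, or None if debugging was not requested."""
    raw = env.get("DEBUG_PORT", "").strip()
    if not raw:
        return None
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"DEBUG_PORT out of range: {port}")
    return port


def maybe_wait_for_debugger(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: block until VSCode attaches if DEBUG_PORT is set."""
    port = debug_port_from_env(env)
    if port is None:
        return False
    import debugpy  # imported lazily so normal runs do not need it

    # Listen on the host/port that the "Python: Attach" configuration connects to.
    debugpy.listen(("localhost", port))
    print(f"Waiting for debugger on localhost:{port} (press F5 in VSCode)...")
    debugpy.wait_for_client()
    return True
```

Calling `maybe_wait_for_debugger()` at the top of a script's main function would pause the run until you attach with the `Python: Attach` configuration; when `DEBUG_PORT` is unset it returns immediately, so normal runs are unaffected.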

#### For Core Developers (Multiple Packages)

Install both `foundry` and models in editable mode for development:

```bash
# Install foundry and RF3 in editable mode
uv pip install -e . -e ./models/rf3
uv pip install -e . -e ./models/rf3 -e ./models/rfd3 -e ./models/mpnn

# Or install only foundry (no models)
uv pip install -e .
@@ -165,3 +138,5 @@ If you use this repository code or data in your work, please cite the relavant w
publisher={Nature Publishing Group US New York}
}
```
## Acknowledgments

We thank Rachel Clune and Hope Woods from the RosettaCommons for their collaboration on the codebase, documentation, tutorials and examples.

@@ -4,57 +4,24 @@
<img src="docs/.assets/trajectory.png" alt="All-atom diffusion with RFD3">
</p>


## Installation, Setup, and a Basic Design
### A. Installation using `uv`
## Get Started
1. Install RFdiffusion3. See the [Main README](../../README.md) for instructions on how to install all models to run the full pipeline (recommended). If you have already installed all the models, skip to [Run Inference](#run-inference).
```bash
git clone https://github.com/RosettaCommons/foundry.git \
    && cd foundry \
    && uv python install 3.12 \
    && uv venv --python 3.12 \
    && source .venv/bin/activate \
    && uv pip install -e ".[rfd3]"
pip install rc-foundry[rfd3]
```
<!--
> [!IMPORTANT]
> You must install `foundry` (the root package) with `-e` first, then install `rfd3`. This ensures both packages are in editable mode for a proper development workflow.
-->
> [!NOTE]
> Optionally, make the installed venv available as a Jupyter kernel (helpful for running the examples in `examples/all.ipynb`):
`python -m ipykernel install --user --name=foundry --display-name "foundry"`

### B. Download model weights for RFD3
2. Download the checkpoint to your desired checkpoint location.
```bash
wget http://files.ipd.uw.edu/pub/rfd3/rfd3_foundry_2025_12_01.ckpt
```
*You can store these weights anywhere you like, but if you do not store them in the root directory, you will need to change the `cur_ckpt` variable discussed later on.*

**Setup**
```bash
export PROJECT_PATH="$(pwd)/models/rfd3/src:$(pwd)/src:$(pwd)/lib/atomworks/src"
```
If your virtual environment is not already active, you will also need to run:
```bash
source .venv/bin/activate
foundry install rfd3 --checkpoint-dir /path/to/ckpt/dir
```

Files for RFD3 live under this folder (`models/rfd3`) and wrap around the components of RF3 under `src/foundry/`.
```bash
chmod +x src/foundry/*.py
```

## Inference:
## Run Inference
```bash
cur_ckpt=rfd3_foundry_2025_12_01.ckpt
```

To run inference:
```bash
uv run python models/rfd3/src/rfd3/run_inference.py out_dir=logs/inference_outs/demo/0 ckpt_path=$cur_ckpt inputs=models/rfd3/docs/demo.json print_config=True dump_trajectories=True
rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json ckpt_path=$cur_ckpt
```

> [!NOTE]
@@ -97,21 +64,54 @@ For full details on how to specify inputs, see the [input specification document
</tr>
</table>

## Training (w & w/o WandB): #TODO make sure correct
## Installation and Setup for Development and Training
### A. Installation using `uv`
```bash
git clone https://github.com/RosettaCommons/foundry.git \
    && cd foundry \
    && uv python install 3.12 \
    && uv venv --python 3.12 \
    && source .venv/bin/activate \
    && uv pip install -e ".[rfd3]"
```
<!--
> [!IMPORTANT]
> You must install `foundry` (the root package) with `-e` first, then install `rfd3`. This ensures both packages are in editable mode for a proper development workflow.
-->
> [!NOTE]
> Optionally, make the installed venv available as a Jupyter kernel (helpful for running the examples in `examples/all.ipynb`):
`python -m ipykernel install --user --name=foundry --display-name "foundry"`

Download checkpoints.
```bash
foundry install rfd3 --checkpoint-dir /path/to/checkpoint/
```

Add `export PROJECT_PATH=$(pwd)/models/rfd3` to `scripts/slurm/launch.sh`, where `$(pwd)` is the repository's absolute path.
You will also want to add your atomworks and foundry (`$(pwd)`) paths to `launch.sh`.

## Inference:
```bash
cur_ckpt=rfd3_foundry_2025_12_01.ckpt
```

To run inference:
```bash
rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json ckpt_path=$cur_ckpt dump_trajectories=True
```

> [!NOTE]
> This demo takes a very long time on a CPU. On a GPU, it should take on the order of 10 minutes.

The output directory is created automatically.

For full details on how to specify inputs, see the [input specification documentation](./docs/input.md). You can also see `models/rfd3/configs/inference_engine/rfdiffusion3.yaml`.
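
The `rfd3 design` calls above take `key=value` overrides, which looks like Hydra-style configuration. If you script many runs, a small wrapper that assembles the argument list can help. This is a sketch under that assumption: `build_design_command` and `run_design` are hypothetical helpers, and only the keys that appear in this README (`out_dir`, `inputs`, `ckpt_path`, `dump_trajectories`) are used.

```python
import shlex
import subprocess
from typing import Any, List


def build_design_command(out_dir: str, inputs: str, ckpt_path: str, **overrides: Any) -> List[str]:
    """Assemble an `rfd3 design` call from key=value overrides (hypothetical helper)."""
    cmd = ["rfd3", "design", f"out_dir={out_dir}", f"inputs={inputs}", f"ckpt_path={ckpt_path}"]
    # Extra overrides such as dump_trajectories=True are appended as key=value strings.
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


def run_design(out_dir: str, inputs: str, ckpt_path: str, dry_run: bool = True, **overrides: Any) -> List[str]:
    """Print the command; only execute it when dry_run is False."""
    cmd = build_design_command(out_dir, inputs, ckpt_path, **overrides)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

With `dry_run=True` (the default) the helper only prints the command, which is a cheap way to check the overrides before committing to a long GPU run.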

## Training (w & w/o WandB): #TODO make sure correct

To launch a training run, use:
```bash
sbatch -J rfd3-full-sparse launch.sh
```

Optionally, set `WANDB_API_KEY` as an environment variable to log to WandB. You can disable wandb by including the following at the top of your experiment config:
```yaml
defaults:
  - override /logger: csv # turns off wandb logger
uv run python models/rfd3/src/rfd3/train.py experiment=pretrain
```
See the paths [configs](/models/rfd3/configs/paths/) to customize the paths where data is read from and where logs are written. There is also a wandb config that can be enabled if you want to log training through wandb.
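
Instead of editing the experiment config, the logger could also be chosen per run. The sketch below assumes that the `override /logger: csv` default shown above corresponds to a Hydra config group named `logger`, so the same choice can be made on the command line as `logger=csv`; that mapping is an assumption, and `training_command` is a hypothetical helper. The script path and `experiment=pretrain` come from the example above.

```python
import os
from typing import List, Mapping

TRAIN_SCRIPT = "models/rfd3/src/rfd3/train.py"  # path from the example above


def training_command(experiment: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Build a training command, switching to the CSV logger when WandB is not configured."""
    cmd = ["uv", "run", "python", TRAIN_SCRIPT, f"experiment={experiment}"]
    if not env.get("WANDB_API_KEY"):
        # Assumes a Hydra config group named `logger` with a `csv` option,
        # mirroring `override /logger: csv` in the experiment config.
        cmd.append("logger=csv")
    return cmd
```

Passing the returned list to `subprocess.run(..., check=True)` would launch a local run; for Slurm you would presumably add the same override to the command inside `launch.sh`.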

### Install HBPLUS for training with hydrogen bond conditioning:

@@ -2,24 +2,10 @@
```yaml
pdb_data_dir: /projects/ml/frozen_pdb_copies/2025_07_13_pdb
pdb_parquet_dir: /projects/ml/datahub/dfs/af3_splits/2024_12_16/ # TODO: uncomment

# fb monomer distillation dataset
# monomer distillation dataset
monomer_distillation_data_dir: /squash/af2_distillation_facebook/
monomer_distillation_parquet_dir: /projects/ml/datahub/dfs/distillation/af2_distillation_facebook

# path(s) to search for protein MSAs (for PDB datasets)
protein_msa_dirs:
  - {"dir": "/projects/msa/rf2aa_af3/rf2aa_paper_model_protein_msas", "extension": ".a3m.gz", "directory_depth": 2}
  - {"dir": "/projects/msa/rf2aa_af3/missing_msas_through_2024_08_12", "extension": ".msa0.a3m.gz", "directory_depth": 2}
  - {"dir": "/net/scratch/mkazman/msa/validate_no_leak_taxid", "extension": ".a3m.gz", "directory_depth": 2}
  - {"dir": "/net/scratch/mkazman/msa/missing_antibody_msas", "extension": ".a3m.gz", "directory_depth": 2}
  - {"dir": "/net/scratch/mkazman/msa/post_training_cutoff_msas/processed_nested", "extension": ".a3m.gz", "directory_depth": 2}
  - {"dir": "/net/scratch/mkazman/msa/post_training_cutoff_msas/extra_seqs_processed_nested", "extension": ".a3m.gz", "directory_depth": 2}
  - {"dir": "/projects/msa/nvidia_renamed_with_seq_hash/maxseq_10k", "extension": ".a3m.gz", "directory_depth": 2}

# path(s) to search for RNA MSAs
rna_msa_dirs:
  - {"dir": "/projects/msa/rf2aa_af3/rf2aa_paper_model_rna_msas", "extension": ".afa", "directory_depth": 0}

# path to save examples that fail during the Transform pipeline (null = do not save)
failed_examples_dir: null
```

@@ -23,11 +23,11 @@

---

## What changed (high level)
## How it works (high level)

- **Unified selections.** All per-residue/atom choices now use **InputSelection**:
  - You can pass `true`/`false`, a **contig string** (`"A1-10,B5-8"`), or a **dictionary** (`{"A1-10": "ALL", "B5": "N,CA,C,O"}`).
  - New selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
  - Selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
- **Clearer unindexing.** For **unindexed** motifs you typically either fix `"ALL"` atoms or explicitly choose subsets such as `"TIP"`/`"BKBN"`/explicit atom lists via a **dictionary** (see examples).
  When using `unindex`, only **the atoms you mark as fixed** are carried over from the input.
- **Reproducibility.** The exact specification and the **sampled contig** are logged back into the output JSON. We also log useful counts (atoms, residues, chains).
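
To make the three accepted forms of an InputSelection concrete, here is a minimal sketch that normalizes each of them into one mapping from residue ranges to atom specifications. It is illustrative only: `normalize_selection` is a hypothetical helper, not RFD3's parser, and it assumes single-letter chain IDs, that a contig string means "all atoms of the listed residues", and that `true`/`false` are resolved later against the structure.

```python
import re
from typing import Dict, Tuple, Union

Selection = Union[bool, str, Dict[str, str]]
# (chain, first residue, last residue) -> atom spec such as "ALL" or "N,CA,C,O"
Normalized = Dict[Tuple[str, int, int], str]

# Assumption: single-letter chain ID followed by a residue number or range.
_RANGE = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def _parse_range(token: str) -> Tuple[str, int, int]:
    match = _RANGE.match(token.strip())
    if match is None:
        raise ValueError(f"not a chain/residue range: {token!r}")
    chain, start = match.group(1), int(match.group(2))
    stop = int(match.group(3)) if match.group(3) is not None else start
    if stop < start:
        raise ValueError(f"range runs backwards: {token!r}")
    return chain, start, stop


def normalize_selection(selection: Selection) -> Union[bool, Normalized]:
    if isinstance(selection, bool):
        return selection  # select everything / nothing; left for later resolution
    if isinstance(selection, str):
        # Contig string: every listed residue, all atoms.
        return {_parse_range(tok): "ALL" for tok in selection.split(",") if tok.strip()}
    if isinstance(selection, dict):
        # Dictionary: explicit atom spec per residue range (spaces dropped).
        return {_parse_range(key): value.replace(" ", "") for key, value in selection.items()}
    raise TypeError(f"unsupported selection type: {type(selection).__name__}")
```

Normalizing up front like this means later code only has to handle one shape, whichever of the three forms the user wrote in the JSON.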
@@ -138,23 +138,44 @@ lists multiple unindexed components; internal “breakpoints” are inferred and

# Appendix
## FAQ / gotchas
<details>
<summary><b>Do I need select_fixed_atoms & select_unfixed_sequence every time?</b></summary>

Q: Do I need select_fixed_atoms & select_unfixed_sequence every time?
A: No. Defaults apply when input present.
No. Defaults apply when input present.
</details>

Q: What does "ALL" vs "TIP" in unindex mean?
A: "ALL" → copy full residue; "TIP" → fix only sidechain tip atoms.
<details>
<summary><b>Do I need select_fixed_atoms & select_unfixed_sequence every time?</b></summary>

Q: Can selections overlap?
A: Only certain ones (fixed vs unfixed) may; RASA & donor/acceptor cannot.
No. Defaults apply when input present.
</details>

Q: How to fix backbone but redesign sidechains?
A: redesign_motif_sidechains: true.
<details>
<summary><b>What does "ALL" vs "TIP" in unindex mean?</b></summary>

Q: Why “Input provided but unused”?
A: You gave input but no contig, unindex, or partial_t.
- **`ALL`** → copy full residue
- **`TIP`** → fix only sidechain tip atoms
</details>

## Shorthand atoms
<details>
<summary><b>Can selections overlap?</b></summary>

Only certain ones (fixed vs unfixed) may; RASA & donor/acceptor cannot.
</details>

<details>
<summary><b>How to fix backbone but redesign sidechains?</b></summary>

`redesign_motif_sidechains: true`
</details>

<details>
<summary><b>Why "Input provided but unused"?</b></summary>

You gave input but no contig, unindex, or partial_t.
</details>
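
Two of the answers above describe checks you can run on a spec before submitting it: an input with no `contig`, `unindex`, or `partial_t` goes unused, and RASA and donor/acceptor selections must not overlap. The sketch below applies both rules to a plain dict spec. It is a hypothetical pre-flight helper (`preflight` is not RFD3's validator) and rests on three assumptions: the overlap answer is read as "the three RASA classes are mutually exclusive, and donor vs acceptor are mutually exclusive"; selections are dictionaries of residue keys to comma-separated atom names, compared atom by atom without expanding ranges; and the FAQ's three input-consuming keys are the only ones. If a spec consumes the input another way (the small molecule examples, for instance, use `ligand` and `length`), extend `INPUT_CONSUMERS` rather than trusting the warning.

```python
from itertools import combinations
from typing import Any, Dict, List, Set, Tuple

# Keys the FAQ lists as consuming the input structure (assumed, not exhaustive).
INPUT_CONSUMERS = ("contig", "unindex", "partial_t")
RASA_FIELDS = ("select_buried", "select_partially_buried", "select_exposed")
HBOND_FIELDS = ("select_hbond_donor", "select_hbond_acceptor")


def _atoms(selection: Any) -> Set[Tuple[str, str]]:
    """{"B1": "N1,O2"} -> {("B1", "N1"), ("B1", "O2")}; ranges are not expanded."""
    if not isinstance(selection, dict):
        return set()  # only dictionary selections are compared in this sketch
    return {
        (residue, atom.strip())
        for residue, atoms in selection.items()
        for atom in atoms.split(",")
        if atom.strip()
    }


def preflight(spec: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in one design spec."""
    problems = []
    if spec.get("input") and all(spec.get(key) is None for key in INPUT_CONSUMERS):
        problems.append("Input provided but unused: add contig, unindex, or partial_t")
    for group in (RASA_FIELDS, HBOND_FIELDS):
        for first, second in combinations(group, 2):
            shared = _atoms(spec.get(first)) & _atoms(spec.get(second))
            if shared:
                problems.append(f"{first} and {second} overlap on {sorted(shared)}")
    return problems
```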

## Shorthand atoms for easy specification

| Keyword | Expands to |
| --- | --- |
| `BKBN` | `N`, `CA`, `C`, `O` |
| `TIP` | Residue-specific “tip” atoms |
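
A sketch of how an atom specification might be expanded. Only `BKBN` has a fixed expansion in the table; `ALL` and `TIP` depend on the residue type, so this hypothetical `expand_atoms` leaves them as keywords for a residue-aware lookup instead of guessing a tip-atom table.

```python
from typing import List, Union

BACKBONE = ["N", "CA", "C", "O"]  # expansion of BKBN, from the table above
RESIDUE_DEPENDENT = {"ALL", "TIP"}  # cannot be expanded without the residue type


def expand_atoms(spec: str) -> Union[str, List[str]]:
    """Expand an atom spec such as "BKBN", "TIP" or "N,CA,C,O"."""
    token = spec.strip()
    if token == "BKBN":
        return list(BACKBONE)
    if token in RESIDUE_DEPENDENT:
        return token  # resolved later, once the residue type is known
    # Otherwise treat it as an explicit comma-separated atom list.
    return [atom.strip() for atom in token.split(",") if atom.strip()]
```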

@@ -1,5 +1,9 @@
# RFdiffusion3 — Protein binder design examples

RFD3 is a highly proficient protein binder designer. To make protein binders, the following arguments must be specified:
- input: the PDB or CIF file of the structure you want to bind
- contig: the length of the binder to generate (given as a range) and which residues from the target file to include
- infer_ori_strategy: how RFD3 decides where to place the origin of the generated binder relative to the target. We find that the "hotspots" strategy works best.
- select_hotspots: which atoms on the target should be bound (dictionary of residues on the target and atoms in those residues)
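
Because these four arguments are required, a quick check before launching a batch of binder designs can catch a missing key early. A minimal sketch, assuming a spec is a plain dict with the keys listed above; `check_binder_spec` is a hypothetical helper, and the strategy message only restates the recommendation above rather than an enforced rule.

```python
from typing import Any, Dict, List

# The four arguments this README lists as required for binder design.
REQUIRED_BINDER_KEYS = ("input", "contig", "infer_ori_strategy", "select_hotspots")


def check_binder_spec(spec: Dict[str, Any]) -> List[str]:
    """Return problems with a binder-design spec (empty list if none found)."""
    problems = [f"missing required argument: {key}" for key in REQUIRED_BINDER_KEYS if key not in spec]
    hotspots = spec.get("select_hotspots")
    if hotspots is not None and not isinstance(hotspots, dict):
        problems.append("select_hotspots should be a dictionary of target residues to atoms")
    strategy = spec.get("infer_ori_strategy")
    if strategy is not None and strategy != "hotspots":
        # Advisory only: "hotspots" is the strategy reported to work best.
        problems.append(f"infer_ori_strategy={strategy!r}; 'hotspots' is reported to work best")
    return problems
```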
```json

{
```

@@ -1,6 +1,13 @@
# RFdiffusion3 — Small molecule binder design examples

### small molecule binder examples against the ligand IAI with different RASA conditioning
RFD3 is also capable of designing small molecule binding proteins. Here are some inputs that could be useful:
- input: a PDB or CIF file containing the small molecule to be bound
- ligand: the three-letter code of the ligand to be bound, as it appears in the file
- length: how long the generated protein should be (can be a range)
- select_fixed_atoms: which ligand atoms should be fixed to their coordinates in the input file
- select_exposed: which ligand atoms should be presented to the model as exposed
- select_buried: which ligand atoms should be presented to the model as buried

```json
{
```