initial progress on documentation cleanup (#722)

* initial progress on documentation cleanup

* formatting input.md

* docs: more documentation for ppi and sm binder design

* fix typo

---------

Co-authored-by: Rohith Krishna <rohith@localhost>
Rohith Krishna
2025-12-02 23:23:02 -08:00
committed by GitHub
parent ed796ec622
commit 0ff792f718
6 changed files with 100 additions and 107 deletions

@@ -57,40 +57,13 @@ We include details DNA, Ligands, Protein-Protein Interaction, Symmetry-condition
- **foundry**: Model architectures, training, inference endpoints
- **models/\<model\>:** Released models.
### Debugging
VSCode-native debugging with Apptainers:
1. Add to `.vscode/launch.json`:
```json
{
"name": "Python: Attach",
"type": "debugpy",
"request": "attach",
"connect": {
"host": "localhost",
"port": 2345
}
}
```
2. Set breakpoints in VSCode
3. Launch with debug port:
```bash
export DEBUG_PORT=2345
./train.py experiment=...
```
4. Attach debugger when prompted (F5)
#### For Core Developers (Multiple Packages)
Install both `foundry` and models in editable mode for development:
```bash
# Install foundry and RF3 in editable mode
uv pip install -e . -e ./models/rf3
uv pip install -e . -e ./models/rf3 -e ./models/rfd3 -e ./models/mpnn
# Or install only foundry (no models)
uv pip install -e .
@@ -165,3 +138,5 @@ If you use this repository code or data in your work, please cite the relavant w
publisher={Nature Publishing Group US New York}
}
```
## Acknowledgments
We thank Rachel Clune and Hope Woods from the RosettaCommons for their collaboration on the codebase, documentation, tutorials and examples.

@@ -4,57 +4,24 @@
<img src="docs/.assets/trajectory.png" alt="All-atom diffusion with RFD3">
</p>
## Installation, Setup, and a Basic Design
### A. Installation using `uv`
## Get Started
1. Install RFdiffusion3. See the [Main README](../../README.md) for instructions on how to install all models to run the full pipeline (recommended). If you have already installed all the models, skip to [Run Inference](#run-inference).
```bash
git clone https://github.com/RosettaCommons/foundry.git \
&& cd foundry \
&& uv python install 3.12 \
&& uv venv --python 3.12 \
&& source .venv/bin/activate \
&& uv pip install -e ".[rfd3]"
pip install rc-foundry[rfd3]
```
<!--
> [!IMPORTANT]
> You must install `foundry` (the root package) with `-e` first, then install `rfd3`. This ensures both packages are in editable mode for proper development workflow.
-->
> [!NOTE]
> Optionally, make the installed venv available as an ipynb kernel (helpful for running the examples in `examples/all.ipynb`):
`python -m ipykernel install --user --name=foundry --display-name "foundry"`
### B. Download model weights for RFD3
2. Download checkpoint to your desired checkpoint location.
```bash
wget http://files.ipd.uw.edu/pub/rfd3/rfd3_foundry_2025_12_01.ckpt
```
*You can store these weights anywhere you would like,
but if you do not store them in the root directory
you will need to change the `cur_ckpt` variable discussed
later on.*
**Setup**
```bash
export PROJECT_PATH="$(pwd)/models/rfd3/src:$(pwd)/src:$(pwd)/lib/atomworks/src"
```
If your virtual environment is not already active you will
also need to run:
```
source .venv/bin/activate
foundry install rfd3 --checkpoint-dir /path/to/ckpt/dir
```
Files for RFD3 exist under this folder (`models/rfd3`), and wrap around the components of RF3 under `src/foundry/`.
```
chmod +x src/foundry/*.py
```
## Inference:
## Run Inference
```bash
cur_ckpt=rfd3_foundry_2025_12_01.ckpt
```
To run inference
```bash
uv run python models/rfd3/src/rfd3/run_inference.py out_dir=logs/inference_outs/demo/0 ckpt_path=$cur_ckpt inputs=models/rfd3/docs/demo.json print_config=True dump_trajectories=True
rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json ckpt_path=$cur_ckpt
```
> [!NOTE]
@@ -97,21 +64,54 @@ For full details on how to specify inputs, see the [input specification document
</tr>
</table>
## Training (w & w/o WandB): #TODO make sure correct
## Installation and Setup for Development and Training
### A. Installation using `uv`
```bash
git clone https://github.com/RosettaCommons/foundry.git \
&& cd foundry \
&& uv python install 3.12 \
&& uv venv --python 3.12 \
&& source .venv/bin/activate \
&& uv pip install -e ".[rfd3]"
```
<!--
> [!IMPORTANT]
> You must install `foundry` (the root package) with `-e` first, then install `rfd3`. This ensures both packages are in editable mode for proper development workflow.
-->
> [!NOTE]
> Optionally, make the installed venv available as an ipynb kernel (helpful for running the examples in `examples/all.ipynb`):
`python -m ipykernel install --user --name=foundry --display-name "foundry"`
Download checkpoints.
```bash
foundry install rfd3 --checkpoint-dir /path/to/checkpoint/
```
Add `export PROJECT_PATH=$(pwd)/models/rfd3` to `scripts/slurm/launch.sh`, where `$(pwd)` is the repository's absolute path.
You will also want to add your atomworks and foundry (`$(pwd)`) paths to `launch.sh`.
## Inference:
```bash
cur_ckpt=rfd3_foundry_2025_12_01.ckpt
```
To run inference
```bash
rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json ckpt_path=$cur_ckpt dump_trajectories=True
```
> [!NOTE]
> This demo will take a very long time if run on a
> CPU instead of a GPU. On a GPU, it should take on the
> order of 10 minutes.
The output directory will automatically be created.
For full details on how to specify inputs, see the [input specification documentation](./docs/input.md). You can also see `models/rfd3/configs/inference_engine/rfdiffusion3.yaml`.
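When scripting several runs, the `key=value` overrides used in the command above can be assembled programmatically. The following is a minimal sketch that only builds the command line and does not run it; the helper name `build_design_command` is hypothetical, and the override names and values are the ones shown in the command above.

```python
import shlex


def build_design_command(overrides):
    """Hypothetical helper: build an `rfd3 design` argv from key=value overrides."""
    return ["rfd3", "design"] + [f"{key}={value}" for key, value in overrides.items()]


argv = build_design_command(
    {
        "out_dir": "logs/inference_outs/demo/0",
        "inputs": "models/rfd3/docs/demo.json",
        "ckpt_path": "rfd3_foundry_2025_12_01.ckpt",
        "dump_trajectories": True,
    }
)

# Join into a single shell-safe string, e.g. for logging or a job script.
command = shlex.join(argv)
print(command)
```

To execute the command from Python, `argv` can be passed to `subprocess.run(argv, check=True)`.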
## Training (w & w/o WandB): #TODO make sure correct
To launch a training run, use:
```
sbatch -J rfd3-full-sparse launch.sh
```
Optionally ensure your `WANDB_API_KEY` is an environment variable. You can disable wandb by including the following at the top of your experiment config:
```yaml
defaults:
- override /logger: csv # turns off wandb logger
uv run python models/rfd3/src/rfd3/train.py experiment=pretrain
```
See the [paths configs](/models/rfd3/configs/paths/) to customize where data is read from and where logs are written. There is also a wandb config that can be enabled if you want to log training through wandb.
### Install HBPLUS for training with hydrogen bond conditioning:

@@ -2,24 +2,10 @@
pdb_data_dir: /projects/ml/frozen_pdb_copies/2025_07_13_pdb
pdb_parquet_dir: /projects/ml/datahub/dfs/af3_splits/2024_12_16/ # TODO: uncomment
# fb monomer distillation dataset
# monomer distillation dataset
monomer_distillation_data_dir: /squash/af2_distillation_facebook/
monomer_distillation_parquet_dir: /projects/ml/datahub/dfs/distillation/af2_distillation_facebook
# path(s) to search for protein MSAs (for PDB datasets)
protein_msa_dirs:
- {"dir": "/projects/msa/rf2aa_af3/rf2aa_paper_model_protein_msas", "extension": ".a3m.gz", "directory_depth": 2}
- {"dir": "/projects/msa/rf2aa_af3/missing_msas_through_2024_08_12", "extension": ".msa0.a3m.gz", "directory_depth": 2}
- {"dir": "/net/scratch/mkazman/msa/validate_no_leak_taxid", "extension": ".a3m.gz", "directory_depth": 2}
- {"dir": "/net/scratch/mkazman/msa/missing_antibody_msas", "extension": ".a3m.gz", "directory_depth": 2}
- {"dir": "/net/scratch/mkazman/msa/post_training_cutoff_msas/processed_nested", "extension": ".a3m.gz", "directory_depth": 2}
- {"dir": "/net/scratch/mkazman/msa/post_training_cutoff_msas/extra_seqs_processed_nested", "extension": ".a3m.gz", "directory_depth": 2}
- {"dir": "/projects/msa/nvidia_renamed_with_seq_hash/maxseq_10k", "extension": ".a3m.gz", "directory_depth": 2}
# path(s) to search for RNA MSAs
rna_msa_dirs:
- {"dir": "/projects/msa/rf2aa_af3/rf2aa_paper_model_rna_msas", "extension": ".afa", "directory_depth": 0}
# path to save examples that fail during the Transform pipeline (null = do not save)
failed_examples_dir: null

@@ -23,11 +23,11 @@
---
## What changed (high level)
## How it works (high level)
- **Unified selections.** All per-residue/atom choices now use **InputSelection**:
- You can pass `true`/`false`, a **contig string** (`"A1-10,B5-8"`), or a **dictionary** (`{"A1-10": "ALL", "B5": "N,CA,C,O"}`).
- New selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
- Selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
- **Clearer unindexing.** For **unindexed** motifs you typically either fix `"ALL"` atoms or explicitly choose subsets such as `"TIP"`/`"BKBN"`/explicit atom lists via a **dictionary** (see examples).
When using `unindex`, only **the atoms you mark as fixed** are carried over from the input.
- **Reproducibility.** The exact specification and the **sampled contig** are logged back into the output JSON. We also log useful counts (atoms, residues, chains).
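The three selection forms listed above can be sketched in Python as a plain dictionary serialized to JSON. This is only an illustration under stated assumptions: the field names and example values come from the list above, while the helper `selection_form` and the pairing of each form with a particular field are hypothetical choices, not the project's API.

```python
import json

# Field names and example values are taken from the list above; which form
# is paired with which field here is an arbitrary choice for illustration.
selections = {
    "select_unfixed_sequence": True,  # boolean form
    "select_hotspots": "A1-10,B5-8",  # contig-string form
    "select_fixed_atoms": {"A1-10": "ALL", "B5": "N,CA,C,O"},  # dictionary form
}


def selection_form(value):
    """Hypothetical helper: name which of the three forms a value uses."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "contig string"
    if isinstance(value, dict):
        return "dictionary"
    raise TypeError(f"unsupported selection value: {value!r}")


forms = {field: selection_form(value) for field, value in selections.items()}
as_json = json.dumps(selections, indent=2)
print(as_json)
```

The serialized output shows how the same fields would look inside a JSON input file, with Python's `True` written as JSON `true`.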
@@ -138,23 +138,44 @@ lists multiple unindexed components; internal “breakpoints” are inferred and
# Appendix
## FAQ / gotchas
<details>
<summary><b>Do I need select_fixed_atoms & select_unfixed_sequence every time?</b></summary>
Q: Do I need select_fixed_atoms & select_unfixed_sequence every time?
A: No. Defaults apply when input present.
No. Defaults apply when an input is present.
</details>
Q: What does "ALL" vs "TIP" in unindex mean?
A: "ALL" → copy full residue; "TIP" → fix only sidechain tip atoms.
<details>
<summary><b>Do I need select_fixed_atoms & select_unfixed_sequence every time?</b></summary>
Q: Can selections overlap?
A: Only certain ones (fixed vs unfixed) may; RASA & donor/acceptor cannot.
No. Defaults apply when an input is present.
</details>
Q: How to fix backbone but redesign sidechains?
A: redesign_motif_sidechains: true.
<details>
<summary><b>What does "ALL" vs "TIP" in unindex mean?</b></summary>
Q: Why “Input provided but unused”?
A: You gave input but no contig, unindex, or partial_t.
- **`ALL`** → copy full residue
- **`TIP`** → fix only sidechain tip atoms
</details>
## Shorthand atoms
<details>
<summary><b>Can selections overlap?</b></summary>
Only certain ones (fixed vs unfixed) may; RASA & donor/acceptor cannot.
</details>
<details>
<summary><b>How to fix backbone but redesign sidechains?</b></summary>
`redesign_motif_sidechains: true`
</details>
<details>
<summary><b>Why "Input provided but unused"?</b></summary>
You gave `input` but no `contig`, `unindex`, or `partial_t`.
</details>
## Shorthand atoms for easy specification
| Keyword | Expands to |
|---|---|
| BKBN | N, CA, C, O |
| TIP | Residue-specific “tip” atoms |
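
As a rough sketch of how the `BKBN` shorthand expands, the hypothetical helper below (not part of RFD3) turns an atom specification string into an explicit atom list. Only the `BKBN` expansion is stated above; `TIP` is residue-specific and its per-residue atoms are not listed here, so the sketch leaves it unexpanded rather than guessing.

```python
# Only this expansion is given in the shorthand table above.
SHORTHAND = {"BKBN": ["N", "CA", "C", "O"]}


def expand_atoms(spec):
    """Hypothetical helper: expand 'BKBN' or split a comma-separated atom list."""
    spec = spec.strip()
    if spec in SHORTHAND:
        return list(SHORTHAND[spec])
    if spec in ("TIP", "ALL"):
        # Residue-dependent keywords are returned unchanged because their
        # expansion depends on the residue type, which is not modeled here.
        return [spec]
    return [atom.strip() for atom in spec.split(",") if atom.strip()]


print(expand_atoms("BKBN"))
print(expand_atoms("N, CA"))
```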

@@ -1,5 +1,9 @@
# RFdiffusion3 — Protein binder design examples
RFD3 is a highly proficient protein binder designer. To design a protein binder, specify the following arguments:
- input: the PDB or CIF file of the structure you want to bind
- contig: the length range of the binder to make and which residues from the target file to consider
- infer_ori_strategy: how RFD3 places the origin of the generated protein binder with respect to the target. We find that the "hotspots" strategy works best.
- select_hotspots: which atoms on the target should be bound (a dictionary mapping residues on the target to atoms in those residues)
```json
{

@@ -1,6 +1,13 @@
# RFdiffusion3 — Small molecule binder design examples
### small molecule binder examples against the ligand IAI with different RASA conditioning
RFD3 can also design small-molecule-binding proteins. The following inputs may be useful:
- input: a PDB or CIF file containing the small molecule to be bound
- ligand: the three-letter code that identifies the ligand in the file
- length: the length of the generated protein (can be a range)
- select_fixed_atoms: which ligand atoms should be fixed to their coordinates in the PDB
- select_exposed: which ligand atoms should be presented to the model as exposed
- select_buried: which ligand atoms should be presented to the model as buried
```json
{