initial progress on documentation cleanup (#722)

* initial progress on documentation cleanup
* formatting input.md
* docs: more documentation for ppi and sm binder design
* fix typo

---------

Co-authored-by: Rohith Krishna <rohith@localhost>
2026-06-04 13:24:22 +08:00 · 2025-12-02 23:23:02 -08:00
parent ed796ec622
commit 0ff792f718
6 changed files with 100 additions and 107 deletions
--- a/README.md
+++ b/README.md
@@ -57,40 +57,13 @@ We include details DNA, Ligands, Protein-Protein Interaction, Symmetry-condition
 - **foundry**: Model architectures, training, inference endpoints
 - **models/\<model\>:** Released models.

-### Debugging
-
-VSCode-native debugging with Apptainers:
-
-1. Add to `.vscode/launch.json`:
-```json
-{
-    "name": "Python: Attach",
-    "type": "debugpy",
-    "request": "attach",
-    "connect": {
-        "host": "localhost",
-        "port": 2345
-    }
-}
-```
-
-2. Set breakpoints in VSCode
-
-3. Launch with debug port:
-```bash
-export DEBUG_PORT=2345
-./train.py experiment=...
-```
-
-4. Attach debugger when prompted (F5)
-
 #### For Core Developers (Multiple Packages)

 Install both `foundry` and models in editable mode for development:

 ```bash
 # Install foundry and the RF3, RFD3, and MPNN models in editable mode
-uv pip install -e . -e ./models/rf3
+uv pip install -e . -e ./models/rf3 -e ./models/rfd3 -e ./models/mpnn

 # Or install only foundry (no models)
 uv pip install -e .
@@ -165,3 +138,5 @@ If you use this repository code or data in your work, please cite the relevant w
  publisher={Nature Publishing Group US New York}
 }
 ```
+## Acknowledgments
+We thank Rachel Clune and Hope Woods from the RosettaCommons for their collaboration on the codebase, documentation, tutorials and examples. 
--- a/models/rfd3/README.md
+++ b/models/rfd3/README.md
@@ -4,57 +4,24 @@
  <img src="docs/.assets/trajectory.png" alt="All-atom diffusion with RFD3">
 </p>

-
-##  Installation, Setup, and a Basic Design
-### A. Installation using `uv`
+## Get Started
+1. Install RFdiffusion3. See the [main README](../../README.md) for instructions on how to install all models and run the full pipeline (recommended). If you have already installed all the models, skip to [Run Inference](#run-inference).
 ```bash
-git clone https://github.com/RosettaCommons/foundry.git \
-  && cd foundry \
-  && uv python install 3.12 \
-  && uv venv --python 3.12 \
-  && source .venv/bin/activate \
-  && uv pip install -e ".[rfd3]"
+pip install "rc-foundry[rfd3]"
 ```
-<!--
-> [!IMPORTANT]
-> You must install `foundry` (the root package) with `-e` first, then install `rfd3`. This ensures both packages are in editable mode for proper development workflow.
-->
-> [!NOTE]
-> optionally make installed venv available as ipynb kernel (helpful for running examples in `examples/all.ipynb`)
-`python -m ipykernel install --user --name=foundry --display-name "foundry"`
-
-### B. Download model weights for RFD3
+2. Download the model checkpoint to your desired location:
 ```bash
-wget http://files.ipd.uw.edu/pub/rfd3/rfd3_foundry_2025_12_01.ckpt
-```
-*You can store these weights anywhere you would like,
-but if you do not store them in the root directory
-you will need to change the `cur_ckpt` variable discussed
-later on.*
-
-**Setup**
-```bash
-export PROJECT_PATH="$(pwd)/models/rfd3/src:$(pwd)/src:$(pwd)/lib/atomworks/src"
-```
-If your virtual environment is not already active you will 
-also need to run:
-```
-source .venv/bin/activate
+foundry install rfd3 --checkpoint-dir /path/to/ckpt/dir
 ```

-Files for RFD3 exist under this folder (`models/rfd3`), and wrap around the components of RF3 under `src/foundry/`. 
-```
-chmod +x src/foundry/*.py
-```
-
-## Inference:
+## Run Inference
 ```bash 
 cur_ckpt=rfd3_foundry_2025_12_01.ckpt
 ```

 To run inference:
 ```bash
-uv run python models/rfd3/src/rfd3/run_inference.py out_dir=logs/inference_outs/demo/0 ckpt_path=$cur_ckpt inputs=models/rfd3/docs/demo.json print_config=True dump_trajectories=True
+rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json ckpt_path=$cur_ckpt
 ```

 > [!NOTE]
@@ -97,21 +64,54 @@ For full details on how to specify inputs, see the [input specification document
  </tr>
 </table>

-## Training (w & w/o WandB): #TODO make sure correct
+## Installation and Setup for Development and Training
+### A. Installation using `uv`
+```bash
+git clone https://github.com/RosettaCommons/foundry.git \
+  && cd foundry \
+  && uv python install 3.12 \
+  && uv venv --python 3.12 \
+  && source .venv/bin/activate \
+  && uv pip install -e ".[rfd3]"
+```
+<!--
+> [!IMPORTANT]
+> You must install `foundry` (the root package) with `-e` first, then install `rfd3`. This ensures both packages are in editable mode for proper development workflow.
+-->
+> [!NOTE]
+> Optionally, make the installed venv available as a Jupyter kernel (helpful for running the examples in `examples/all.ipynb`):
+> `python -m ipykernel install --user --name=foundry --display-name "foundry"`
+
+### B. Download model weights for RFD3
+```bash
+foundry install rfd3 --checkpoint-dir /path/to/checkpoint/
+```

-Add `export PROJECT_PATH=$(pwd)/models/rfd3` to `scripts/slurm/launch.sh`, where `$(pwd)` is the repositories' absolute path
-You will also want to add your atomworks and foundry (`$(pwd)`) paths to `launch.sh`.
+## Inference
+```bash 
+cur_ckpt=rfd3_foundry_2025_12_01.ckpt
+```
+
+To run inference:
+```bash
+rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json ckpt_path=$cur_ckpt dump_trajectories=True
+```
+
+> [!NOTE]
+> This demo takes a very long time on a CPU. On a GPU, it
+> should take on the order of 10 minutes.
+
+The output directory is created automatically.
+
+For full details on how to specify inputs, see the [input specification documentation](./docs/input.md). For the inference engine configuration, see `models/rfd3/configs/inference_engine/rfdiffusion3.yaml`.
+
+## Training (with and without WandB)

 To launch a training run, use:
 ```bash
-sbatch -J rfd3-full-sparse launch.sh
-```
-
-Optionally ensure your `WANDB_API_KEY` is an environment variable. You can disable wandb by including the following at the top of your experiment config:
-```yaml
-defaults:
-  - override /logger: csv  # turns off wandb logger
+uv run python models/rfd3/src/rfd3/train.py experiment=pretrain
 ```
+See the [path configs](/models/rfd3/configs/paths/) to customize where data is read from and where logs are written. A WandB logger config is also available if you want to log training runs to Weights & Biases.

 ### Install HBPLUS for training with hydrogen bond conditioning:

--- a/models/rfd3/configs/paths/data/default.yaml
+++ b/models/rfd3/configs/paths/data/default.yaml
@@ -2,24 +2,10 @@
 pdb_data_dir: /projects/ml/frozen_pdb_copies/2025_07_13_pdb
 pdb_parquet_dir: /projects/ml/datahub/dfs/af3_splits/2024_12_16/  # TODO: uncomment

-# fb monomer distillation dataset
+# monomer distillation dataset
 monomer_distillation_data_dir: /squash/af2_distillation_facebook/
 monomer_distillation_parquet_dir: /projects/ml/datahub/dfs/distillation/af2_distillation_facebook

-# path(s) to search for protein MSAs (for PDB datasets)
-protein_msa_dirs:
-  - {"dir": "/projects/msa/rf2aa_af3/rf2aa_paper_model_protein_msas", "extension": ".a3m.gz", "directory_depth": 2}
-  - {"dir": "/projects/msa/rf2aa_af3/missing_msas_through_2024_08_12", "extension": ".msa0.a3m.gz", "directory_depth": 2}
-  - {"dir": "/net/scratch/mkazman/msa/validate_no_leak_taxid", "extension": ".a3m.gz", "directory_depth": 2}
-  - {"dir": "/net/scratch/mkazman/msa/missing_antibody_msas", "extension": ".a3m.gz", "directory_depth": 2}
-  - {"dir": "/net/scratch/mkazman/msa/post_training_cutoff_msas/processed_nested", "extension": ".a3m.gz", "directory_depth": 2}
-  - {"dir": "/net/scratch/mkazman/msa/post_training_cutoff_msas/extra_seqs_processed_nested", "extension": ".a3m.gz", "directory_depth": 2}
-  - {"dir": "/projects/msa/nvidia_renamed_with_seq_hash/maxseq_10k", "extension": ".a3m.gz", "directory_depth": 2}
-
-# path(s) to search for RNA MSAs
-rna_msa_dirs:
-  - {"dir": "/projects/msa/rf2aa_af3/rf2aa_paper_model_rna_msas", "extension": ".afa", "directory_depth": 0}
-
 # path to save examples that fail during the Transform pipeline (null = do not save)
 failed_examples_dir: null

--- a/models/rfd3/docs/input.md
+++ b/models/rfd3/docs/input.md
@@ -23,11 +23,11 @@

 ---

-## What changed (high level)
+## How it works (high level)

 - **Unified selections.** All per-residue/atom choices now use **InputSelection**:
  - You can pass `true`/`false`, a **contig string** (`"A1-10,B5-8"`), or a **dictionary** (`{"A1-10": "ALL", "B5": "N,CA,C,O"}`).
-  - New selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
+  - Selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
 - **Clearer unindexing.** For **unindexed** motifs you typically either fix `"ALL"` atoms or explicitly choose subsets such as `"TIP"`/`"BKBN"`/explicit atom lists via a **dictionary** (see examples).  
  When using `unindex`, only **the atoms you mark as fixed** are carried over from the input.
 - **Reproducibility.** The exact specification and the **sampled contig** are logged back into the output JSON. We also log useful counts (atoms, residues, chains).
@@ -138,23 +138,44 @@ lists multiple unindexed components; internal “breakpoints” are inferred and

 # Appendix
 ## FAQ / gotchas
+<details>
+  <summary><b>Do I need select_fixed_atoms & select_unfixed_sequence every time?</b></summary>

-Q: Do I need select_fixed_atoms & select_unfixed_sequence every time?
-A: No. Defaults apply when input present.
+  No. Defaults apply when an input is provided.
+  </details>

-Q: What does "ALL" vs "TIP" in unindex mean?
-A: "ALL" → copy full residue; "TIP" → fix only sidechain tip atoms.
-Q: Can selections overlap?
-A: Only certain ones (fixed vs unfixed) may; RASA & donor/acceptor cannot.

-Q: How to fix backbone but redesign sidechains?
-A: redesign_motif_sidechains: true.
+  <details>
+  <summary><b>What does "ALL" vs "TIP" in unindex mean?</b></summary>

-Q: Why “Input provided but unused”?
-A: You gave input but no contig, unindex, or partial_t.
+  - **`ALL`** → copy full residue
+  - **`TIP`** → fix only sidechain tip atoms
+  </details>

-## Shorthand atoms
+  <details>
+  <summary><b>Can selections overlap?</b></summary>
+
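+  Only some can. Fixed and unfixed selections may overlap; RASA selections (`select_buried`, `select_partially_buried`, `select_exposed`) and H-bond donor/acceptor selections cannot.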
+  </details>
+
+  <details>
+  <summary><b>How to fix backbone but redesign sidechains?</b></summary>
+
+  `redesign_motif_sidechains: true`
+  </details>
+
+  <details>
+  <summary><b>Why "Input provided but unused"?</b></summary>
+
+  You provided an input but no `contig`, `unindex`, or `partial_t`, so nothing uses it.
+  </details>
+
+## Shorthand atoms for easy specification
 Keyword	Expands to
 BKBN	N, CA, C, O
 TIP	Residue-specific “tip” atoms
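The selection forms and shorthand keywords above can be illustrated with a minimal Python sketch that expands a contig string such as `"A1-10,B5-8"` and a dictionary selection such as `{"A1-10": "ALL", "B5": "N,CA,C,O"}` into per-residue atom lists. This is a hypothetical illustration, not the `rfd3` parser: the helper names (`expand_contig`, `expand_atoms`, `expand_selection`) are invented. The sketch assumes single-letter chain IDs and inclusive residue ranges. It expands only `BKBN` from the shorthand table and leaves `ALL` and `TIP` as keywords, because tip atoms are residue-specific.

```python
"""Hypothetical helpers illustrating the RFD3 selection syntax.

Not part of the rfd3 package: the function names are invented for this
sketch, which assumes single-letter chain IDs and inclusive residue ranges.
"""
from __future__ import annotations

import re

# Only BKBN has a fixed expansion; TIP is residue-specific, so it is kept
# as a keyword (as is ALL) rather than expanded here.
SHORTHAND = {"BKBN": ["N", "CA", "C", "O"]}
KEYWORDS = {"ALL", "TIP"}

# One contig segment: chain letter, start residue, optional "-end".
_SEGMENT = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def expand_contig(contig: str) -> list[str]:
    """Expand a contig string such as "A1-3,B5" into residue IDs."""
    residues: list[str] = []
    for segment in contig.split(","):
        match = _SEGMENT.match(segment.strip())
        if match is None:
            raise ValueError(f"unsupported contig segment: {segment!r}")
        chain, start, end = match.group(1), int(match.group(2)), match.group(3)
        stop = int(end) if end is not None else start
        residues.extend(f"{chain}{i}" for i in range(start, stop + 1))
    return residues


def expand_atoms(spec: str) -> list[str] | str:
    """Turn "N,CA,C,O" or "BKBN" into atom names; keep ALL/TIP as keywords."""
    if spec in KEYWORDS:
        return spec
    atoms: list[str] = []
    for name in spec.split(","):
        name = name.strip()
        atoms.extend(SHORTHAND.get(name, [name]))
    return atoms


def expand_selection(selection: dict[str, str]) -> dict[str, list[str] | str]:
    """Map every residue in a dictionary selection to its atom specification."""
    expanded: dict[str, list[str] | str] = {}
    for contig, spec in selection.items():
        for residue in expand_contig(contig):
            expanded[residue] = expand_atoms(spec)
    return expanded


if __name__ == "__main__":
    print(expand_contig("A1-3,B5"))  # ['A1', 'A2', 'A3', 'B5']
    print(expand_selection({"A1-2": "ALL", "B5": "BKBN"}))
    # {'A1': 'ALL', 'A2': 'ALL', 'B5': ['N', 'CA', 'C', 'O']}
```

The actual `rfd3` input parser may accept more forms than this sketch handles.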
--- a/models/rfd3/docs/protein_binder_design.md
+++ b/models/rfd3/docs/protein_binder_design.md
@@ -1,5 +1,9 @@
 # RFdiffusion3 — Protein binder design examples
-
+RFD3 can design protein binders against a target structure. To make protein binders, specify the following arguments:
+- `input`: the PDB or CIF file of the target structure you want to bind
+- `contig`: the length range of the binder to generate and which residues from the target file to include
+- `infer_ori_strategy`: how RFD3 places the origin of the generated binder relative to the target. We find that the `"hotspots"` strategy works best.
+- `select_hotspots`: which target atoms the binder should bind, given as a dictionary mapping target residues to atoms in those residues
+
 ```json

 {
--- a/models/rfd3/docs/sm_binder_design.md
+++ b/models/rfd3/docs/sm_binder_design.md
@@ -1,6 +1,13 @@
 # RFdiffusion3 — Small molecule binder design examples

 ### small molecule binder examples against the ligand IAI with different RASA conditioning
+RFD3 can also design small-molecule-binding proteins. Useful input arguments include:
+- `input`: a PDB or CIF file containing the small molecule to be bound
+- `ligand`: the three-letter residue code of the ligand to be bound in that file
+- `length`: the length of the generated protein (can be a range)
+- `select_fixed_atoms`: which ligand atoms are fixed to their coordinates in the input file
+- `select_exposed`: which ligand atoms are conditioned as solvent-exposed
+- `select_buried`: which ligand atoms are conditioned as buried

 ```json
 {