Docs: Designability vs. Diversity (#220)

* Docs: Installation FAQ space and minor RFD3 docs updates

Installation FAQ: created a document to specify any common installation issues and questions. Should be continuously updated based on logged issues and questions. Not specific to any model.

RFD3:
- changed the checkpoint files specified in the examples to rfd3_latest.ckpt
- updated information in input.md to clarify information based on recent issues that had been submitted

* Docs: Symlinks for RF3 and MPNN docs, RFD3 README minor edits

RF3 and MPNN: folders, index files, and symlinks were created in order to provide space for eventual RF3 and MPNN docs.

Several small changes in the RFD3 README to improve readability and add a pointer to the PPI tutorial as a starting point for someone new to RFdiffusion tools.

* First draft of enzyme design tutorial. Minor typo fixes in other documents.

* First draft of nucleic acid binder tutorial, minor edits to the other tutorials

* Completed enzyme design tutorial, removal of NA binder tutorial from index

Made changes based on edits from Saman, added images, and created zip file containing sample outputs for an enzyme design tutorial.

I am waiting on edits for the NA binder design tutorial, so for now I have removed it from the documentation index.

* Removing file related to in-progress NA binder tutorial

* Removing file related to in-progress NA binder tutorial

* Update ppi_design_tutorial.md

- Added information about useful CLI arguments
- Cleaned up the introduction
- Added section for what one might do with the designs from RFD3
- Added a note about hotspot residues also being in the `contig` (information from Rafi's TTT talk)
- Fixed minor sphinx heading issue

* Reorganizing RFD3 documentation

Reorganized files into `examples` and `tutorials` folders to clean up the RFD3 docs folder and align its organization with the RF3 docs folder. All edits made in the files update paths to reflect these changes.

* Docs: Designability vs. Diversity document

Created a document describing the settings that can impact the designability and diversity of structures output by RFdiffusion3. The information is based on the talk Rafi gave at Tech Tea Time in January.

* Minor grammar fixes in designability vs diversity document

* Update models/rfd3/docs/tutorials/ppi_design_tutorial.md

Co-authored-by: Rafael Brent <105883594+RafiBrent@users.noreply.github.com>

* Update models/rfd3/docs/tutorials/enzyme_design_tutorial.md

Co-authored-by: Rafael Brent <105883594+RafiBrent@users.noreply.github.com>

* Update models/rfd3/docs/designability_vs_diversity.md

Co-authored-by: Rafael Brent <105883594+RafiBrent@users.noreply.github.com>

* Update models/rfd3/docs/designability_vs_diversity.md

Co-authored-by: Rafael Brent <105883594+RafiBrent@users.noreply.github.com>

---------

Co-authored-by: Jasper Butcher <66851659+Ubiquinone-dot@users.noreply.github.com>
Co-authored-by: Rafael Brent <105883594+RafiBrent@users.noreply.github.com>
This commit is contained in:
Rachel Clune
2026-02-24 10:50:29 -08:00
committed by GitHub
parent b81ccd40d8
commit 4a70f0ef93
35 changed files with 438 additions and 79 deletions

View File

@@ -44,7 +44,7 @@ is working correctly. If you are new to RFdiffusion methods or JSON/YAML structu
To run inference (with foundry installed in your environment, or RFD3 & Foundry src in PYTHONPATH): To run inference (with foundry installed in your environment, or RFD3 & Foundry src in PYTHONPATH):
```bash ```bash
rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json skip_existing=False dump_trajectories=True prevalidate_inputs=True rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/examples/demo.json skip_existing=False dump_trajectories=True prevalidate_inputs=True
``` ```
To run RFD3, you only need to provide the input (`inputs`) JSON/YAML file (see the [external documentation for more details](https://rosettacommons.github.io/foundry/models/rfd3/index.html#general)) where you specify your design constraints and the output directory (`out_dir`) where you want to store the files RFD3 generates. To run RFD3, you only need to provide the input (`inputs`) JSON/YAML file (see the [external documentation for more details](https://rosettacommons.github.io/foundry/models/rfd3/index.html#general)) where you specify your design constraints and the output directory (`out_dir`) where you want to store the files RFD3 generates.
@@ -64,11 +64,11 @@ For full details on how to specify inputs, see the [input specification document
## Further example JSONs for different applications ## Further example JSONs for different applications
Additional examples are broken up by use case. If you have cloned the Additional examples are broken up by use case. If you have cloned the
repository, matching `.json` files are in `foundry/models/rfd3/docs` repository, matching `.json` files are in `foundry/models/rfd3/docs/examples`
that can be run directly, similar to the previous example. that can be run directly, similar to the previous example.
In the examples the paths to the input files are specified assuming In the examples, the paths to the input files are specified assuming
that you are running the examples from the `foundry/models/rfd3/docs` that you are running the examples from the `foundry/models/rfd3/docs/examples`
directory. If you would like to run RFD3 from a different location, directory. If you would like to run RFD3 from a different location,
you will need to change the path in the `.json` file(s) before running. you will need to change the path in the `.json` file(s) before running.
@@ -170,7 +170,7 @@ further optimization!
In `models/rfd3/configs/datasets/design_base.yaml` there's the shared configs for all datasets under `global_transform_args`. The dials that control the conditioning described above go under `training_conditions`, where for example `tipatom` - a specific preset conditioning sampler which more frequently fixes few tokens with few atoms - and others can be found. In `models/rfd3/configs/datasets/design_base.yaml` there's the shared configs for all datasets under `global_transform_args`. The dials that control the conditioning described above go under `training_conditions`, where for example `tipatom` - a specific preset conditioning sampler which more frequently fixes few tokens with few atoms - and others can be found.
**Training with WandB:** We strongly recommend tracking your runs via wandb. To use it, simply have your WANDB_API_KEY set and use the wandb logger. For more details see [here](wandb.ai) **Training with WandB:** We strongly recommend tracking your runs via wandb. To use it, simply have your WANDB_API_KEY set and use the wandb logger. For more details see [here](https://wandb.ai/site/)
# Appendix # Appendix

Binary files not shown: 10 image files (45 KiB, 29 KiB, 50 KiB, 322 KiB, 1.1 MiB, 468 KiB, 491 KiB, 749 KiB, 368 KiB, 395 KiB).

View File

@@ -0,0 +1,50 @@
# Designability vs. Diversity
When using RFdiffusion3, there is a balance between the designability and the diversity of generated structures. Increasing the diversity of the designs will lead to a greater number of novel folds; however, a larger portion of the structures will also have low confidence scores when refolded.
Whether you are struggling to produce designable structures or you are looking to increase the diversity of the folds you see, here are a few settings to try changing:
- **Low temperature sampling:**
One can increase `inference_sampler.step_scale` and decrease `inference_sampler.gamma_0` to decrease the sampling space that RFdiffusion3 has access to, similar to what lowering the temperature does in physics-based design methods. These settings directly change how the RFdiffusion3 inference engine works, so these options are specified in the CLI, and are not options you specify in your input JSON or YAML file.
Here is what these settings do:
- `inference_sampler.step_scale`: This value (default 1.5) sets the diffusion step size, that is, how far each step moves towards the most probable result. Increasing this setting will increase the designability of the output structures, as these are more probable, but will also decrease the diversity of the produced structures.
- `inference_sampler.gamma_0`: This value (default 0.6) sets how much noise is added at each step in the inference trajectory. Decreasing this setting will increase the designability of the output structures, as the reduced randomness will lead RFdiffusion3 to higher-probability structures. Increase this value to increase the diversity of designed structures.
- **`is_non_loopy` setting:**
The `is_non_loopy` setting is a constraint on the designs RFdiffusion3 produces, which makes it a setting provided in a JSON/YAML file. If `True`, it biases the model away from forming structures with many regions that lack a defined secondary structure. This will slightly decrease the diversity of structures that RFdiffusion3 produces while increasing the designability.
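
The sketch below shows one way the low temperature overrides might be passed on the command line. It assumes the `rfd3 design` entry point shown in the RFD3 README and Hydra-style `key=value` overrides; the input and output paths are placeholders, and the two values are the ones this document labels `Low temperature`, not recommendations.

```bash
#!/bin/sh
# Sketch only: both paths are hypothetical placeholders, not files from the docs.
inputs=./my_design.json
out_dir=./low_temp_outputs

# Sampler settings are CLI options, not entries in the input JSON/YAML.
step_scale=3   # default 1.5; larger values favor designability
gamma_0=0.2    # default 0.6; smaller values add less noise per step

# Assemble the command so it can be inspected before it is run.
cmd="rfd3 design out_dir=${out_dir} inputs=${inputs} inference_sampler.step_scale=${step_scale} inference_sampler.gamma_0=${gamma_0}"
echo "$cmd"
```

Printing the command first makes it easy to confirm the overrides before committing GPU time. The `is_non_loopy` constraint is not part of this command because it belongs in the input JSON/YAML file.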
Here are a few plots showing the impacts of these settings in protein-protein interface design tasks:
```{note}
For the purposes of the plots below:
* `Low temperature` means a `step_scale` of 3 and a `gamma_0` of 0.2.
* Pass rates are refolding pass rates: the number of backbones that pass after four attempts at designing the sequence using MPNN-based methods.
* 'Cluster' refers to [foldseek-based clusters](https://www.nature.com/articles/s41587-023-01773-0), and the cluster pass rate is the number of clusters represented among the passing designs divided by the total number of designed backbones.
```
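
As a worked illustration of the cluster pass rate defined in the note, the sketch below computes both rates from made-up counts. The numbers are hypothetical and are not read from the plots, and normalizing the refolding pass rate by the total number of backbones is an assumption made here for illustration.

```bash
#!/bin/sh
# Hypothetical counts for illustration only; not results from the plots.
total_backbones=400     # all designed backbones
passing_backbones=120   # backbones that pass refolding
passing_clusters=40     # distinct clusters among the passing designs

# Assumed normalization: passing backbones / all designed backbones.
pass_rate=$(awk -v p="$passing_backbones" -v t="$total_backbones" 'BEGIN { printf "%.3f", p / t }')

# Cluster pass rate as defined in the note: clusters among passing designs / all designed backbones.
cluster_pass_rate=$(awk -v c="$passing_clusters" -v t="$total_backbones" 'BEGIN { printf "%.3f", c / t }')

echo "pass rate: $pass_rate"
echo "cluster pass rate: $cluster_pass_rate"
```

Because both rates share the same denominator, the cluster pass rate can never exceed the refolding pass rate; a large gap between them suggests that many passing designs fall into the same clusters.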
```{figure} ./.assets/400bb_rfd3_inference_settings_designability.png
:width: 800px
Impacts of using low temperature settings (inf) and the `is_non_loopy` constraint on the outputs of RFdiffusion3.
```
<br>
---
```{figure} ./.assets/400bb_rfd3_inference_settings_diversity.png
:width: 800px
Diversity of folds in structures designed by RFD3 when using low temperature sampling and the `is_non_loopy` setting.
```
<br>
---
```{figure} ./.assets/400bb_rfd3_inference_settings_secondary_structure.png
:width: 800px
Comparison of the number of α-helices and β-sheets in structures designed by RFD3 when the low temperature sampling and `is_non_loopy` settings are used. Removing the `is_non_loopy` setting results in a large reduction in α-helices and a small increase in the number of β-sheets.
```

View File

@@ -1,6 +1,6 @@
{ {
"M0255_1mg5_unfixed": { "M0255_1mg5_unfixed": {
"input": "./input_pdbs/M0255_1mg5.pdb", "input": "../input_pdbs/M0255_1mg5.pdb",
"ligand": "NAI,ACT", "ligand": "NAI,ACT",
"unindex": "A108,A139,A152,A156", "unindex": "A108,A139,A152,A156",
"length": "180-200", "length": "180-200",
@@ -14,7 +14,7 @@
} }
}, },
"partial_diffusion": { "partial_diffusion": {
"input": "./input_pdbs/7v11.pdb", "input": "../input_pdbs/7v11.pdb",
"ligand": "OQO", "ligand": "OQO",
"partial_t": 15.0, "partial_t": 15.0,
"contig": "A431", "contig": "A431",
@@ -26,7 +26,7 @@
} }
}, },
"dsDNA_basic": { "dsDNA_basic": {
"input": "./input_pdbs/1bna.pdb", "input": "../input_pdbs/1bna.pdb",
"contig": "A1-10,/0,B15-24,/0,120-130", "contig": "A1-10,/0,B15-24,/0,120-130",
"length": "140-150", "length": "140-150",
"ori_token": [24,20,10], "ori_token": [24,20,10],

View File

@@ -1,6 +1,6 @@
{ {
"M0255_1mg5_unfixed": { "M0255_1mg5_unfixed": {
"input": "./input_pdbs/M0255_1mg5.pdb", "input": "../input_pdbs/M0255_1mg5.pdb",
"ligand": "NAI,ACT", "ligand": "NAI,ACT",
"unindex": "A108,A139,A152,A156", "unindex": "A108,A139,A152,A156",
"length": "180-200", "length": "180-200",

View File

@@ -28,7 +28,7 @@ The input files for the different examples are provided in `foundry/models/rfd3/
```json ```json
{ {
"M0255_1mg5_unfixed": { "M0255_1mg5_unfixed": {
"input": "./input_pdbs/M0255_1mg5.pdb", "input": "../input_pdbs/M0255_1mg5.pdb",
"ligand": "NAI,ACT", "ligand": "NAI,ACT",
"unindex": "A108,A139,A152,A156", "unindex": "A108,A139,A152,A156",
"length": "180-200", "length": "180-200",

View File

@@ -1,34 +1,34 @@
{ {
"dsDNA_basic": { "dsDNA_basic": {
"input": "./input_pdbs/1bna.pdb", "input": "../input_pdbs/1bna.pdb",
"contig": "A1-10,/0,B15-24,/0,120-130", "contig": "A1-10,/0,B15-24,/0,120-130",
"length": "140-150", "length": "140-150",
"ori_token": [24,20,10], "ori_token": [24,20,10],
"is_non_loopy": true "is_non_loopy": true
}, },
"ssDNA_basic": { "ssDNA_basic": {
"input": "./input_pdbs/5o4d.pdb", "input": "../input_pdbs/5o4d.pdb",
"contig": "A1-23,/0,120-130", "contig": "A1-23,/0,120-130",
"length": "143-153", "length": "143-153",
"ori_token": [-5,-10,8], "ori_token": [-5,-10,8],
"is_non_loopy": true "is_non_loopy": true
}, },
"ssDNA_diffused_from_dsDNA_pdb":{ "ssDNA_diffused_from_dsDNA_pdb":{
"input": "./input_pdbs/1bna.pdb", "input": "../input_pdbs/1bna.pdb",
"contig": "A1-10,/0,120-130", "contig": "A1-10,/0,120-130",
"length": "130-140", "length": "130-140",
"select_fixed_atoms": {"A1-10":""}, "select_fixed_atoms": {"A1-10":""},
"is_non_loopy": true "is_non_loopy": true
}, },
"RNA_basic": { "RNA_basic": {
"input": "./input_pdbs/1q75.pdb", "input": "../input_pdbs/1q75.pdb",
"contig": "A1-15,/0,120-130", "contig": "A1-15,/0,120-130",
"length": "135-145", "length": "135-145",
"ori_token": [15,2,-4], "ori_token": [15,2,-4],
"is_non_loopy": true "is_non_loopy": true
}, },
"dsDNA_complex": { "dsDNA_complex": {
"input": "./input_pdbs/2r5z.pdb", "input": "../input_pdbs/2r5z.pdb",
"contig": "C5-18,/0,D24-37,/0,40-50,A146-154,80-90", "contig": "C5-18,/0,D24-37,/0,40-50,A146-154,80-90",
"length": "157-177", "length": "157-177",
"unindex": "/0,/0,B251-255", "unindex": "/0,/0,B251-255",

View File

@@ -27,7 +27,7 @@ The length attribute should be the sum of all polymer lengths. in this case (120
```json ```json
{ {
"dsDNA_basic": { "dsDNA_basic": {
"input": "./input_pdbs/1bna.pdb", "input": "../input_pdbs/1bna.pdb",
"contig": "A1-10,/0,B15-24,/0,120-130", "contig": "A1-10,/0,B15-24,/0,120-130",
"length": "140-150", "length": "140-150",
"ori_token": [24,20,10], "ori_token": [24,20,10],
@@ -43,7 +43,7 @@ Similar to the previous example, but done for a PDB containing one DNA strand (A
```json ```json
{ {
"ssDNA_basic": { "ssDNA_basic": {
"input": "./input_pdbs/5o4d.pdb", "input": "../input_pdbs/5o4d.pdb",
"contig": "A1-23,/0,120-130", "contig": "A1-23,/0,120-130",
"length": "143-153", "length": "143-153",
"ori_token": [-5,-10,8], "ori_token": [-5,-10,8],
@@ -58,7 +58,7 @@ Similar to the previous example but the input PDB has a dsDNA. One of the chains
```json ```json
{ {
"ssDNA_diffused_from_dsDNA_pdb":{ "ssDNA_diffused_from_dsDNA_pdb":{
"input": "./input_pdbs/1bna.pdb", "input": "../input_pdbs/1bna.pdb",
"contig": "A1-10,/0,120-130", "contig": "A1-10,/0,120-130",
"length": "130-140", "length": "130-140",
"select_fixed_atoms": {"A1-10":""}, "select_fixed_atoms": {"A1-10":""},
@@ -74,7 +74,7 @@ Example on RNA. Similar to the ssDNA example, example 2.
```json ```json
{ {
"RNA_basic": { "RNA_basic": {
"input": "./input_pdbs/1q75.pdb", "input": "../input_pdbs/1q75.pdb",
"contig": "A1-15,/0,120-130", "contig": "A1-15,/0,120-130",
"length": "135-145", "length": "135-145",
"ori_token": [15,2,-4], "ori_token": [15,2,-4],
@@ -96,7 +96,7 @@ To run this without warnings, you will need to install [hbplus](https://www.ebi.
```json ```json
{ {
"dsDNA_complex": { "dsDNA_complex": {
"input": "./input_pdbs/2r5z.pdb", "input": "../input_pdbs/2r5z.pdb",
"contig": "C5-18,/0,D24-37,/0,40-50,A146-154,80-90", "contig": "C5-18,/0,D24-37,/0,40-50,A146-154,80-90",
"length": "147-167", "length": "147-167",
"unindex": "/0,/0,B251-B255", "unindex": "/0,/0,B251-B255",

View File

@@ -2,7 +2,7 @@
"insulinr": { "insulinr": {
"dialect": 2, "dialect": 2,
"infer_ori_strategy": "hotspots", "infer_ori_strategy": "hotspots",
"input": "input_pdbs/4zxb_cropped.pdb", "input": "../input_pdbs/4zxb_cropped.pdb",
"contig": "40-120,/0,E6-155", "contig": "40-120,/0,E6-155",
"select_hotspots": { "select_hotspots": {
"E64": "CD2,CZ", "E64": "CD2,CZ",
@@ -14,7 +14,7 @@
"pdl1": { "pdl1": {
"dialect": 2, "dialect": 2,
"infer_ori_strategy": "hotspots", "infer_ori_strategy": "hotspots",
"input": "input_pdbs/5o45_cropped.pdb", "input": "../input_pdbs/5o45_cropped.pdb",
"contig": "50-120,/0,A17-131", "contig": "50-120,/0,A17-131",
"select_hotspots": { "select_hotspots": {
"A56": "CG,OH", "A56": "CG,OH",

View File

@@ -40,7 +40,7 @@ The input files for the different examples are provided in `foundry/models/rfd3/
"insulinr": { "insulinr": {
"dialect": 2, "dialect": 2,
"infer_ori_strategy": "hotspots", "infer_ori_strategy": "hotspots",
"input": "input_pdbs/4zxb_cropped.pdb", "input": "../input_pdbs/4zxb_cropped.pdb",
"contig": "40-120,/0,E6-155", "contig": "40-120,/0,E6-155",
"select_hotspots": { "select_hotspots": {
"E64": "CD2,CZ", "E64": "CD2,CZ",
@@ -52,7 +52,7 @@ The input files for the different examples are provided in `foundry/models/rfd3/
"pdl1": { "pdl1": {
"dialect": 2, "dialect": 2,
"infer_ori_strategy": "hotspots", "infer_ori_strategy": "hotspots",
"input": "input_pdbs/5o45_cropped.pdb", "input": "../input_pdbs/5o45_cropped.pdb",
"contig": "50-120,/0,A17-131", "contig": "50-120,/0,A17-131",
"select_hotspots": { "select_hotspots": {
"A56": "CG,OH", "A56": "CG,OH",

View File

@@ -1,13 +1,13 @@
#!/bin/bash #!/bin/bash
foundry=../../../ foundry=../../../../
export PYTHONPATH="$foundry/src:$foundry/models/rfd3/src/" export PYTHONPATH="$foundry/src:$foundry/models/rfd3/src/"
outdir=./na_tutorial_outputs/ outdir=./na_tutorial_outputs/
rm $outdir/* rm $outdir/*
ckpt_path=rfd3_foundry_2025_12_01.ckpt ckpt_path=rfd3_latest.ckpt
uv run python $foundry/models/rfd3/src/rfd3/run_inference.py ckpt_path=$ckpt_path out_dir=$outdir inputs=./na_binder_design.json n_batches=2 diffusion_batch_size=3 cleanup_virtual_atoms=True uv run python $foundry/models/rfd3/src/rfd3/run_inference.py ckpt_path=$ckpt_path out_dir=$outdir inputs=./na_binder_design.json n_batches=2 diffusion_batch_size=3 cleanup_virtual_atoms=True
#some cleanup #some cleanup

View File

@@ -1,6 +1,6 @@
{ {
"buried": { "buried": {
"input": "./input_pdbs/IAI.pdb", "input": "../input_pdbs/IAI.pdb",
"length": "180-180", "length": "180-180",
"ligand": "IAI", "ligand": "IAI",
"select_fixed_atoms": { "select_fixed_atoms": {
@@ -11,7 +11,7 @@
} }
}, },
"partial": { "partial": {
"input": "./input_pdbs/IAI.pdb", "input": "../input_pdbs/IAI.pdb",
"ligand": "IAI", "ligand": "IAI",
"length": "180-180", "length": "180-180",
"select_fixed_atoms": { "select_fixed_atoms": {

View File

@@ -31,7 +31,7 @@ RFD3 is also capable of designing proteins that bind small molecules. Here are s
```json ```json
{ {
"buried": { "buried": {
"input": "./input_pdbs/IAI.pdb", "input": "../input_pdbs/IAI.pdb",
"length": "180-180", "length": "180-180",
"ligand": "IAI", "ligand": "IAI",
"select_fixed_atoms": { "select_fixed_atoms": {
@@ -42,7 +42,7 @@ RFD3 is also capable of designing proteins that bind small molecules. Here are s
} }
}, },
"partial": { "partial": {
"input": "./input_pdbs/IAI.pdb", "input": "../input_pdbs/IAI.pdb",
"ligand": "IAI", "ligand": "IAI",
"length": "180-180", "length": "180-180",
"select_fixed_atoms": { "select_fixed_atoms": {

View File

@@ -18,7 +18,7 @@
"id": "C2", "id": "C2",
"is_symmetric_motif": true "is_symmetric_motif": true
}, },
"input": "input_pdbs/symmetry_examples/1j79_C2.pdb", "input": "../input_pdbs/symmetry_examples/1j79_C2.pdb",
"ligand": "ORO,ZN", "ligand": "ORO,ZN",
"unindex": "A250", "unindex": "A250",
"length": 130, "length": 130,
@@ -31,7 +31,7 @@
"id": "C2", "id": "C2",
"is_symmetric_motif": true "is_symmetric_motif": true
}, },
"input": "input_pdbs/symmetry_examples/1e3v_C2.pdb", "input": "../input_pdbs/symmetry_examples/1e3v_C2.pdb",
"ligand": "DXC", "ligand": "DXC",
"unindex": "A16,A40,A100,A103", "unindex": "A16,A40,A100,A103",
"length": 80, "length": 80,
@@ -48,7 +48,7 @@
"is_symmetric_motif": true, "is_symmetric_motif": true,
"is_unsym_motif": "HEM" "is_unsym_motif": "HEM"
}, },
"input": "input_pdbs/symmetry_examples/1bfr_C2.pdb", "input": "../input_pdbs/symmetry_examples/1bfr_C2.pdb",
"ligand": "HEM", "ligand": "HEM",
"contig": "51,M52,80", "contig": "51,M52,80",
"length": null, "length": null,
@@ -62,7 +62,7 @@
"is_symmetric_motif": true, "is_symmetric_motif": true,
"is_unsym_motif": "Y1-11,Z16-25" "is_unsym_motif": "Y1-11,Z16-25"
}, },
"input": "input_pdbs/symmetry_examples/6t8h_C3.pdb", "input": "../input_pdbs/symmetry_examples/6t8h_C3.pdb",
"contig": "100-100,/0,Y1-11,/0,Z16-25", "contig": "100-100,/0,Y1-11,/0,Z16-25",
"length": null, "length": null,
"is_non_loopy": true "is_non_loopy": true

View File

@@ -86,7 +86,7 @@ The tasks that these examples describe are as follows:
"id": "C2", "id": "C2",
"is_symmetric_motif": true "is_symmetric_motif": true
}, },
"input": "input_pdbs/symmetry_examples/1j79_C2.pdb", "input": "../input_pdbs/symmetry_examples/1j79_C2.pdb",
"ligand": "ORO,ZN", "ligand": "ORO,ZN",
"unindex": "A250", "unindex": "A250",
"length": 130, "length": 130,
@@ -99,7 +99,7 @@ The tasks that these examples describe are as follows:
"id": "C2", "id": "C2",
"is_symmetric_motif": true "is_symmetric_motif": true
}, },
"input": "input_pdbs/symmetry_examples/1e3v_C2.pdb", "input": "../input_pdbs/symmetry_examples/1e3v_C2.pdb",
"ligand": "DXC", "ligand": "DXC",
"unindex": "A16,A40,A100,A103", "unindex": "A16,A40,A100,A103",
"length": 80, "length": 80,
@@ -116,7 +116,7 @@ The tasks that these examples describe are as follows:
"is_symmetric_motif": true, "is_symmetric_motif": true,
"is_unsym_motif": "HEM" "is_unsym_motif": "HEM"
}, },
"input": "input_pdbs/symmetry_examples/1bfr_C2.pdb", "input": "../input_pdbs/symmetry_examples/1bfr_C2.pdb",
"ligand": "HEM", "ligand": "HEM",
"contig": "51,M52,80", "contig": "51,M52,80",
"length": null, "length": null,
@@ -130,7 +130,7 @@ The tasks that these examples describe are as follows:
"is_symmetric_motif": true, "is_symmetric_motif": true,
"is_unsym_motif": "Y1-11,Z16-25" "is_unsym_motif": "Y1-11,Z16-25"
}, },
"input": "input_pdbs/symmetry_examples/6t8h_C3.pdb", "input": "../input_pdbs/symmetry_examples/6t8h_C3.pdb",
"contig": "150-150,/0,Y1-11,/0,Z16-25", "contig": "150-150,/0,Y1-11,/0,Z16-25",
"length": null, "length": null,
"is_non_loopy": true "is_non_loopy": true

View File

@@ -13,21 +13,23 @@ General
intro_inference_calculations.md intro_inference_calculations.md
input.md input.md
designability_vs_diversity.md
Tutorials Tutorials
--------- ---------
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
ppi_design_tutorial.md tutorials/ppi_design_tutorial.md
tutorials/enzyme_design_tutorial.md
Examples Examples
-------- --------
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
na_binder_design.md examples/na_binder_design.md
sm_binder_design.md examples/sm_binder_design.md
protein_binder_design.md examples/protein_binder_design.md
symmetry.md examples/symmetry.md
enzyme_design.md examples/enzyme_design.md

View File

@@ -130,10 +130,10 @@ Below is a table of all of the inputs that the `InputSpecification` accepts. Use
| `select_fixed_atoms` | `InputSelection` | Atoms with fixed coordinates. See the [Select Fixed Atoms](#select-fixed-atoms) subsection for more information. | | `select_fixed_atoms` | `InputSelection` | Atoms with fixed coordinates. See the [Select Fixed Atoms](#select-fixed-atoms) subsection for more information. |
| `select_unfixed_sequence` | `InputSelection` | Where sequence can change. Default is `True` - all input regions have fixed sequences. Contig string input specifies components to unfix the sequence for. Dictionary inputs are allowed but not recommended.| | `select_unfixed_sequence` | `InputSelection` | Where sequence can change. Default is `True` - all input regions have fixed sequences. Contig string input specifies components to unfix the sequence for. Dictionary inputs are allowed but not recommended.|
| `select_buried` / `select_partially_buried` / `select_exposed` | `InputSelection` | Selection of RASA (Relatively Accessible Surface Area) for buried, partially buried, and exposed conditioning, respectively. Only contig string and dictionary are acceptable inputs. | | `select_buried` / `select_partially_buried` / `select_exposed` | `InputSelection` | Selection of RASA (Relatively Accessible Surface Area) for buried, partially buried, and exposed conditioning, respectively. Only contig string and dictionary are acceptable inputs. |
| `select_hbond_donor` / `select_hbond_acceptor` | `InputSelection` | Atom-wise donor/acceptor flags. Atom-wise selection of hydrogen bond donors and acceptors, respectively. Only dictionary inputs allowed. See {doc}`na_binder_design` for an example. | | `select_hbond_donor` / `select_hbond_acceptor` | `InputSelection` | Atom-wise donor/acceptor flags. Atom-wise selection of hydrogen bond donors and acceptors, respectively. Only dictionary inputs allowed. See {doc}`examples/na_binder_design` for an example. |
| `select_hotspots` | `InputSelection` | Atom-level or residue-level hotspots. Hotspots will typically be at most 4.5 Å to any heavy atom in the designed structure. Typically used for designing binders. | | `select_hotspots` | `InputSelection` | Atom-level or residue-level hotspots. Hotspots will typically be at most 4.5 Å to any heavy atom in the designed structure. Typically used for designing binders. |
| `redesign_motif_sidechains` | `bool` | Fixed backbone, redesigned sidechains for motifs (input structures). | | `redesign_motif_sidechains` | `bool` | Fixed backbone, redesigned sidechains for motifs (input structures). |
| `symmetry` | `SymmetryConfig` | See {doc}`symmetry`. | | `symmetry` | `SymmetryConfig` | See {doc}`examples/symmetry`. |
| `ori_token` | `list[float]` | `[x,y,z]` origin override to control COM (center of mass) placement of designed structure. | | `ori_token` | `list[float]` | `[x,y,z]` origin override to control COM (center of mass) placement of designed structure. |
| `infer_ori_strategy` | `str` | `"com"` or `"hotspots"`. The center of mass of the diffused region will typically be within 5Å of the ORI token. Using `hotspots` will place the ORI token 10Å outward from the center of mass of the specified hotspots. Using `com` will place the token at the center of mass of the input structure.| | `infer_ori_strategy` | `str` | `"com"` or `"hotspots"`. The center of mass of the diffused region will typically be within 5Å of the ORI token. Using `hotspots` will place the ORI token 10Å outward from the center of mass of the specified hotspots. Using `com` will place the token at the center of mass of the input structure.|
| `plddt_enhanced` | `bool` | Default `True`. Enables pLDDT (predicted Local Distance Difference Test) enhancement. | | `plddt_enhanced` | `bool` | Default `True`. Enables pLDDT (predicted Local Distance Difference Test) enhancement. |
@@ -239,7 +239,7 @@ In the following example, RFD3 will noise out by 15 angstroms and constrain atom
```json ```json
{ {
"partial_diffusion": { "partial_diffusion": {
"input": "paper_examples/7v11.cif", "input": "input_pdbs/7v11.cif",
"ligand": "OQO", "ligand": "OQO",
"partial_t": 15.0, "partial_t": 15.0,
"unindex": "A431,A572-573", "unindex": "A431,A572-573",

View File

@@ -2,7 +2,7 @@
In RFdiffusion3 (RFD3), [YAML](https://yaml.org/) or [JSON](https://www.json.org/json-en.html) files are used to specify the **settings** for your inference calculations and [**configuration options**](https://hydra.cc/docs/configure_hydra/intro/) are used to provide other information about your calculation, such as the location and name of the checkpoint file you want to use. In RFdiffusion3 (RFD3), [YAML](https://yaml.org/) or [JSON](https://www.json.org/json-en.html) files are used to specify the **settings** for your inference calculations and [**configuration options**](https://hydra.cc/docs/configure_hydra/intro/) are used to provide other information about your calculation, such as the location and name of the checkpoint file you want to use.
## Inference Settings ## Inference Settings
The inference 'settings' are how you constrain your inference calculation, such as specifying portions of the output you wish to have designed (`contig`) and specifying any symmetries that exist in your system (`symmetry`). These settings are stored in either a YAML or JSON file to be interpreted by RFdiffusion3. Runnable example sof json and yaml files can be found in `foundry/models/rfd3/docs`. The inference 'settings' are how you constrain your inference calculation, such as specifying portions of the output you wish to have designed (`contig`) and specifying any symmetries that exist in your system (`symmetry`). These settings are stored in either a YAML or JSON file to be interpreted by RFdiffusion3. Runnable examples of json and yaml files can be found in `foundry/models/rfd3/docs/examples`.
Using this type of input specification allows you to define different types of inference calculations all in the same file, and either run all of the calculation types defined in the file or specify the specific calculation you want to run via the command line. Using this type of input specification allows you to define different types of inference calculations all in the same file, and either run all of the calculation types defined in the file or specify the specific calculation you want to run via the command line.

View File

@@ -0,0 +1,177 @@
# Enzyme Design in RFdiffusion3
## Before We Get Started...
This tutorial does not cover installing RFD3. If you need to install this model, see the [README](https://github.com/RosettaCommons/foundry/tree/production/models/rfd3) for installation instructions. You will need to remember the path to the directory where you stored your checkpoint files, if you did not store them in the default location.
```{note}
You will need to clone the repository to access the tutorial files. Using the `pip` commands to install the model does not automatically download the files in the repository to your system.
```
Make sure you have activated any environment(s) you used to install RFD3.
RFD3 runs best on GPUs. It is suggested to follow this tutorial on an interactive GPU node if you have access to one.
You will need the file `1euv_lig.pdb`. This is provided in [`foundry/models/rfd3/docs/tutorials/enzyme_tutorial_files/`](enzyme_tutorial_files/1euv_lig.pdb). You can clone the [`foundry`](https://github.com/RosettaCommons/foundry) repository to easily access files related to this tutorial.
<!-- Lastly, we will be visualizing the outputs of the calculations presented in the tutorial using [PyMOL](https://pymol.org/). The visualization steps are completely optional, but if you would like to follow along you will need to have PyMOL installed.-->
(enzyme-learning-objectives)=
## Learning Objectives
In this tutorial, we will use RFdiffusion3 to design cysteine hydrolases, similar to what is described in [*De novo* design of All-atom Biomolecular Interactions with RFdiffusion3](https://www.biorxiv.org/content/10.1101/2025.09.18.676967v2). This will allow us to explore the constraint options useful in enzyme design tasks.
(enzyme-setup)=
## Setup
Create a directory named `rfd3_enzyme_tutorial` and `cd` into it:
```bash
mkdir rfd3_enzyme_tutorial && cd rfd3_enzyme_tutorial
```
This is where you will be storing the files related to this tutorial.
If you would like to compare your outputs against those generated by the authors of this tutorial, you can find pre-generated output files in `foundry/models/rfd3/docs/tutorials/enzyme_tutorial_files/outputs.zip`.
There is also a pre-made JSON file available in `foundry/models/rfd3/docs/tutorials/enzyme_tutorial_files`. We recommend following the tutorial to create this file yourself to better understand the RFD3 options that are relevant to enzyme design.
(enzyme-creating-the-json-file)=
## Creating the JSON file
In the next few steps, we will briefly describe the settings used for this example enzyme design project. If you would like more information about the options discussed here, or about the other options that are available, see the [input specification](../input.md) documentation.
1. Using your editor of choice, open a new file called `rfd3_enzyme_tutorial.json`. This is where we will be storing the options we will use to constrain our enzyme design.
1. This is a JSON file, so all of the options contained in it need to be encapsulated in curly braces ({}). Go ahead and add a pair of these to your file.
1. Like all designs you will create using RFD3, we need to start by giving our calculation a name. It should be short, but descriptive, so let's call it `cys_1euv_lig`. Add this name in quotes to your file and place a colon and another pair of curly brackets after this. Your file should now look like:
```json
{
"cys_1euv_lig":{
}
}
```
All of the other settings discussed here will go inside the inner curly brackets.
1. Next we need to specify the structure file (PDB, CIF, etc.) that contains information about any input structures related to our calculation:
```json
"input": "path/to/1euv_lig.pdb",
```
1. The identifier representing the ligand in our PDB file needs to be listed so that RFD3 knows to treat this molecule differently:
```json
"ligand": "l:g",
```
```{note}
The ligand used in this tutorial is not a real molecule. Placing a colon (:) in your ligand name ensures that it does not match a molecule in the [Chemical Component Database](https://www.wwpdb.org/data/ccd). If you are running a calculation that uses a real ligand, feel free to use its actual chemical identifier.
```
1. Add an option to `unindex` the residues in the input file. These residues were determined to be important for the enzymatic activity we are trying to create, and we will design a protein around them. However, we don't know where in our designed structure we want these residues to be, which makes this option incredibly useful for enzyme design:
```json
"unindex": "A514,A531,A574,A579-581",
```
```{important}
Choosing the residues to use in your enzyme design comes from knowledge of your system, literature searches, etc. The only guidance we will give on this topic is to try several combinations of the residues you think are important for your enzyme design. Too many and you might overconstrain your system, too few and you are less likely to obtain useful designs.
```
1. We will use the `length` option to tell RFD3 how long we want our designed proteins to be:
```json
"length": "100-200",
```
1. To define where our protein should be centered, we will give RFD3 an 'ori token'. This specifies the *ori*gin (center of mass) of the designed portion of our output structure:
```json
"ori_token": [0,1,0],
```
```{figure} ../.assets/enzyme_tutorial/enzyme_ori_token.png
:width: 60%
Image of the input structure with the ORI token in the center, visualized as a white sphere.
```
```{important}
In this example the ORI token is placed close to the center of our input structure. When designing your own enzyme scaffolds, this may not be the best placement depending on your design goals. See the [RFdiffusion2 paper](https://www.nature.com/articles/s41592-025-02975-x) for more information about how ORI tokens impact the results of diffusion calculations.
```
1. Even though we do not care where our residues end up in our final protein sequence, we want their geometries (or at least some of their atoms) to remain in the same place spatially so that their relationships to the ligand stay the same. For this we use `select_fixed_atoms`:
```json
"select_fixed_atoms": {
"A514":"NE2,CE1,ND1,CD2,CG,CB",
"A531":"OD1,CG,OD2,CB",
"A574":"NE2,CD,OE1,CG",
"A579":"C,O,CA,N",
"A580":"SG,CB,CA,N,C,O",
"A581":"C,O,CA,N"
},
```
For residues A514 (histidine), A531 (aspartic acid), and A574 (glutamine), the listed side-chain atoms are fixed. For residue A580 (cysteine), the entire residue is fixed. For A579 (aspartic acid) and A581 (glycine), only the backbone atoms are fixed. The ligand is automatically held in place.
```{figure} ../.assets/enzyme_tutorial/select_fixed_atoms.png
:width: 60%
Image of the starting structure where the fixed atoms have been highlighted in purple.
```
1. RFD3 allows for relative accessible surface area (RASA) conditioning to control how exposed or buried different portions of your input are relative to the designed protein:
```json
"select_buried": {
"l:g": "O1,C8,O3,C4,C5,C23,C24,C25,C26,C27"
},
"select_exposed": {
"l:g": "C2,C22,C19,C18,C17,C20,C16,C15,O21,O14,C13,C12"
},
```
As the names of the options suggest, RFD3 will do its best to bury atoms that were passed to the `select_buried` option and expose the atoms passed to the `select_exposed` option.
```{figure} ../.assets/enzyme_tutorial/RASA_ligand.png
:width: 60%
Image of the ligand where the exposed portion is colored blue and the buried portion is colored red.
```
There is a third option for RASA conditioning that was not used here, `select_partially_buried`, which you might find useful for your protein design tasks.
1. Next, we are going to unfix the *sequence* for residues A579 and A581. For this design, we know where we want the backbones of the residues next to A580 (the cysteine), but their exact identities and indices do not matter. This is where the option `select_unfixed_sequence` becomes useful:
```json
"select_unfixed_sequence": "A579,A581",
```
```{note}
These residues have fixed portions in the `select_fixed_atoms` setting, but the atoms chosen are **only the backbone atoms**. If you are using both of these settings in your designs it is important to not fix any of the side chain atoms to allow for the identity of the residue to actually change.
```
1. Save your file and close it. If you run the file now, your outputs should be similar to what is stored in `outputs.zip`.
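If you would rather generate the file programmatically, the steps above can be sketched in Python. This is a minimal sketch and not part of RFD3: the option names and values are the ones used in this tutorial, while the helper `backbone_only` and the sanity check built on it are illustrative assumptions that simply restate the note about `select_unfixed_sequence`. The `input` path is a placeholder you will need to adjust.

```python
import json

# Backbone atom names in standard PDB nomenclature.
BACKBONE = {"N", "CA", "C", "O"}

# Settings from the steps above, stored under the calculation name.
spec = {
    "cys_1euv_lig": {
        "input": "path/to/1euv_lig.pdb",  # placeholder path
        "ligand": "l:g",
        "unindex": "A514,A531,A574,A579-581",
        "length": "100-200",
        "ori_token": [0, 1, 0],
        "select_fixed_atoms": {
            "A514": "NE2,CE1,ND1,CD2,CG,CB",
            "A531": "OD1,CG,OD2,CB",
            "A574": "NE2,CD,OE1,CG",
            "A579": "C,O,CA,N",
            "A580": "SG,CB,CA,N,C,O",
            "A581": "C,O,CA,N",
        },
        "select_buried": {"l:g": "O1,C8,O3,C4,C5,C23,C24,C25,C26,C27"},
        "select_exposed": {"l:g": "C2,C22,C19,C18,C17,C20,C16,C15,O21,O14,C13,C12"},
        "select_unfixed_sequence": "A579,A581",
    }
}


def backbone_only(atom_list: str) -> bool:
    """Return True if every atom in a comma-separated list is a backbone atom."""
    return set(atom_list.split(",")) <= BACKBONE


settings = spec["cys_1euv_lig"]
for residue in settings["select_unfixed_sequence"].split(","):
    fixed = settings["select_fixed_atoms"].get(residue, "")
    # Hypothetical check (not an RFD3 feature): a residue whose sequence is
    # unfixed should not have any side-chain atoms fixed.
    if fixed and not backbone_only(fixed):
        raise ValueError(f"{residue} has fixed side-chain atoms: {fixed}")

# json.dump takes care of the braces, quotes, and commas.
with open("rfd3_enzyme_tutorial.json", "w") as handle:
    json.dump(spec, handle, indent=4)
```

Writing the file this way avoids hand-editing mistakes such as a trailing comma after the last option, which is not valid JSON.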
(enzyme-running-rfd3)=
## Running RFD3
To actually run RFD3 you need to know:
- the directory you want the outputs to be stored in
- the path to the JSON (or YAML) file that stores the specific settings for the calculation
- the location of your checkpoint files
Once you have these three things you can run something like this from the command line:
```bash
rfd3 design out_dir=enzyme_tutorial_outputs/0 inputs=rfd3_enzyme_tutorial.json ckpt_path=/path/to/your/checkpoint/files/rfd3_latest.ckpt
```
Your output files will be placed in a new directory `enzyme_tutorial_outputs/0`. They will be named `rfd3_enzyme_tutorial_cys_1euv_lig_0_model_n.cif.gz`, where `n` is the number of the design. `rfd3_enzyme_tutorial` comes from the name of the JSON file and `cys_1euv_lig` comes from the name you gave your calculation in the JSON file.
```{note}
You may see several warning messages when you run RFD3. These should not interfere with the calculation.
```
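If you launch many runs, it can help to assemble the command in a script. The sketch below only builds the argument list from the three pieces of information listed above; `build_rfd3_command` is a hypothetical helper, not part of RFD3, and the `overrides` argument assumes that extra configuration options are passed in the same `key=value` form as `out_dir`, `inputs`, and `ckpt_path`.

```python
import shlex


def build_rfd3_command(out_dir, inputs, ckpt_path, overrides=None):
    """Assemble the `rfd3 design` argument list used in this tutorial."""
    command = [
        "rfd3",
        "design",
        f"out_dir={out_dir}",
        f"inputs={inputs}",
        f"ckpt_path={ckpt_path}",
    ]
    # Assumption: any additional option is appended as key=value.
    for key, value in (overrides or {}).items():
        command.append(f"{key}={value}")
    return command


command = build_rfd3_command(
    out_dir="enzyme_tutorial_outputs/0",
    inputs="rfd3_enzyme_tutorial.json",
    ckpt_path="/path/to/your/checkpoint/files/rfd3_latest.ckpt",
)
# Print the command so it can be checked before it is run.
print(shlex.join(command))
```

To actually start the calculation from Python, the list can be handed to `subprocess.run(command, check=True)`. Changing the number in `out_dir` for each run keeps earlier outputs from being overwritten.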
(enzyme-analyzing-the-outputs)=
## Analyzing the Outputs
You should end up with 8 designs, numbered 0-7, each with its own `.cif.gz` and `.json` file. If you want to adjust the number, add the configuration option `diffusion_batch_size` to your `rfd3 design` command.
The output JSON file has many details about your diffusion run, including the options in the input JSON file you created. The compressed CIF file contains information about the final diffused structure that you can easily visualize with tools like PyMOL.
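A quick way to check what a run produced is to list the output files and open them with the Python standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes that each design's `.json` file shares its base name with the corresponding `.cif.gz` file. It makes no assumption about which keys the JSON file contains.

```python
import gzip
import json
from pathlib import Path


def list_designs(out_dir):
    """Return (cif_path, json_path) pairs for every design found in out_dir."""
    pairs = []
    for cif_path in sorted(Path(out_dir).glob("*.cif.gz")):
        # Assumption: the metadata file uses the same base name as the structure.
        stem = cif_path.name[: -len(".cif.gz")]
        json_path = cif_path.with_name(stem + ".json")
        pairs.append((cif_path, json_path if json_path.exists() else None))
    return pairs


def read_cif_text(cif_path):
    """Decompress a .cif.gz file and return its contents as text."""
    with gzip.open(cif_path, "rt") as handle:
        return handle.read()


def read_metadata(json_path):
    """Load the per-design JSON file into a Python object."""
    with open(json_path) as handle:
        return json.load(handle)


for cif_path, json_path in list_designs("enzyme_tutorial_outputs/0"):
    print(cif_path.name, "->", json_path.name if json_path else "no JSON found")
```

Molecular viewers such as PyMOL can typically open the compressed files directly, so decompressing by hand is only needed if you want to inspect the text yourself.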
Your results should look something like this:
```{figure} ../.assets/enzyme_tutorial/example_output_fixed_ori.png
:width: 60%
Example output for the enzyme scaffold design. The input structure is in green and the output structure is light pink. The residues that had some of their atoms fixed are highlighted in purple. The ORI token is shown as a white sphere.
```
You'll notice:
- The fixed atoms have stayed in place, once the structures have been aligned to the ligand
- The index of the original residues has changed. This is easy to tell in this example because we have held their coordinates fixed, and you can see the mapping in the output JSON file.
- The structures are all the same length, which is between 100 and 200 residues long. (All designs in the same batch will have the same length. Set the batch size to 1 if you want to design proteins with all different lengths.)
Changing the view slightly lets us see that our RASA conditioning was also followed:
```{figure} ../.assets/enzyme_tutorial/RASA_output.png
:width: 60%
You can see that the portion of the ligand that was specified as exposed (blue) is much less buried in the designed protein than the portion of the ligand that was specified as buried (red).
```
## Conclusion
You have now set up an RFD3 calculation and successfully run the inference code for an enzyme design problem. While the options discussed here are particularly useful in enzyme design projects, RFD3 has many more that you can explore by looking at [input.md](../input.md).
(enzyme-references)=
## References and Further Reading
- For more information on the different inference settings in RFD3, see [input.md](../input.md)
- The calculation presented here was used to benchmark RFdiffusion2. For more information, see [Atom-level enzyme active site scaffolding using RFdiffusion2](https://www.nature.com/articles/s41592-025-02975-x)
- A more thorough discussion of the settings and configuration options in RFD3 can be found [here](../intro_inference_calculations.md)


@@ -0,0 +1,92 @@
ATOM 2 N HIS A 514 3.231 1.208 -1.240 1.00 19.42 A N
ATOM 3 CA HIS A 514 1.911 1.858 -1.146 1.00 18.23 A C
ATOM 4 C HIS A 514 1.410 2.170 -2.547 1.00 16.65 A C
ATOM 5 O HIS A 514 2.223 2.219 -3.470 1.00 18.87 A O
ATOM 6 CB HIS A 514 2.084 3.200 -0.401 1.00 18.92 A C
ATOM 7 CG HIS A 514 0.793 3.885 -0.097 1.00 19.11 A C
ATOM 8 CD2 HIS A 514 0.480 5.175 -0.343 1.00 19.33 A C
ATOM 9 ND1 HIS A 514 -0.281 3.307 0.570 1.00 19.64 A N
ATOM 10 CE1 HIS A 514 -1.232 4.247 0.679 1.00 20.67 A C
ATOM 11 NE2 HIS A 514 -0.784 5.363 0.136 1.00 18.73 A N
ATOM 12 N ASP A 531 -4.991 9.875 -2.283 1.00 18.01 A N
ATOM 13 CA ASP A 531 -4.198 8.868 -1.542 1.00 17.65 A C
ATOM 14 C ASP A 531 -5.028 8.515 -0.313 1.00 17.93 A C
ATOM 15 O ASP A 531 -5.344 9.384 0.513 1.00 19.21 A O
ATOM 16 CB ASP A 531 -2.843 9.512 -1.268 1.00 18.96 A C
ATOM 17 CG ASP A 531 -1.823 8.638 -0.562 1.00 19.66 A C
ATOM 18 OD1 ASP A 531 -2.229 7.763 0.218 1.00 20.25 A O
ATOM 19 OD2 ASP A 531 -0.605 8.900 -0.780 1.00 19.95 A O1-
ATOM 20 N GLN A 574 -7.907 4.613 4.461 1.00 20.37 A N
ATOM 21 CA GLN A 574 -7.953 3.179 4.855 1.00 21.00 A C
ATOM 22 C GLN A 574 -7.671 3.054 6.344 1.00 21.03 A C
ATOM 23 O GLN A 574 -6.759 3.718 6.872 1.00 20.78 A O
ATOM 24 CB GLN A 574 -6.902 2.474 3.981 1.00 21.23 A C
ATOM 25 CG GLN A 574 -5.460 2.863 4.335 1.00 22.92 A C
ATOM 26 CD GLN A 574 -4.523 2.355 3.261 1.00 23.64 A C
ATOM 27 NE2 GLN A 574 -3.727 1.344 3.595 1.00 24.35 A N
ATOM 28 OE1 GLN A 574 -4.554 2.817 2.125 1.00 23.51 A O
ATOM 29 N ASP A 579 -4.972 -2.251 2.419 1.00 16.24 A N
ATOM 30 CA ASP A 579 -6.009 -1.965 1.401 1.00 15.40 A C
ATOM 31 C ASP A 579 -5.673 -0.876 0.414 1.00 15.59 A C
ATOM 32 O ASP A 579 -6.577 -0.537 -0.384 1.00 16.28 A O
ATOM 33 CB ASP A 579 -7.339 -1.639 2.096 1.00 16.76 A C
ATOM 34 CG ASP A 579 -8.180 -2.869 2.367 1.00 18.53 A C
ATOM 35 OD1 ASP A 579 -8.022 -3.926 1.704 1.00 18.83 A O
ATOM 36 OD2 ASP A 579 -9.038 -2.793 3.279 1.00 20.44 A O1-
ATOM 37 N CYS A 580 -4.482 -0.303 0.456 1.00 14.58 A N
ATOM 38 CA CYS A 580 -4.229 0.856 -0.451 1.00 14.75 A C
ATOM 39 C CYS A 580 -4.592 0.525 -1.890 1.00 15.30 A C
ATOM 40 O CYS A 580 -5.133 1.439 -2.570 1.00 15.92 A O
ATOM 41 CB CYS A 580 -2.811 1.409 -0.381 1.00 16.92 A C
ATOM 42 SG CYS A 580 -1.587 0.078 -0.708 1.00 15.01 A S
ATOM 43 N GLY A 581 -4.214 -0.616 -2.427 1.00 15.26 A N
ATOM 44 CA GLY A 581 -4.496 -0.901 -3.873 1.00 15.15 A C
ATOM 45 C GLY A 581 -6.031 -0.979 -4.060 1.00 12.80 A C
ATOM 46 O GLY A 581 -6.496 -0.558 -5.161 1.00 16.14 A O
TER
HETATM 47 C2 l:g B 1 0.721 0.361 2.697 1.00 0.00 B C
HETATM 48 O3 l:g B 1 -2.014 -0.494 1.873 1.00 0.00 B O
HETATM 49 C4 l:g B 1 -0.420 -1.974 0.850 1.00 0.00 B C
HETATM 50 C5 l:g B 1 0.633 -2.135 -0.222 1.00 0.00 B C
HETATM 51 C8 l:g B 1 -0.950 -0.521 0.955 1.00 0.00 B C
HETATM 52 C12 l:g B 1 1.587 -0.654 3.148 1.00 0.00 B C
HETATM 53 C13 l:g B 1 2.184 -0.571 4.411 1.00 0.00 B C
HETATM 54 O14 l:g B 1 2.997 -1.544 4.811 1.00 0.00 B O
HETATM 55 C15 l:g B 1 3.599 -1.534 5.994 1.00 0.00 B C
HETATM 56 C16 l:g B 1 3.396 -0.477 6.884 1.00 0.00 B C
HETATM 57 C17 l:g B 1 2.549 0.578 6.518 1.00 0.00 B C
HETATM 58 C18 l:g B 1 1.924 0.537 5.251 1.00 0.00 B C
HETATM 59 C19 l:g B 1 1.061 1.550 4.794 1.00 0.00 B C
HETATM 60 C20 l:g B 1 2.333 1.717 7.480 1.00 0.00 B C
HETATM 61 O21 l:g B 1 4.358 -2.487 6.314 1.00 0.00 B O
HETATM 62 C22 l:g B 1 0.472 1.457 3.531 1.00 0.00 B C
HETATM 63 C23 l:g B 1 0.259 -2.375 -1.555 1.00 0.00 B C
HETATM 64 C24 l:g B 1 1.235 -2.502 -2.549 1.00 0.00 B C
HETATM 65 C25 l:g B 1 2.589 -2.391 -2.220 1.00 0.00 B C
HETATM 66 C26 l:g B 1 2.969 -2.152 -0.897 1.00 0.00 B C
HETATM 67 C27 l:g B 1 1.997 -2.024 0.100 1.00 0.00 B C
HETATM 68 O1 l:g B 1 0.085 0.352 1.440 1.00 0.00 B O
CONECT 42 51
CONECT 47 52 62 68
CONECT 48 51
CONECT 49 50 51
CONECT 50 49 63 67
CONECT 51 42 48 49 68
CONECT 52 47 53
CONECT 53 52 54 58
CONECT 54 53 55
CONECT 55 54 56 61
CONECT 56 55 57
CONECT 57 56 58 60
CONECT 58 53 57 59
CONECT 59 58 62
CONECT 60 57
CONECT 61 55
CONECT 62 47 59
CONECT 63 50 64
CONECT 64 63 65
CONECT 65 64 66
CONECT 66 65 67
CONECT 67 50 66
CONECT 68 47 51
END


@@ -0,0 +1,24 @@
{
"1euv": {
"input": "./1euv_lig.pdb",
"ligand": "l:g",
"unindex": "A514,A531,A574,A579-581",
"length": "100-200",
"ori_token": [0, 1, 0],
"select_fixed_atoms": {
"A514":"NE2,CE1,ND1,CD2,CG,CB",
"A531":"OD1,CG,OD2,CB",
"A574":"NE2,CD,OE1,CG",
"A579":"C,O,CA,N",
"A580":"SG,CB,CA,N,C,O",
"A581":"C,O,CA,N"
},
"select_buried": {
"l:g": "O1,C8,O3,C4,C5,C23,C24,C25,C26,C27"
},
"select_exposed": {
"l:g": "C2,C22,C19,C18,C17,C20,C16,C15,O21,O14,C13,C12"
},
"select_unfixed_sequence": "A579,A581"
}
}


@@ -1,7 +1,7 @@
# Protein-Protein Interface Design in RFdiffusion3
## Before We Get Started...
This tutorial does not cover installing RFD3. Before continuing, you should make sure that RFdiffusion3 (RFD3) is installed and runnable on your system.
See the [README](https://github.com/RosettaCommons/foundry/tree/production/models/rfd3) for installation instructions. You will need to remember the path to the directory where you stored your checkpoint files.
@@ -10,19 +10,19 @@ The instructions below assume that you have installed RFD3 via the pip commands.
You may need to slightly modify how you run the calculations based on your setup.
```
Make sure you have activated any environment(s) you used to install RFD3.
RFD3 runs best on GPUs. It is suggested to follow this tutorial on an interactive GPU node if you have access to one.
You will need the file `4zxb_cropped.pdb`. This is provided in [`foundry/models/rfd3/docs/input_pdbs`](../input_pdbs/4zxb_cropped.pdb). You can clone the [`foundry`](https://github.com/RosettaCommons/foundry) repository to easily access files related to this tutorial.
Lastly, we will be visualizing the outputs of the calculations presented in the tutorial using [PyMOL](https://pymol.org/). The visualization steps are completely optional, but if you would like to follow along you will need to have PyMOL installed.
(ppi-learning-objectives)=
## Learning Objectives
In this tutorial, we will design a binder for the human insulin receptor to explore the settings available in RFD3 that are useful in protein-protein interface (PPI) design.
(ppi-setup)=
## Setup
Create a directory named `rfd3_ppi_tutorial` and `cd` into it:
```bash
@@ -30,16 +30,16 @@ mkdir rfd3_ppi_tutorial && cd rfd3_ppi_tutorial
```
This is where you will be storing the files related to this tutorial.
If you would like to compare your outputs against those generated by the authors of this tutorial, you can find pre-generated output files in `foundry/models/rfd3/docs/tutorials/ppi_tutorial_files`.
The 'basic' zip file contains outputs that did not use the settings discussed in the [Other Useful Settings](#ppi-other-useful-settings) section. The 'fixed' zip file has the outputs resulting from using the `select_fixed_atoms` option.
There is also a pre-made YAML file available in `foundry/models/rfd3/docs/tutorials/ppi_tutorial_files`. We recommend following the tutorial to create this file yourself to better understand the RFD3 options that are relevant to PPI design.
(ppi-creating-the-yaml-file)=
## Creating the YAML file
In this tutorial, we will be briefly describing each of the settings we will be using for this example binder design project.
1. Using your editor of choice, open a new file called `ppi_tutorial.yaml`. This is where we will be storing all of the settings that tell RFD3 the type of designs we would like to make.
1. Our calculation needs a name. For this tutorial, we will only be including one example calculation, but your YAML file could have several. A name allows you (and RFD3) to differentiate them. Since we are designing binders for the human insulin receptor, let's just call it `insulinr`:
```yaml
insulinr:
@@ -47,13 +47,13 @@ In this tutorial, we will be briefly describing each of the settings we will be
Everything that comes after this should be indented to show that it's part of this `insulinr` calculation. You will want to use spaces, not the tab character. If a tab character (`\t`) is found in the file, RFD3 will crash.
1. Tell RFD3 where to find your input file:
```yaml
input: /path/to/4zxb_cropped.pdb
```
This file was directly cropped from the 4zxb structure that can be found in the [RCSB PDB library](https://www.rcsb.org/). *If you visualize the cropped structure against the full one from the RCSB library, they may not appear to be exactly the same structure. However, if you align the two you will get an RMSD of 0.0.*
<!-- ```{figure} ../.assets/ppi_tutorial/cropped_vs_full.png
:width: 60%
Cropped vs. full structure of 4ZXB with the RMSD of their alignment shown. The cropped structure is in green while the full structure is shown in purple.
``` -->
1. The `contig` string is the main way you can tell RFD3 what portions of your structure you want designed and what portions you want preserved from your input structure.
```yaml
@@ -62,7 +62,7 @@ In this tutorial, we will be briefly describing each of the settings we will be
The different sections of the `contig` string are separated by commas. Here's what each section is telling RFD3:
- `40-120` specifies that we want RFD3 to design a new peptide chain that is between 40 and 120 residues long
- `/0` is how a chain break is specified in RFD3.
- `E6-155` is the portion of the input structure that we are keeping in our final output. The letter corresponds to the chain label in the input PDB and the starting and ending residues are included in the final structure. If you do not include the chain label, then RFD3 would just design a peptide chain between 6 and 155 residues in length.
1. We can also specify the overall number of residues in our final structure:
```yaml
length: 190-270
@@ -77,12 +77,15 @@ In this tutorial, we will be briefly describing each of the settings we will be
E88: CG,CZ
E96: CD1,CZ
```
```{figure} ../.assets/ppi_tutorial/hotspots.png
:width: 60%
The hotspot residues along with the specific target atoms circled in yellow.
```
```{note}
The residues whose atoms were selected as hotspots were specified in our `contig` string. This is necessary so that RFD3 is 'aware' of these residues.
```
1. Next we need to add information about our ORI token. This token specifies where we want the center of mass of our designed protein to be. Unless you know where you want to place the ORI token for your specific design needs, it is often easiest to have RFD3 infer the ORI placement based on the chosen `hotspots`:
```yaml
infer_ori_strategy: hotspots
```
@@ -91,10 +94,12 @@ In this tutorial, we will be briefly describing each of the settings we will be
```yaml
is_non_loopy: true
```
1. Save your file and close it. If you run the file now, your outputs should be similar to what is stored in `basic.zip`.
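As a consistency check on the `contig` and `length` settings above: the kept segment `E6-155` contains 155 - 6 + 1 = 150 residues, so adding a designed chain of 40 to 120 residues gives totals of 190 to 270, which matches the `length` value. The arithmetic can be sketched in Python as follows. The function `contig_length_range` is a hypothetical helper that handles only the three kinds of sections described in this tutorial; it is not RFD3's own contig parser.

```python
import re


def contig_length_range(contig):
    """Return the (min, max) total residue count implied by a contig string."""
    low = high = 0
    for section in contig.split(","):
        if section.startswith("/"):
            continue  # chain break, contributes no residues
        match = re.fullmatch(r"([A-Za-z]?)(\d+)-(\d+)", section)
        if match is None:
            raise ValueError(f"Unsupported contig section: {section!r}")
        chain, start, end = match.group(1), int(match.group(2)), int(match.group(3))
        if chain:
            # Segment kept from the input: both end residues are included.
            kept = end - start + 1
            low += kept
            high += kept
        else:
            # Designed segment: the two numbers are a length range.
            low += start
            high += end
    return low, high


print(contig_length_range("40-120,/0,E6-155"))  # (190, 270)
```

If you change the designed range or the kept segment in your own projects, rerunning this kind of check helps keep `length` consistent with `contig`.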
(ppi-other-useful-settings)=
### Other Useful Settings
This section provides other settings that are useful for protein-protein binder design tasks, but you may or may not need depending on your specific project.
1. There is a setting for allowing structural flexibility while keeping the sequence fixed in the input structure, for example:
```yaml
select_fixed_atoms:
@@ -104,7 +109,9 @@ In this tutorial, we will be briefly describing each of the settings we will be
```
Here, an empty list indicates that all atoms are flexible, `BKBN` keeps the backbone atoms fixed while allowing side chain atoms to move, and for the last residue, specific atoms are fixed in place while allowing the others to move. Feel free to try adding this to your YAML file and see how your outputs change.
Including this setting will give you structures similar to what is in `fixed.zip`.
(ppi-running-rfd3)=
## Running RFD3
To actually run RFD3 you need to know:
- the directory you want the outputs to be stored in
@@ -113,22 +120,22 @@ To actually run RFD3 you need to know:
Once you have these three things you can run something like this from the command line:
```bash
rfd3 design out_dir=ppi_tutorial_outputs/0 inputs=ppi_tutorial.yaml ckpt_path=/path/to/your/checkpoint/files/rfd3_latest.ckpt
```
Your output files will be placed in a new directory `ppi_tutorial_outputs/0`. Your output files will be named `ppi_tutorial_insulinr_0_model_n.cif.gz` where `n` is the number of the design. `ppi_tutorial` comes from the name of the YAML file and `insulinr` comes from the name you gave your calculation in the YAML file.
```{note}
You may see several warning messages when you run RFD3, these should not interfere with the calculation.
```
(ppi-analyzing-the-outputs)=
## Analyzing the Outputs
You should end up with 8 designs, numbered 0-7, each with its own `.cif.gz` and `.json` file. If you want to adjust the number, add the configuration option `diffusion_batch_size` to your `rfd3 design` command.
The JSON file has many details about your diffusion run, including the options in the YAML file you created. The compressed CIF file contains information about the final diffused structure that you can easily visualize with tools like PyMOL.
Your results should look something like this:
```{figure} ../.assets/ppi_tutorial/example_output_w_hotspots.png
:width: 60%
Green is the original input structure while blue is the designed binder. The hotspot residues are purple and represented as ball and sticks.
@@ -138,11 +145,18 @@ You'll notice that the binders are always on the side of the input structure clo
The lengths of the designed binders are all also between 40 and 120 amino acids long. However, you'll also notice that they are all the same length!
This is because RFD3 runs batched inference calculations. All of the calculations in a single 'batch' will have the same randomly sampled length, while designs from other batches will have different lengths. If you want to change the number of batches, you will want to add the setting `n_batches` to your `rfd3 design` command.
## What's Next?
For your actual projects, you would want to filter the designed structures based on metrics relevant to your design task. Then, even though RFD3 outputs come with a sequence, it is recommended to still use sequence design tools such as [MPNN](https://rosettacommons.github.io/foundry/models/mpnn/index.html) to redesign the sequence. Finally, you will want to see if the sequence refolds into a structure similar to the one predicted by RFD3 using tools like [RoseTTAFold3](https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2).
If you are working on a PPI design project, {doc}`../designability_vs_diversity` shows how different settings impact the designability of RFD3-produced structures in a PPI benchmark.
Feel free to go through the [other tutorials](https://rosettacommons.github.io/foundry/models/rfd3/index.html#tutorials) and other files provided in the documentation.
(ppi-references-and-further-reading)=
## References and Further Reading
- For more information on the different inference settings in RFD3, see [input.md](../input.md)
- For more information on the example used here, see [*De novo design of protein structure and function with RFdiffusion*](https://www.nature.com/articles/s41586-023-06415-8#Sec12) by Joseph L. Watson, et al.
- A more thorough discussion of the settings and configuration options in RFD3 can be found [here](../intro_inference_calculations.md)


@@ -1,5 +1,5 @@
insulinr:
  input: ../../input_pdbs/4zxb_cropped.pdb
  contig: 40-120,/0,E6-155
  length: 190-270
  select_hotspots: