Creating the external foundry documentation structure and first round of documentation for RFD3 (#143)
* Starting to put together the foundry and RFD3 external documentation
  Set up the foundation for Sphinx to be able to build the external docs, first draft of a ppi_design_tutorial has been completed.
* Add RFdiffusion3 documentation and update toctree
  Added initial documentation for RFdiffusion3 under models/rfd3/docs/index.rst and linked it in the main docs toctree. Updated toctree maxdepth for better navigation and added a symlink for the rfd3 model documentation.
* Update and expand RFdiffusion3 documentation
  Added introductory and reference documentation files, improved the index structure with general information and examples, and enhanced the PPI design tutorial with additional notes, figures, and clarifications. Also fixed a typo in a PDB filename.
* Update RFD3 documentation and tutorial content
  Expanded the Sphinx static path to include RFD3 assets and made minor formatting and clarity improvements in the main and RFD3-specific documentation. The PPI design tutorial was revised for clarity, improved step-by-step instructions, and better separation of setup and execution steps.
* Update RFD3 documentation and tutorials
  Added source_suffix to Sphinx conf.py for Markdown support. Updated index.rst to include new documentation sections. Expanded intro_inference_calculations.md with detailed instructions on inference input formats, job configuration, and output files. Improved input.md formatting for appendices and FAQs. Revised ppi_design_tutorial.md for clarity, added details on settings, and expanded explanations for hotspots and batch inference.
* Update RFD3 docs: clarify input specs and file formats
  Expanded and clarified the documentation for RFdiffusion3 input specifications, including more detailed explanations of the 'contig' string, input file types, and example YAML/JSON formats. Improved the intro to inference calculations to better explain the structure and usage of settings files, and updated descriptions for job configuration and output files. Added a placeholder for configuration options documentation.
* Update RFdiffusion3 input documentation and examples
  Expanded and clarified the documentation for RFdiffusion3 input specification, including detailed explanations of CLI arguments, InputSpecification fields, the InputSelection mini-language, contig string formatting, and advanced options such as partial diffusion and CIF parser arguments. Added more examples, debugging recommendations, and an updated FAQ. Also updated the output file naming explanation for clarity. Removed the obsolete configuration_options.md file. Note that images are still not being rendered correctly in many of the md files. Fix will be in future commit.
* PPI tutorial and RFD3 docs update
  Created output files for PPI tutorial and listed their locations. Made edits to files to add labels to sections to remove sphinx warnings.
* Delete docs/source/conf.py~
  This is an auto-save file from emacs - it does not need to be in the repo.
* Delete docs/source/index.rst~
  This is an auto-saved copy of index.rst, it does not need to be in the repo
* Adding missing images and fixing docs symlink
* Fix grammatical error in RFdiffusion3 documentation
  Fixed typo, clarified enzyme design language.
* Fix typos and enhance clarity in inference docs
  Corrected typos and improved clarity in the documentation regarding inference settings and file formats.
27
.github/workflows/documentation.yml
vendored
Normal file
@@ -0,0 +1,27 @@
name: documentation

on: [push, pull_request, workflow_dispatch]

permissions:
  contents: write

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - name: Install dependencies
        run: |
          pip install sphinx myst_parser furo sphinx-copybutton
      - name: Sphinx build
        run: |
          sphinx-build -M html docs/source/ docs/build/
      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
        with:
          publish_branch: gh-pages
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: docs/build/html
          force_orphan: true
20
docs/Makefile
Normal file
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35
docs/make.bat
Normal file
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
59
docs/source/conf.py
Normal file
@@ -0,0 +1,59 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'foundry'
copyright = '2025, Institute for Protein Design, University of Washington'
author = 'Institute for Protein Design, University of Washington'
release = '0.1.7'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ["myst_parser",
              "sphinx_copybutton"
              ]

templates_path = ['_templates']
exclude_patterns = ["readme.md", "readmelink.md", "readme_link.rst"]



# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'furo'
html_static_path = ['_static', '../../models/rfd3/.assets']

html_theme_options = {
    "sidebar_hide_name": False,
    #"announcement": "<em>THIS DOCUMENTATION IS CURRENTLY UNDER CONSTRUCTION</em>",
    "light_css_variables": {
        "color-brand-primary": "#F68A33",  # Rosetta orange
        "color-brand-content": "#37939B",  # Rosetta teal
        #"color-admonition-background": "#CCE8E8",  # Rosetta light teal
        "font-stack": "Open Sans, sans-serif",
        "font-stack--headings": "Open Sans, sans-serif",
        "color-background-hover": "#DCE8E8ff",
        "color-announcement-background": "#F68A33dd",
        "color-announcement-text": "#070707",
        "color-brand-visited": "#37939B",
    },
    "dark_css_variables": {
        "color-brand-primary": "#37939B",  # Rosetta teal
        "color-brand-content": "#F68A33",  # Rosetta orange
        #"color-admonition-background": "#20565B",  # Rosetta dark teal
        "font-stack": "Open Sans, sans-serif",
        "font-stack--headings": "Open Sans, sans-serif",
        "color-brand-visited": "#37939B",
    }
}

source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
22
docs/source/index.rst
Normal file
@@ -0,0 +1,22 @@
.. foundry documentation master file, created by
   sphinx-quickstart on Wed Dec 17 16:36:38 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to the official documentation for foundry
=================================================

`foundry <https://github.com/RosettaCommons/foundry/tree/production>`_ is a home for
many of the machine learning models produced by `Rosetta Commons member labs <https://rosettacommons.org/about/labs/>`_.

.. toctree::
   :maxdepth: 1
   :caption: General

   license_link.rst

.. toctree::
   :maxdepth: 1
   :caption: Models

   models/rfd3/index
5
docs/source/license_link.rst
Normal file
@@ -0,0 +1,5 @@
LICENSE
=======

.. include:: ../../LICENSE.md
   :parser: myst_parser.sphinx_
1
docs/source/models/rfd3
Symbolic link
@@ -0,0 +1 @@
../../../models/rfd3/docs
5
docs/source/readme_link.rst
Normal file
@@ -0,0 +1,5 @@
README
======

.. include:: ../../README.md
   :parser: myst_parser.sphinx_
BIN
models/.DS_Store
vendored
Normal file
BIN
models/rfd3/.DS_Store
vendored
Normal file
BIN
models/rfd3/.assets/dna.png
Normal file
|
After Width: | Height: | Size: 149 KiB |
BIN
models/rfd3/.assets/enzyme.png
Normal file
|
After Width: | Height: | Size: 142 KiB |
BIN
models/rfd3/.assets/ppi.png
Normal file
|
After Width: | Height: | Size: 161 KiB |
BIN
models/rfd3/.assets/sm.png
Normal file
|
After Width: | Height: | Size: 164 KiB |
BIN
models/rfd3/.assets/symm.png
Normal file
|
After Width: | Height: | Size: 592 KiB |
BIN
models/rfd3/.assets/trajectory.png
Normal file
|
After Width: | Height: | Size: 404 KiB |
BIN
models/rfd3/docs/.DS_Store
vendored
Normal file
BIN
models/rfd3/docs/.assets/ppi_tutorial/cropped_vs_full.png
Normal file
|
After Width: | Height: | Size: 572 KiB |
|
After Width: | Height: | Size: 389 KiB |
BIN
models/rfd3/docs/.assets/ppi_tutorial/example_outputs.png
Normal file
|
After Width: | Height: | Size: 514 KiB |
BIN
models/rfd3/docs/.assets/ppi_tutorial/hotspots.png
Normal file
|
After Width: | Height: | Size: 248 KiB |
33
models/rfd3/docs/index.rst
Normal file
@@ -0,0 +1,33 @@
RFdiffusion3 Documentation
==========================

RFdiffusion3 is a powerful protein design tool that operates on the atomic level to
study ligand-protein interactions, create nucleic acid-protein interfaces, and
design *de novo* enzymes. It is designed to be highly flexible and
user-friendly, making it suitable for a wide range of applications in computational biology and biochemistry.

General
---------
.. toctree::
   :maxdepth: 1

   intro_inference_calculations.md
   input.md

Tutorials
---------
.. toctree::
   :maxdepth: 1

   ppi_design_tutorial.md

Examples
--------
.. toctree::
   :maxdepth: 1

   na_binder_design.md
   sm_binder_design.md
   protein_binder_design.md
   symmetry.md
   enzyme_design.md
@@ -1,19 +1,35 @@
|
||||
# RFdiffusion3 — Input Specification & Command-line arguments
|
||||
|
||||
RFdiffusion3 accepts inputs in two forms:
|
||||
- Constraints to be applied to the inference run are given in JSON or YAML files
|
||||
- Details about the job (number of designs, output directory, etc.) are given as command line arguments
|
||||
|
||||
This document outlines the various input settings and configurations you can use with RFdiffusion3.
|
||||
|
||||
---
|
||||
|
||||
## Contents
|
||||
- [Quick start](#quick-start)
|
||||
- [CLI arguments](#cli-arguments)
|
||||
- [Required CLI Arguments](#required-cli-arguments)
|
||||
- [Other Useful CLI Arguments](#other-useful-cli-arguments)
|
||||
- [Other CLI options](#other-cli-options)
|
||||
- [InputSpecification fields](#inputspecification-fields)
|
||||
- [The `InputSelection` mini-language](#the-inputselection-mini-language)
|
||||
- [Unindexing specifics](#unindexing-specifics)
|
||||
- [Partial diffusion](#partial-diffusion)
|
||||
- [Contig Strings](#contig-strings)
|
||||
- [Input Option Specifics](#input-option-specifics)
|
||||
- [Unindexing Specifics](#unindexing-specifics)
|
||||
- [Partial Diffusion](#partial-diffusion)
|
||||
- [CIF Parser Options](#cif-parser-options)
|
||||
- [Select Fixed Atoms](#select-fixed-atoms)
|
||||
- [Debugging recommendations](#debugging-recommendations)
|
||||
- [FAQ / gotchas](#faq--gotchas)
|
||||
- [FAQ / Gotchas](#faq--gotchas)
|
||||
|
||||
---
|
||||
|
||||
(quick-start)=
|
||||
## Quick start
|
||||
> For more detailed information on RFdiffusion3 inputs and outputs, see {doc}`intro_inference_calculations`
|
||||
|
||||
JSON inputs take the following top-level structure:
|
||||
```json
|
||||
@@ -31,57 +47,105 @@ JSON inputs take the following top-level structure;
|
||||
```
|
||||
|
||||
You can then run inference at the command line with:
|
||||
```
|
||||
```bash
|
||||
rfd3 design out_dir=<path/to/outdir> inputs=<path/to/inputs>
|
||||
```
|
||||
In this document, we detail the syntax of the config structure.
|
||||
|
||||
(cli-arguments)=
|
||||
## CLI arguments
|
||||
Key CLI arguments (from the default config) to know include:
|
||||
- `n_batches` — number of batches to generate per input key (default: 1).
|
||||
- `diffusion_batch_size` — number of diffusion samples per batch (default: 8).
|
||||
|
||||
(required-cli-arguments)=
|
||||
### Required CLI arguments:
|
||||
- `out_dir` — The directory that output files from the inference run will be stored in. If the directory does not exist it will be created. **This does not change how the output files are named.**
|
||||
- `inputs` — The path and file name of the JSON or YAML file where you have defined your inference constraints.
|
||||
|
||||
(other-useful-cli-arguments)=
|
||||
### Other Useful CLI arguments:
|
||||
(From the [default config](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml))
|
||||
- `n_batches` — number of batches to generate per input key (default: 1).
|
||||
- `diffusion_batch_size` — number of diffusion samples (designs) per batch (default: 8). If `n_batches=1` and `diffusion_batch_size=8` then 8 designs will be generated from the inference run.
|
||||
- `specification` — JSON overrides for the per-example InputSpecification (default: `{}`). For example, you can run `rfd3 design inputs=null specification.length=200` for a quick debug of creating a 200-length protein.
|
||||
- `inference_sampler.num_timesteps` — diffusion timesteps for sampling (default: 200).
|
||||
- `inference_sampler.step_scale` — scales diffusion step size; higher → less diverse, more designable (default: 1.5).
|
||||
- `low_memory_mode` — memory-efficient tokenization mode; set `True` if GPU RAM is tight (default: False).
|
||||
- `ckpt_path` — String containing the path and file name of the checkpoint you want to use (default: rfd3).
|
||||
- `skip_existing` — Skip designing any systems whose output files already exist in the specified `out_dir` (default: True).
|
||||
- `global_prefix` — This setting allows you to change the beginning of the name of the output files from the name of the input JSON or YAML file to your own string (default: null).
|
||||
- `dump_trajectories` — If True, the trajectory files are also saved to the specified output directory (default: False).
|
||||
- `prevalidate_inputs` — Check that your inputs (JSON or YAML file) are valid before running inference (default: False).
|
||||
- `low_memory_mode` - Set to True (default: False) for memory efficient tokenization mode.
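As a rough illustration of how these `key=value` overrides combine, the sketch below assembles an `rfd3 design` command line in Python and computes how many designs to expect per input key. The helper names (`build_rfd3_command`, `designs_per_input_key`) and the example paths are hypothetical and not part of RFD3; only the argument names come from the list above.

```python
import shlex


def build_rfd3_command(out_dir, inputs, overrides=None):
    """Assemble an `rfd3 design` command as a list of arguments.

    `overrides` maps CLI argument names (e.g. "n_batches") to values.
    Hypothetical helper for illustration; not part of RFD3.
    """
    args = ["rfd3", "design", f"out_dir={out_dir}", f"inputs={inputs}"]
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    return args


def designs_per_input_key(n_batches=1, diffusion_batch_size=8):
    """Designs generated per input key: batches times samples per batch."""
    return n_batches * diffusion_batch_size


command = build_rfd3_command(
    "path/to/outdir",        # hypothetical output directory
    "path/to/inputs.json",   # hypothetical inputs file
    {"n_batches": 2, "diffusion_batch_size": 4, "dump_trajectories": True},
)
print(shlex.join(command))
print(designs_per_input_key(n_batches=2, diffusion_batch_size=4))
```

The list form can be passed directly to `subprocess.run(command, check=True)` if you want to launch the job from a script.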
|
||||
|
||||
The full config of default arguments that are applied can be seen in [inference_engine/rfdiffusion3.yaml](../configs/inference_engine/rfdiffusion3.yaml)
|
||||
(other-cli-options)=
|
||||
### Other CLI Options:
|
||||
- `json_keys_subset` — Allows the user to extract only a subset of the JSON keys provided in the `inputs` file (default: null).
|
||||
- `inference_sampler` —
|
||||
- `kind` — Change this value to `symmetry` (default: default) to turn on symmetry mode for the inference sampler.
|
||||
- `cfg_features` — The values specified (options are `active_donor`, `active_acceptor`, or `ref_atomwise_rasa`) are set to 0 for classifier-free guidance. Classifier-free guidance lets the diffusion model steer the calculation towards a condition without training a separate classifier.
|
||||
- `use_classifier_free_guidance` — If set to `True`, RFD3 can use classifier-free guidance to guide the system towards a condition without training a separate classifier (default: `False`).
|
||||
- `cfg_t_max` — The maximum time to apply classifier-free guidance to the inference run (default: null).
|
||||
- `cfg_scale` — Controls the influence of the classifier-free guidance adjustment (default: 1.5).
|
||||
- `center_option`: Specifies how to center the coordinates during the inference run to ensure that structures are aligned around a specific point. Options include:
|
||||
- `all` — (default) Uses the center of mass (COM) of all atoms
|
||||
- `motif` — Uses the COM of the motif atoms with fixed coordinates
|
||||
- `diffuse` — Uses the COM of all fixed coordinates that are not part of motif atoms
|
||||
- `s_trans` — Translational noise scale for augmentation during inference (default: 1.0).
|
||||
<!-- `inference_noise_scaling_factor` As far as I can tell there isn't actually any code to make use of this setting -->
|
||||
- `allow_realignment` — If set to `True` (default: False), the noised structure can be realigned during inference based on the location of a given motif.
|
||||
- `noise_scale` — This parameter sets the scaling for the noise during inference (default: 1.003). A smaller value adds less noise to your system, which reduces the diversity of the outputs.
|
||||
- `p` — Determines the 'shape' of the noise schedule (default: 7).
|
||||
- `gamma_0` — This value (default: 0.6) influences the diversity of the designs from RFD3. A lower value increases designability but decreases diversity.
|
||||
- `gamma_min` — Controls when `gamma_0` is used: if `t > gamma_min`, `gamma_0` is used as the value of `gamma`, which influences the diversity of the designs from RFD3.
|
||||
- `s_jitter_origin` — Controls the standard deviation of the Gaussian distribution that is used to 'jitter' the motif offset (default: 0.0, no jitter).
|
||||
- `cleanup_guideposts` — Set to `False` (default: True) to save the guideposts used during inference, see [Debugging recommendations](#debugging-recommendations) for more information.
|
||||
- `cleanup_virtual_atoms` — Set to `False` (default: True) to save information about the diffused virtual atoms used during inference. RFD3 uses virtual atoms to account for the different number of atoms in side chains during the design process. RFD3 is atom based, however the number of atoms in a residue will differ based on its side chain, which is only determined after some diffusion steps have occurred, meaning virtual atoms are necessary for those steps. See [Debugging recommendations](#debugging-recommendations) for more information.
|
||||
- `read_sequence_from_sequence_head` — Used during training, it is not recommended to change this setting (default: True).
|
||||
- `output_full_json` — Output all specification information to the JSON file that gets created for each design (default: True).
|
||||
- `dump_prediction_metadata_json` — If `True`, the metadata for the inference run will be included in the output JSON file (default: True).
|
||||
- `align_trajectory_structures` — Aligns the structures in the output trajectories (default: False).
|
||||
|
||||
|
||||
The full config of default arguments that are applied can be seen in [inference_engine/rfdiffusion3.yaml](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml)
|
||||
|
||||
(inputspecification-fields)=
|
||||
## InputSpecification fields
|
||||
|
||||
Below is a table of all of the inputs that the `InputSpecification` accepts. Use these fields to describe what RFdiffusion3 should do with your inputs.
|
||||
Below is a table of all of the inputs that the `InputSpecification` accepts. Use these fields to describe the constraints you want to apply to your system during inference.
|
||||
|
||||
> For the fields with the `InputSelection` type, see section [The InputSelection Mini-Language](#the-inputselection-mini-language).
|
||||
|
||||
> Many of the settings here will mention a 'contig string', see the [Contig Strings](#contig-strings) section for more details.
|
||||
|
||||
|
||||
| Field | Type | Description |
|
||||
| -------------------------------------------------------------- | ----------------- | --------------------------------------------------------------------- |
|
||||
| `input` | `str?` | Path to input **PDB/CIF**. Required if you provide contig+length. |
|
||||
| `atom_array_input` | internal | Pre-loaded `AtomArray` (not recommended). |
|
||||
| `contig` | `InputSelection?` | Indexed motif specification, e.g., `"A1-80,10,\0,B5-12"`. |
|
||||
| `unindex` | `InputSelection?` | Unindexed motif components (unknown sequence placement). |
|
||||
| `length` | `str?` | Total design length constraint; `"min-max"` or int. |
|
||||
| `ligand` | `str?` | Ligand(s) by resname or index. |
|
||||
| `cif_parser_args` | `dict?` | Optional args to CIF loader. |
|
||||
| `extra` | `dict` | Extra metadata (e.g., logs). |
|
||||
| `dialect` | `int` | `2`=new (default), `1`=legacy. |
|
||||
| `select_fixed_atoms` | `InputSelection?` | Atoms with fixed coordinates. |
|
||||
| `select_unfixed_sequence` | `InputSelection?` | Where sequence can change. |
|
||||
| `select_buried` / `select_partially_buried` / `select_exposed` | `InputSelection?` | RASA bins 0/1/2 (mutually exclusive). |
|
||||
| `select_hbond_donor` / `select_hbond_acceptor` | `InputSelection?` | Atom-wise donor/acceptor flags. |
|
||||
| `select_hotspots` | `InputSelection?` | Atom-level or token-level hotspots. |
|
||||
| `redesign_motif_sidechains` | `bool` | Fixed backbone, redesigned sidechains for motifs. |
|
||||
| `symmetry` | `SymmetryConfig?` | See `docs/symmetry.md`. |
|
||||
| `ori_token` | `list[float]?` | `[x,y,z]` origin override to control COM placement |
|
||||
| `infer_ori_strategy` | `str?` | `"com"` or `"hotspots"`. |
|
||||
| `plddt_enhanced` | `bool` | Default `true`. |
|
||||
| `is_non_loopy` | `bool` | Default `true`. |
|
||||
| `partial_t` | `float?` | Noise (Å) for partial diffusion, enables partial diffusion |
|
||||
| `input` | `str` | Path to and file name of **PDB/CIF**. Required if you provide contig+length. |
|
||||
| `atom_array_input` | internal | Pre-loaded [`AtomArray`](https://www.biotite-python.org/latest/apidoc/biotite.structure.AtomArray.html) (not recommended). |
|
||||
| `contig` | `InputSelection` | (Can only pass a contig string.) Indexed motif specification, e.g., `"A1-80,10,\0,B5-12"`. |
|
||||
| `unindex` | `InputSelection` | (Can only pass a contig string or dictionary.) Unindexed motif components, the specified residues can be anywhere in the final sequence. See [Unindexing Specifics](#unindexing-specifics) for more information. |
|
||||
| `length` | `str` | Total design length constraint; `"min-max"` or int for specified length. |
|
||||
| `ligand`                                                       | `str`            | Ligand(s) by chemical component name (from [RCSB PDB](https://www.rcsb.org/)) or index. |
|
||||
| `cif_parser_args` | `dict` | Optional args to CIF loader. See [CIF parser options](#cif-parser-options) for more information. |
|
||||
| `extra` | `dict` | Extra metadata (e.g., logs). Current options include `sampled_contig`. |
|
||||
| `dialect`                                                      | `int`            | `2`=new (default), `1`=legacy. Learn more about the legacy parsing system by looking at [input_parsing.py](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/src/rfd3/inference/input_parsing.py). |
|
||||
| `select_fixed_atoms` | `InputSelection` | Atoms with fixed coordinates. See the [Select Fixed Atoms](#select-fixed-atoms) subsection for more information. |
|
||||
| `select_unfixed_sequence` | `InputSelection` | Where sequence can change. Default is `True` - all input regions have fixed sequences. Contig string input specifies components to unfix the sequence for. Dictionary inputs are allowed but not recommended.|
|
||||
| `select_buried` / `select_partially_buried` / `select_exposed` | `InputSelection` | Selection of RASA (Relative Accessible Surface Area) for buried, partially buried, and exposed conditioning, respectively. Only contig string and dictionary are acceptable inputs. |
|
||||
| `select_hbond_donor` / `select_hbond_acceptor`                 | `InputSelection` | Atom-wise selection of hydrogen bond donors and acceptors, respectively. Only dictionary inputs are allowed. See {doc}`na_binder_design` for an example. |
|
||||
| `select_hotspots` | `InputSelection` | Atom-level or residue-level hotspots. Hotspots will typically be at most 4.5 Å to any heavy atom in the designed structure. Typically used for designing binders. |
|
||||
| `redesign_motif_sidechains` | `bool` | Fixed backbone, redesigned sidechains for motifs (input structures). |
|
||||
| `symmetry` | `SymmetryConfig` | See {doc}`symmetry`. |
|
||||
| `ori_token` | `list[float]` | `[x,y,z]` origin override to control COM (center of mass) placement of designed structure. |
|
||||
| `infer_ori_strategy` | `str` | `"com"` or `"hotspots"`. The center of mass of the diffused region will typically be within 5Å of the ORI token. Using `hotspots` will place the ORI token 10Å outward from the center of mass of the specified hotspots. Using `com` will place the token at the center of mass of the input structure.|
|
||||
| `plddt_enhanced` | `bool` | Default `True`. Enables pLDDT (predicted Local Distance Difference Test) enhancement. |
|
||||
| `is_non_loopy` | `bool` | Default `True`. Produces output structures with fewer loops.|
|
||||
| `partial_t`                                                    | `float`          | Noise level (Å) for partial diffusion; setting this value enables partial diffusion. Recommended values are 5.0-15.0 Å. See [Partial Diffusion](#partial-diffusion) for more information. |
|
||||
|
||||
|
||||
A few notes on the above:
|
||||
- **Unified selections.** All per-residue/atom choices now use **InputSelection**:
|
||||
- You can pass `true`/`false`, a **contig string** (`"A1-10,B5-8"`), or a **dictionary** (`{"A1-10": "ALL", "B5": "N,CA,C,O"}`).
|
||||
- You can pass `True`/`False`, a **contig string** (`"A1-10,B5-8"`), or a **dictionary** (`{"A1-10": "ALL", "B5": "N,CA,C,O"}`).
|
||||
- Selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
|
||||
- **Clearer unindexing.** For **unindexed** motifs you typically either fix `"ALL"` atoms or explicitly choose subsets such as `"TIP"`/`"BKBN"`/explicit atom lists via a **dictionary** (see examples).
|
||||
- **Clearer unindexing.** For **unindexed** motifs you typically either fix `"ALL"` atoms or explicitly choose subsets such as `"TIP"`/`"BKBN"`/explicit atom lists via a **dictionary** (see examples). (`"ALL"` = all atoms, `"TIP"` = tip atoms, `"BKBN"` = backbone atoms.)
|
||||
When using `unindex`, only **the atoms you mark as fixed** are carried over from the input.
|
||||
- **Reproducibility.** The exact specification and the **sampled contig** are logged back into the output JSON. We also log useful counts (atoms, residues, chains).
|
||||
- **Safer parsing.** You’ll now get early, informative errors if:
|
||||
@@ -92,7 +156,8 @@ A few notes on the above:
|
||||
- **Backwards compatible.** Add `"dialect": 1` to keep your old configs running while you migrate. (Deprecated.)
|
||||
|
||||
---
|
||||
## The InputSelection mini-language
|
||||
(the-inputselection-mini-language)=
|
||||
## The InputSelection Mini-Language
|
||||
|
||||
Fields marked as `InputSelection` accept either a boolean, a contig-style string, or a dictionary. Dictionaries are the most expressive and can also use shorthand values like `ALL`, `TIP`, or `BKBN`:
|
||||
```yaml
|
||||
@@ -104,21 +169,57 @@ select_fixed_atoms:
|
||||
LIG: '' # selects no atoms (i.e. unfixes the atoms for ligands named `LIG`)
|
||||
```
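To make the three accepted forms concrete, here is a small Python sketch that classifies an `InputSelection` value and splits explicit atom lists into individual atom names. It is only an illustration of the syntax described above, not RFD3's own parser; `normalize_selection` and `SHORTHANDS` are hypothetical names.

```python
SHORTHANDS = {"ALL", "TIP", "BKBN"}  # shorthand atom groups named in this document


def normalize_selection(value):
    """Classify an InputSelection-style value as (kind, payload).

    Hypothetical helper for illustration; not RFD3's parser.
    """
    if isinstance(value, bool):
        return ("bool", value)
    if isinstance(value, str):
        # Contig-style string: comma-separated components such as "A1-10,B5-8".
        return ("contig", [part.strip() for part in value.split(",") if part.strip()])
    if isinstance(value, dict):
        expanded = {}
        for component, atoms in value.items():
            if atoms in SHORTHANDS:
                expanded[component] = atoms  # keep shorthand names as-is
            else:
                # Explicit list such as "N,CA,C,O"; an empty string selects no atoms.
                expanded[component] = [a.strip() for a in atoms.split(",") if a.strip()]
        return ("dict", expanded)
    raise TypeError(f"Unsupported selection type: {type(value).__name__}")


print(normalize_selection(True))
print(normalize_selection("A1-10,B5-8"))
print(normalize_selection({"A1-10": "ALL", "B5": "N,CA,C,O", "LIG": ""}))
```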
|
||||
|
||||
<p align="center">
|
||||
<!--<p align="center">
|
||||
<img src=".assets/input_selection.png" alt="InputSelection language for foundry" width=500>
|
||||
</p>
|
||||
</p>-->
|
||||
```{figure} .assets/input_selection.png
|
||||
---
|
||||
alt: Input selection language for foundry.
|
||||
width: 500px
|
||||
---
|
||||
Graphical representation of the different ways to specify portions of a structure using RFD3's InputSelection mini-language.
|
||||
```
|
||||
|
||||
## Unindexing specifics
|
||||
|
||||
(contig-strings)=
|
||||
## Contig Strings
|
||||
A 'contig string' is a string that contains residue information and is used in many of the settings in the table above. Here are some formatting specifics:
|
||||
- Different pieces of information included in the string are separated by commas
|
||||
- Ranges of residues are specified by a dash (`-`) between the starting and ending residue
|
||||
- Chain breaks are represented by `\0`
|
||||
- Residue numbers or ranges with a chain label before the number come from the input structure
|
||||
- Residue numbers or ranges without a chain label before the number will be designed. If given a range, the designed region will have a length that is uniformly random within the specified range.
|
||||
|
||||
For example:
|
||||
```yaml
|
||||
my_calculation:
|
||||
input: path/to/my/input.pdb
|
||||
contig: A40-60,70,A120-170,A203,\0,B3-45,60-80
|
||||
```
|
||||
- `A40-60`: the design will start with residues 40-60 from the A chain of the input structure.
|
||||
- `70`: RFD3 will design a segment of exactly 70 residues that connects to A60.
|
||||
- `A120-170`: RFD3 will include a bond between the last designed residue and residue A120, and then include residues A120-A170 from the input structure.
|
||||
- `A203`: A bond will be created between A170 and A203 and A203 will be in the final structure. However, residues A171-A202 will not be in the final structure.
|
||||
- `\0`: Chain break. There is no peptide bond between A203 and B3 in the output structure
|
||||
- `B3-45`: Residues B3 through B45 are taken from the input structure.
|
||||
- `60-80`: A design region is added after B45 that will be between 60 and 80 residues long.
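The formatting rules above can be sketched as a small Python parser. This is a minimal illustration under stated assumptions, not RFD3's actual implementation: `Segment`, `parse_contig`, and `length_bounds` are hypothetical names, and chain labels are assumed to be a single letter.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    kind: str                    # "input", "design", or "break"
    chain: Optional[str] = None  # chain label for input segments
    start: Optional[int] = None  # first residue, or minimum designed length
    end: Optional[int] = None    # last residue, or maximum designed length


_INPUT = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")  # e.g. A40-60 or A203
_DESIGN = re.compile(r"^(\d+)(?:-(\d+))?$")           # e.g. 70 or 60-80


def parse_contig(contig: str) -> List[Segment]:
    """Split a contig string into input, design, and chain-break segments."""
    segments = []
    for token in (t.strip() for t in contig.split(",")):
        if token == r"\0":
            segments.append(Segment("break"))
        elif _INPUT.match(token):
            chain, start, end = _INPUT.match(token).groups()
            segments.append(Segment("input", chain, int(start), int(end or start)))
        elif _DESIGN.match(token):
            start, end = _DESIGN.match(token).groups()
            segments.append(Segment("design", None, int(start), int(end or start)))
        else:
            raise ValueError(f"Unrecognized contig token: {token!r}")
    return segments


def length_bounds(segments: List[Segment]) -> Tuple[int, int]:
    """Minimum and maximum total residue count implied by the segments."""
    low = high = 0
    for seg in segments:
        if seg.kind == "input":
            n = seg.end - seg.start + 1
            low, high = low + n, high + n
        elif seg.kind == "design":
            low, high = low + seg.start, high + seg.end
    return low, high


segments = parse_contig(r"A40-60,70,A120-170,A203,\0,B3-45,60-80")
for seg in segments:
    print(seg)
print(length_bounds(segments))
```

Note the raw string (`r"..."`) in the usage line: in ordinary Python string literals `\0` would be read as a null character rather than the two-character chain-break marker.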
|
||||
|
||||
(input-option-specifics)=
|
||||
## Input Option Specifics
|
||||
|
||||
(unindexing-specifics)=
|
||||
### Unindexing Specifics
|
||||
|
||||
`unindex` marks motif tokens whose relative sequence placement is unknown to the model (useful for scaffolding around active sites, etc.).
|
||||
Use a string to list the unindexed components and where breaks occur.
|
||||
Use a dictionary if you want to fix specific atoms of those residues; atoms not fixed are not copied from the input (they will be diffused).
|
||||
Breaks between unindexed components follow the contig conventions you’re used to. For example: `"A244,A274,A320,A329,A375"` lists multiple unindexed components; internal “breakpoints” are inferred and logged. (Offset syntax like `A11-12` or `A11,0,A12` still ties residues.)
|
||||
You can specify consecutive residues as, e.g., `A11-12` (instead of `A11,A12`). This ties the two components together in sequence (or at least leaks to the model that the residues are adjacent in sequence).
|
||||
Similarly, you can specify manually any number of residues that offsets two components, e.g. `A11,0,A12` (0 sequence offset, equivalent to just `A11-12`), or `A11,3,A12` (3-residue separation).
|
||||
From our initial tests this only leads to a slight bias in the model, but newer models may show better adherence!
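
As a rough illustration of the grouping described above, the sketch below splits an `unindex` string into components whose relative sequence positions are stated. It is an assumption-laden toy, not RFD3's parser: `group_unindexed` is a hypothetical name, chain labels are assumed to be single letters, and the offset value itself is discarded.

```python
import re

# Hypothetical helper for illustration; not RFD3's actual unindex parser.
_RESIDUES = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def group_unindexed(spec: str) -> list[list[str]]:
    """Group residues whose relative sequence offset is stated in the string."""
    groups: list[list[str]] = []
    tie_next = False  # True right after a numeric offset token such as 0 or 3
    for token in (t.strip() for t in spec.split(",")):
        if token.isdigit():
            if not groups:
                raise ValueError("an offset must follow a residue")
            tie_next = True
            continue
        match = _RESIDUES.match(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        chain, start, end = match.group(1), int(match.group(2)), match.group(3)
        residues = [f"{chain}{i}" for i in range(start, int(end or start) + 1)]
        if tie_next:
            groups[-1].extend(residues)  # offset given: same component
        else:
            groups.append(residues)      # no offset given: new component
        tie_next = False
    return groups


print(group_unindexed("A244,A274,A320,A329,A375"))
print(group_unindexed("A11-12,A22"))
print(group_unindexed("A11,3,A12"))
```

Residues that land in the same inner list are the ones whose relative positions would be known to the model; separate lists correspond to components with unknown relative placement.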
|
||||
|
||||
(partial-diffusion)=
|
||||
### Partial Diffusion
|
||||
To enable partial diffusion, you can pass `partial_t` with any example. This sets the *noise level* in *angstroms* for the sampler:
|
||||
- The `specification.partial_t` argument can be specified from JSON or the command line.
|
||||
- Partial diffusion will fix/unfix ligands and nucleic acids as normal. By default, it will fix non-protein components, and they must be specified explicitly.
|
||||
@@ -142,45 +243,78 @@ In the following example, RFD3 will noise out by 15 angstroms and constrain atom
|
||||
}
|
||||
```
|
||||
Below is an example of what the output should look like (diffusion outputs in teal, original native in navajo white):
|
||||
<!--<p align="center">
<img src=".assets/partial_diff.png" alt="Partial diffusion" width=650>
</p>-->
|
||||
```{image} .assets/partial_diff.png
|
||||
:alt: Partial diffusion.
|
||||
:width: 650px
|
||||
```
|
||||
|
||||
(cif-parser-options)=
|
||||
### CIF Parser Options
|
||||
The `cif_parser_args` setting that you can include in your input JSON or YAML file accepts several possible values as a dictionary:
|
||||
- `cache_dir`: String specifying the path to the directory where cache files are stored (default: null).
|
||||
- `load_from_cache`: Boolean specifying if data should be loaded from cache (default: True).
|
||||
- `save_to_cache`: Boolean specifying if the data should be saved to cache (default: True).
|
||||
- `fix_arginines`: Boolean specifying if arginine residues should be fixed (default: False).
|
||||
- `add_missing_atoms`: Boolean specifying if missing atoms should be automatically added (default: False).
|
||||
- `remove_ccds`: A list of CCD ([chemical component dictionary](https://www.wwpdb.org/data/ccd)) keys to remove (default: []).
|
||||
- `hydrogen_policy`: String specifying how hydrogens should be handled. The option currently listed is `remove` (default: `remove`).
|
||||
- `extra_fields`: These optional fields can be found by looking at AtomWorks' [`parser.py` file](https://github.com/RosettaCommons/atomworks/blob/production/src/atomworks/io/parser.py).
|
||||
|
||||
You can also use `STANDARD_PARSER_ARGS` from [AtomWorks](https://github.com/RosettaCommons/atomworks). More information can be found in [atomworks/io/parser.py](https://github.com/RosettaCommons/atomworks/blob/production/src/atomworks/io/parser.py).
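
To make the shape of this setting concrete, here is a minimal sketch that writes the options listed above, with their stated defaults, into a JSON settings document. The calculation name `my_calculation`, the input path, and the placement of `cif_parser_args` directly under the calculation are assumptions for illustration.

```python
import json

# Hypothetical calculation name and input path; the option names and default
# values are the ones listed in this section.
settings = {
    "my_calculation": {
        "input": "path/to/my/input.cif",
        "cif_parser_args": {
            "cache_dir": None,          # written as null in JSON
            "load_from_cache": True,
            "save_to_cache": True,
            "fix_arginines": False,
            "add_missing_atoms": False,
            "remove_ccds": [],
            "hydrogen_policy": "remove",
        },
    }
}

text = json.dumps(settings, indent=2)
print(text)
```

Only the options you want to change need to be listed in practice; the full set is shown here just to mirror the list above.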
|
||||
|
||||
(select-fixed-atoms)=
|
||||
### Select Fixed Atoms
|
||||
The `select_fixed_atoms` input setting can take a boolean, dictionary or contig string as input:
|
||||
- `True`: All atoms pulled from the input file (via `contig`, for example) are fixed in 3D space
|
||||
- `False`: All the atoms pulled from the input file are unfixed in 3D space
|
||||
- Contig string: See the [Contig Strings](#contig-strings) section for formatting. Specifying a contig string for this setting allows for the specification of several components to fix in 3D space. This string should only reference residues from the input. Chain breaks are irrelevant for this setting.
|
||||
- Dictionary: Allows for the specification of specific atoms within a residue to be fixed in 3D space. For example, `{"A1": "N,CA,C,O,CB,CG", "A2-10": "BKBN"}` fixes the atoms N, CA, C, O, CB, and CG of residue A1 and the backbone atoms (`BKBN`) of residues A2-A10.
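
The dictionary form can be read residue by residue. The sketch below expands the example dictionary into one entry per residue so the selection is easy to inspect. It is illustrative only: `expand_fixed_atoms` is a hypothetical helper, it assumes single-letter chain labels and string values, and it leaves keywords such as `BKBN` unexpanded.

```python
import re

# Hypothetical helper for illustration; not part of RFD3.
_KEY = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def expand_fixed_atoms(selection: dict[str, str]) -> dict[str, list[str]]:
    """Expand keys like 'A2-10' into one entry per residue."""
    expanded: dict[str, list[str]] = {}
    for key, atoms in selection.items():
        match = _KEY.match(key)
        if match is None:
            raise ValueError(f"unrecognised residue key: {key!r}")
        chain, start, end = match.group(1), int(match.group(2)), match.group(3)
        names = [a.strip() for a in atoms.split(",") if a.strip()]
        for number in range(start, int(end or start) + 1):
            expanded[f"{chain}{number}"] = list(names)
    return expanded


per_residue = expand_fixed_atoms({"A1": "N,CA,C,O,CB,CG", "A2-10": "BKBN"})
for residue, atoms in per_residue.items():
    print(residue, atoms)
```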
|
||||
|
||||
(debugging-recommendations)=
|
||||
## Debugging recommendations
|
||||
- For unindexed scaffolding, you can use the option `cleanup_guideposts=False` to keep the model's outputs for the guideposts. The guideposts are saved as separate chains based on whether their relative indices were leaked to the model: e.g. for `unindex=A11-12,A22`, you should see `A11` and `A12` indexed together on one chain and `A22` on its own chain, indicating the model was provided with the fact that `A11` and `A12` are immediately next to one another in sequence but their distance to `A22` is unknown.
|
||||
- To see the full 14 diffused virtual atoms you can use `cleanup_virtual_atoms=False`. Default is to discard them for the sake of downstream processing.
|
||||
- To see the trajectories, you can use `dump_trajectories=True`. This can be useful if the outputs look strange but the config is correct, or if you want to make cool gifs of course! Trajectories do not have sequence labels and contain virtual atoms.
|
||||
|
||||
(faq--gotchas)=
|
||||
## FAQ / Gotchas
|
||||
|
||||
<details>
<summary><b>Can I guide on secondary structure?</b></summary>

Currently no, although future models may support it. In the meantime, you can use `is_non_loopy: true` to make fewer loops. We find this produces many more helices and fewer loops (and fewer sheets).
</details>

<details>
<summary><b>Do I need select_fixed_atoms & select_unfixed_sequence every time?</b></summary>

No. Defaults apply when an input is present.
</details>

<details>
<summary><b>Why "Input provided but unused"?</b></summary>

This indicates that you provided an input PDB/CIF file (not `input: null`) but none of `contig`, `unindex`, `ligand`, or `partial_t`.
</details>

<details>
<summary><b>What do the logged bfactors mean?</b></summary>

The sequence head from RFD3 logs its confidence for each token in the output structure. You can run `spectrum b` in `pymol` to see it. It usually doesn't mean anything, but a high entropy (uncertain assignment of sequence) can give you some idea of whether the model has gone vastly out of distribution.
</details>

Let us know if you have any additional questions; we'd be happy to answer them either in our [Slack channel](https://join.slack.com/t/proteinmodelfoundry/shared_invite/zt-3kpwru8c6-nrmTW6LNHnSE7h16GNnfLA) or in a GitHub discussion.
|
||||
|
||||
## Further examples of InputSelection syntax
|
||||
|
||||
Below is a reference for more examples of different ways you can specify inputs to select from your pdb in configs; we hope the community can find use in this flexible system for future models!
|
||||
<!--<p align="center">
<img src=".assets/input_selection_large.png" alt="Input selection syntax" width=650>
</p>-->
|
||||
```{image} .assets/input_selection_large.png
|
||||
:alt: Input selection syntax.
|
||||
:width: 650px
|
||||
```
|
||||
|
||||
39
models/rfd3/docs/intro_inference_calculations.md
Normal file
@@ -0,0 +1,39 @@
|
||||
# Inference Calculation Basics
|
||||
In RFdiffusion3 (RFD3), [YAML](https://yaml.org/) or [JSON](https://www.json.org/json-en.html) files are used to specify the **settings** for your inference calculations and [**configuration options**](https://hydra.cc/docs/configure_hydra/intro/) are used to provide other information about your calculation, such as the location and name of the checkpoint file you want to use.
|
||||
|
||||
## Inference Settings
|
||||
The inference 'settings' are how you constrain your inference calculation, such as specifying portions of the output you wish to have designed (`contig`) and specifying any symmetries that exist in your system (`symmetry`). These settings are stored in either a YAML or JSON file to be interpreted by RFdiffusion3. Runnable examples of JSON and YAML files can be found in `foundry/models/rfd3/docs`.
|
||||
|
||||
Using this type of input specification allows you to define different types of inference calculations all in the same file, and either run all of the calculation types defined in the file or specify the specific calculation you want to run via the command line.
|
||||
|
||||
```{note}
|
||||
For more information on many of the available options, see {doc}`input`. To see all available options, see [input_parsing.py](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/src/rfd3/inference/input_parsing.py).
|
||||
```
|
||||
|
||||
## Job configurations
|
||||
Once you have all of the settings you want to use to constrain your inference run in a JSON or YAML file, you can run the job using a command starting with `rfd3 design` and then including different 'configuration options'. You must include the path to the YAML/JSON file that defines your inference run(s) and the output directory:
|
||||
```bash
|
||||
rfd3 design inputs=/path/to/your/yaml/or/json/file out_dir=/path/to/your/output/directory ckpt_path=/path/to/an/rfd3_checkpoint_file.pt
|
||||
```
|
||||
|
||||
```{note}
|
||||
The output directory location specified will be created if it does not exist. This setting only specifies the location the output files will be stored in, not the naming of the various output files.
|
||||
```
|
||||
|
||||
Several other options are available to control the number of designs, whether to save the trajectory files, etc. These options can be found in [`foundry/models/rfd3/configs/inference_engine/base.yaml`](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/base.yaml) and [`foundry/models/rfd3/configs/inference_engine/rfdiffusion3.yaml`](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml).
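
If you launch jobs from a script, the `key=value` form of the command can be assembled programmatically. The sketch below only builds and prints the command; it does not run RFD3. `build_rfd3_command` is a hypothetical helper, the paths are placeholders, and `dump_trajectories` is used as an example of an extra option on the assumption that it is spelled this way in the linked config files.

```python
import shlex


def build_rfd3_command(inputs: str, out_dir: str, ckpt_path: str, **overrides) -> list[str]:
    """Assemble an `rfd3 design` command from key=value configuration options."""
    args = [
        "rfd3",
        "design",
        f"inputs={inputs}",
        f"out_dir={out_dir}",
        f"ckpt_path={ckpt_path}",
    ]
    # Extra configuration options are appended in the same key=value form.
    args += [f"{key}={value}" for key, value in overrides.items()]
    return args


command = build_rfd3_command(
    inputs="/path/to/settings.yaml",             # placeholder path
    out_dir="/path/to/output/directory",         # placeholder path
    ckpt_path="/path/to/rfd3_checkpoint_file.pt",
    dump_trajectories=True,                      # assumed option name
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the paths point at real files.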
|
||||
|
||||
## Output Files
|
||||
At the end of your inference calculation, you will be left with several output files in the directory you specified. At minimum (if you did not change any settings to include more outputs) you will be left with a JSON and a compressed CIF file (`.cif.gz`) for each design. The names of the files will be as follows:
|
||||
```bash
|
||||
<name of the json or yaml file>_<settings group name>_<batch_number>_model_n.<suffix>
|
||||
```
|
||||
Here, `n` is the design number; numbering starts at 0.
|
||||
|
||||
For example, if you named your JSON file `rfd3_example.json`, ran only one batch, and had a group of settings in it labeled `example_1`, you would get files with names like:
|
||||
```bash
|
||||
rfd3_example_example_1_0_model_0.cif.gz
|
||||
rfd3_example_example_1_0_model_0.json
|
||||
rfd3_example_example_1_0_model_1.cif.gz
|
||||
rfd3_example_example_1_0_model_1.json
|
||||
...
|
||||
```
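
The naming pattern above is simple enough to reproduce in code, which can be handy when a downstream script needs to locate a particular design. This is a small sketch of the pattern as described in this section; `output_names` is a hypothetical helper, not an RFD3 function.

```python
from pathlib import Path


def output_names(settings_file: str, group: str, batch: int, design: int) -> list[str]:
    """Return the expected .cif.gz and .json file names for one design."""
    stem = Path(settings_file).stem  # file name without its .json/.yaml suffix
    base = f"{stem}_{group}_{batch}_model_{design}"
    return [f"{base}.cif.gz", f"{base}.json"]


for name in output_names("rfd3_example.json", "example_1", batch=0, design=0):
    print(name)
```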
|
||||
148
models/rfd3/docs/ppi_design_tutorial.md
Normal file
@@ -0,0 +1,148 @@
|
||||
# Protein-Protein Interface Design in RFdiffusion3
|
||||
|
||||
## Before We Get Started...
|
||||
This tutorial does not cover installing RFD3. Before continuing, make sure that RFdiffusion3 (RFD3) is installed and can be run on your system.
|
||||
|
||||
See the [README](https://github.com/RosettaCommons/foundry/tree/production/models/rfd3) for installation instructions. You will need to remember the path to the directory where you stored your checkpoint files.
|
||||
|
||||
```{note}
|
||||
The instructions below assume that you have installed RFD3 via the pip commands.
|
||||
You may need to slightly modify how you run the calculations based on your setup.
|
||||
```
|
||||
|
||||
Make sure you have activated any environments you used to install RFD3.
|
||||
|
||||
RFD3 runs best on GPUs. It is suggested to follow this tutorial on an interactive GPU node, if you have access to one.
|
||||
|
||||
You will need the file `4zxb_cropped.pdb`. This is provided in [`foundry/models/rfd3/docs/input_pdbs`](input_pdbs/4zxb_cropped.pdb). You can clone the [`foundry`](https://github.com/RosettaCommons/foundry) repository to easily access files related to this tutorial.
|
||||
|
||||
Lastly, we will be visualizing the outputs of the calculations presented in the tutorial using [PyMOL](https://pymol.org/). The visualization steps are completely optional, but if you would like to follow along you will need to have PyMOL installed.
|
||||
|
||||
(learning-objectives)=
|
||||
## Learning Objectives
|
||||
In this tutorial, we will design a binder for the human insulin receptor to explore the settings available in RFD3 that are useful in protein-protein interface (PPI) design.
|
||||
|
||||
(setup)=
|
||||
## Setup
|
||||
Create a directory named `rfd3_ppi_tutorial` and `cd` into it:
|
||||
```bash
|
||||
mkdir rfd3_ppi_tutorial && cd rfd3_ppi_tutorial
|
||||
```
|
||||
This is where you will be storing the files related to this tutorial.
|
||||
|
||||
If you would like to compare your outputs against those generated by the authors of this tutorial, you can find pre-generated output files in `foundry/models/rfd3/docs/ppi_tutorial_files`.
The 'basic' zip file contains outputs that did not use the setting discussed in the [Other Useful Settings](#other-useful-settings) section. The 'fixed' zip file has the outputs resulting from using the `select_fixed_atoms` option.
|
||||
|
||||
There is also an already made YAML file available in `foundry/models/rfd3/docs/ppi_tutorial_files`. We recommend following the tutorial to create this file yourself to better understand the RFD3 options that are relevant to PPI design.
|
||||
|
||||
(creating-the-yaml-file)=
|
||||
## Creating the YAML file
|
||||
In this tutorial, we will be briefly describing each of the settings we will be using for this example binder design project.
|
||||
|
||||
1. Using your editor of choice, open a new file called `ppi_tutorial.yaml`. This is where we will be storing all of the settings that tell RFD3 the type of designs we would like to make.
|
||||
1. Our calculation needs a name. For this tutorial, we will only be including one example calculation, but your YAML file could have several. A name allows you (and RFD3) to differentiate them. Since we are designing binders for the human insulin receptor, let's just call it `insulinr`:
|
||||
```yaml
|
||||
insulinr:
|
||||
```
|
||||
Everything that comes after this should be indented to show that it's part of this `insulinr` calculation. Use spaces, not tab characters: if a tab character (`\t`) is found in the file, RFD3 will crash.
|
||||
1. Tell RFD3 where to find your input file:
|
||||
```yaml
|
||||
input: /path/to/rfd3_ppi_tutorial/4zxb_cropped.pdb
|
||||
```
|
||||
This file was directly cropped from the 4zxb structure that can be found in the [RCSB PDB](https://www.rcsb.org/). *If you visualize the cropped structure against the full one from the RCSB PDB, they may not appear to be exactly the same structure. However, if you align the two you will get an RMSD of 0.0.*
|
||||
<!-- ```{figure} .assets/ppi_tutorial/cropped_vs_full.png
|
||||
:width: 60%
|
||||
|
||||
Cropped vs. full structure of 4ZXB with the RMSD of their alignment shown. The cropped structure is in green while the full structure is shown in purple.
|
||||
``` -->
|
||||
1. The `contig` string is the main way you can tell RFD3 which portions of your structure you want designed and which portions you want preserved from your input structure.
|
||||
```yaml
|
||||
contig: 40-120,/0,E6-155
|
||||
```
|
||||
The different sections of the `contig` string are separated by commas. Here's what each section is telling RFD3:
|
||||
- `40-120` specifies that we want RFD3 to design a new peptide chain that is between 40 and 120 residues long
|
||||
- `/0` is how a chain break is specified in RFD3.
|
||||
- `E6-155` is the portion of the input structure that we are keeping in our final output. The letter corresponds to the chain label in the input PDB and the starting and ending residue are included in the final structure. If you do not include the chain label, then RFD3 would just design a peptide chain between 6 and 155 residues in length.
|
||||
1. We can also specify the overall number of residues in our final structure:
|
||||
```yaml
|
||||
length: 190-270
|
||||
```
|
||||
This is not absolutely necessary for this calculation as we only have one designed portion of our structure. Our `contig` string already enforces that the length of the final structure is between 190 and 270 residues long as the portion of the input structure we are using is 150 residues long. However, this becomes important when you have several designed sections of your protein that can have random lengths.
|
||||
1. Specifying 'hotspots' in your structure is a way to tell RFD3 which portions of your input structure should be close to the designed binder. More specifically, RFD3 was trained to produce structures where the hotspots will typically be at most 4.5 Å from any heavy atom on the binder. Typically, the hotspot residues are a subset of the residues that are important to the function of the protein, e.g. the catalytic residues. Choosing these residues will require some scientific intuition, a thorough literature search, and some experimenting.
|
||||
|
||||
Hotspots are specified by naming both the residue and the specific atoms within the residue that you want closest to the designed structure:
|
||||
```yaml
select_hotspots:
  E64: CD2,CZ
  E88: CG,CZ
  E96: CD1,CZ
```
|
||||
```{figure} .assets/ppi_tutorial/hotspots.png
|
||||
:width: 60%
|
||||
|
||||
The hotspot residues along with the specific target atoms circled in yellow.
|
||||
```
|
||||
1. Next we need to add information about our ORI token. This token specifies where we want the center of mass of our designed protein to be. Unless you know where you want to place the ORI token for your specific design needs, it is often easiest to have RFD3 infer the ORI placement based on the chosen `hotspots`:
|
||||
```yaml
|
||||
infer_ori_strategy: hotspots
|
||||
```
|
||||
This setting will place the ORI token 10Å outward from the center of mass of the `hotspots`. The center of mass of the diffused region will typically be within 5Å of the ORI token.
|
||||
1. There is a setting in RFD3, `is_non_loopy`, that, if set to `true`, will cause fewer loops to be in your structure. It's recommended to use this setting for PPI design tasks in RFD3, so let's add it to our YAML file:
|
||||
```yaml
|
||||
is_non_loopy: true
|
||||
```
|
||||
1. Save your file and close it.
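
As a quick check of the arithmetic in the `length` step, the sketch below computes the minimum and maximum total length implied by the tutorial's `contig` string. It is a hypothetical helper written for this tutorial, not RFD3's parser, and it assumes single-letter chain labels.

```python
import re


def contig_length_bounds(contig: str) -> tuple[int, int]:
    """Return the (minimum, maximum) total residue count implied by a contig."""
    low = high = 0
    for token in (t.strip() for t in contig.split(",")):
        if token in ("/0", "\\0"):  # chain break adds no residues
            continue
        match = re.fullmatch(r"([A-Za-z]?)(\d+)(?:-(\d+))?", token)
        if match is None:
            raise ValueError(f"unrecognised contig token: {token!r}")
        chain, first = match.group(1), int(match.group(2))
        last = int(match.group(3)) if match.group(3) is not None else first
        if chain:
            # Residues kept from the input: a fixed count, both ends included.
            low += last - first + 1
            high += last - first + 1
        else:
            # Designed region: the length is sampled between first and last.
            low += first
            high += last
    return low, high


print(contig_length_bounds("40-120,/0,E6-155"))  # (190, 270)
```

`E6-155` contributes 150 residues, so a designed region of 40 to 120 residues gives a total of 190 to 270, which matches `length: 190-270`.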
|
||||
|
||||
(other-useful-settings)=
|
||||
### Other useful settings
|
||||
1. There is a setting for allowing structural flexibility while keeping the sequence fixed in the input structure, for example:
|
||||
```yaml
select_fixed_atoms:
  E25: []
  E26: BKBN
  E27: CA,CB,OG
```
|
||||
Here, an empty list indicates that all atoms are flexible, `BKBN` keeps the backbone atoms fixed while allowing side chain atoms to move, and for the last residue, specific atoms are fixed in place while allowing the others to move. Feel free to try adding this to your YAML file and see how your outputs change.
|
||||
|
||||
(running-rfd3)=
|
||||
## Running RFD3
|
||||
To actually run RFD3 you need to know:
|
||||
- the directory you want the outputs to be stored in
|
||||
- the path to the YAML (or JSON) file that stores the specific settings for the calculation
|
||||
- the location of your checkpoint files
|
||||
|
||||
Once you have these three things you can run something like this from the command line:
|
||||
```bash
|
||||
rfd3 design out_dir=ppi_tutorial_outputs/0 inputs=ppi_tutorial.yaml ckpt_path=/path/to/your/checkpoint/files/rfd3_latest.ckpt
|
||||
```
|
||||
Your output files will be placed in a new directory `ppi_tutorial_outputs/0`. If you run the tutorial again, change the `0` to another number to not overwrite your outputs. Your output files will be named `ppi_tutorial_insulinr_0_model_n.cif.gz` where `n` is the number of the design. `ppi_tutorial` comes from the name of the YAML file and `insulinr` comes from the name you gave your calculation in the YAML file.
|
||||
|
||||
```{note}
|
||||
You may see several warning messages when you run RFD3. These should not interfere with the calculation.
|
||||
```
|
||||
|
||||
(analyzing-the-outputs)=
|
||||
## Analyzing the Outputs
|
||||
You should end up with 8 designs, numbered 0-7, each with its own `.cif.gz` and `.json` file. If you want to adjust the number, add the configuration option `diffusion_batch_size` to your `rfd3 design` command.
|
||||
|
||||
The JSON file has many details about your diffusion run, including the options in the YAML file you created. The compressed CIF file can be easily visualized with tools like PyMOL.
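
If you prefer to look at the outputs from a script before opening PyMOL, the following sketch pairs each `.cif.gz` file with its `.json` file using only the Python standard library. The helper names are hypothetical, and because the contents of the JSON file are not described here, the sketch simply loads whatever is in it.

```python
import gzip
import json
from pathlib import Path


def collect_designs(out_dir) -> dict:
    """Map each design name to its CIF path and loaded JSON metadata."""
    designs = {}
    for cif_path in sorted(Path(out_dir).glob("*.cif.gz")):
        name = cif_path.name[: -len(".cif.gz")]
        json_path = cif_path.with_name(name + ".json")
        metadata = json.loads(json_path.read_text()) if json_path.exists() else None
        designs[name] = {"cif": cif_path, "metadata": metadata}
    return designs


def read_cif_text(cif_path) -> str:
    """Decompress a .cif.gz file and return its text."""
    with gzip.open(cif_path, "rt") as handle:
        return handle.read()


out_dir = Path("ppi_tutorial_outputs/0")
if out_dir.is_dir():
    for name, design in collect_designs(out_dir).items():
        n_lines = len(read_cif_text(design["cif"]).splitlines())
        print(name, n_lines, "lines in CIF")
```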
|
||||
|
||||
Your results should look something like this:
|
||||
```{figure} .assets/ppi_tutorial/example_output_w_hotspots.png
|
||||
:width: 60%
|
||||
|
||||
Green is the original input structure while blue is the designed binder. The hotspot residues are purple and represented as ball and sticks.
|
||||
```
|
||||
You'll notice that the binders are always on the side of the input structure closest to the hotspots.
|
||||
|
||||
The designed binders are also all between 40 and 120 amino acids long. However, you'll also notice that they are all the same length!
This is because RFD3 runs batched inference calculations. All of the calculations in a single 'batch' will have the same randomly sampled length, while designs from other batches will have different lengths. If you want to change the number of batches, add the setting `n_batches` to your `rfd3 design` command.
|
||||
|
||||
(references-and-further-reading)=
|
||||
## References and Further Reading
|
||||
- For more information on the different inference settings in RFD3, see [input.md](input.md)
|
||||
- For more information on the example used here, see [*De novo design of protein structure and function with RFdiffusion*](https://www.nature.com/articles/s41586-023-06415-8#Sec12) by Joseph L. Watson et al.
|
||||
- A more thorough discussion of the settings and configuration options in RFD3 can be found [here](intro_inference_calculations.md)
|
||||
|
||||
|
||||
|
||||
BIN
models/rfd3/docs/ppi_tutorial_files/basic.zip
Normal file
BIN
models/rfd3/docs/ppi_tutorial_files/fixed.zip
Normal file
14
models/rfd3/docs/ppi_tutorial_files/ppi_tutorial.yaml
Normal file
@@ -0,0 +1,14 @@
|
||||
insulinr:
  input: 4zxb_cropped.pdb
  contig: 40-120,/0,E6-155
  length: 190-270
  select_hotspots:
    E64: CD2,CZ
    E88: CG,CZ
    E96: CD1,CZ
  infer_ori_strategy: hotspots
  is_non_loopy: true
  select_fixed_atoms:
    E25: []
    E26: BKBN
    E27: CA,CB,OG
|
||||
1
models/rfd3/docs/readme.md
Symbolic link
@@ -0,0 +1 @@
|
||||
../README.md
|
||||
5
models/rfd3/docs/readmelink.md
Normal file
@@ -0,0 +1,5 @@
|
||||
README
|
||||
======
|
||||
|
||||
.. include:: ../README.md
|
||||
:parser: myst_parser.sphinx_
|
||||