RFdiffusion3 — Input specification (dialect 2)
TL;DR
Inputs are now defined with a single InputSpecification class.
Selections like “what’s fixed?”, “what’s sequence-free?”, “which atoms are donors/acceptors?” are all expressed with the same InputSelection mini-language.
Everything is reproducibly logged back out alongside your generation.
- What changed (high level)
- Quick start
- The InputSelection mini-language
- Full schema: InputSpecification
- Common recipes (cookbook)
- Partial diffusion
- Symmetry
- Origin (ori_token) and initialization
- Validation & error messages
- Metadata & logging
- Legacy configs (dialect=1) & migration guide
- Multi-example files
- FAQ / gotchas
What changed (high level)
- Unified selections. All per-residue/atom choices now use InputSelection:
  - You can pass true/false, a contig string ("A1-10,B5-8"), or a dictionary ({"A1-10": "ALL", "B5": "N,CA,C,O"}).
  - New selection fields include: select_fixed_atoms, select_unfixed_sequence, select_buried, select_partially_buried, select_exposed, select_hbond_donor, select_hbond_acceptor, select_hotspots.
- Clearer unindexing. For unindexed motifs you typically either fix "ALL" atoms or explicitly choose subsets such as "TIP"/"BKBN"/explicit atom lists via a dictionary (see examples). When using unindex, only the atoms you mark as fixed are carried over from the input.
- Reproducibility. The exact specification and the sampled contig are logged back into the output JSON. We also log useful counts (atoms, residues, chains).
- Safer parsing. You’ll now get early, informative errors if:
- You pass unknown keys,
- A selection doesn’t match any atoms,
- Indexed and unindexed motifs overlap,
- Mutually exclusive selections overlap (e.g., two RASA bins for the same atom).
- Backwards compatible. Add "dialect": 1 to keep your old configs running while you migrate. (Deprecated.)
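Two of the early checks listed above (unknown keys and overlapping RASA bins) can be sketched as follows. This is a minimal illustration, not the actual RFdiffusion3 parser: the function names `check_unknown_keys` and `check_rasa_exclusive` are hypothetical, the key set is copied from the schema table in this document, and selections are assumed to be already resolved to sets of (chain, residue, atom) tuples.

```python
from itertools import combinations

# Field names copied from the InputSpecification schema table in this document.
KNOWN_KEYS = {
    "input", "atom_array_input", "contig", "unindex", "length", "ligand",
    "cif_parser_args", "extra", "dialect", "select_fixed_atoms",
    "select_unfixed_sequence", "select_buried", "select_partially_buried",
    "select_exposed", "select_hbond_donor", "select_hbond_acceptor",
    "select_hotspots", "redesign_motif_sidechains", "symmetry", "ori_token",
    "infer_ori_strategy", "plddt_enhanced", "is_non_loopy", "partial_t",
}
RASA_FIELDS = ("select_buried", "select_partially_buried", "select_exposed")


def check_unknown_keys(spec):
    """Hypothetical helper: reject keys that are not part of the schema."""
    unknown = sorted(set(spec) - KNOWN_KEYS)
    if unknown:
        raise ValueError(f"Unknown keys in specification: {unknown}")


def check_rasa_exclusive(resolved):
    """Hypothetical helper: no atom may sit in two RASA bins.

    `resolved` maps a field name to a set of (chain, res_id, atom_name)
    tuples; resolving selections into such sets is assumed to happen elsewhere.
    """
    for a, b in combinations(RASA_FIELDS, 2):
        overlap = resolved.get(a, set()) & resolved.get(b, set())
        if overlap:
            raise ValueError(f"{a} and {b} overlap on atoms: {sorted(overlap)}")
```

Failing early like this keeps a typo such as "contigs" from being silently ignored during a long generation run.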
InputSpecification
| Field | Type | Description |
|---|---|---|
| input | str? | Path to input PDB/CIF. Required if you provide contig+length. |
| atom_array_input | internal | Pre-loaded AtomArray (not recommended). |
| contig | InputSelection? | Indexed motif specification, e.g., "A1-80,10,\0,B5-12". |
| unindex | InputSelection? | Unindexed motif components (unknown sequence placement). |
| length | str? | Total design length constraint; "min-max" or int. |
| ligand | str? | Ligand(s) by resname or index. |
| cif_parser_args | dict? | Optional args to CIF loader. |
| extra | dict | Extra metadata (e.g., logs). |
| dialect | int | 2=new (default), 1=legacy. |
| select_fixed_atoms | InputSelection? | Atoms with fixed coordinates. |
| select_unfixed_sequence | InputSelection? | Where sequence can change. |
| select_buried / select_partially_buried / select_exposed | InputSelection? | RASA bins 0/1/2 (mutually exclusive). |
| select_hbond_donor / select_hbond_acceptor | InputSelection? | Atom-wise donor/acceptor flags. |
| select_hotspots | InputSelection? | Atom-level or token-level hotspots. |
| redesign_motif_sidechains | bool | Fixed backbone, redesigned sidechains for motifs. |
| symmetry | SymmetryConfig? | See docs/symmetry.md. |
| ori_token | list[float]? | [x,y,z] origin override to control COM placement. |
| infer_ori_strategy | str? | "com" or "hotspots". |
| plddt_enhanced | bool | Default true. |
| is_non_loopy | bool | Default true. |
| partial_t | float? | Noise (Å) for partial diffusion; setting it enables partial diffusion. |
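The length field accepts either an integer or a "min-max" string. A minimal sketch of how such a value could be normalised into a (min, max) pair is shown below; `parse_length` is a hypothetical helper written for illustration and is not taken from the RFdiffusion3 code base.

```python
def parse_length(length):
    """Return (min_len, max_len) from an int, "N", or "min-max" value.

    Hypothetical helper; the real parser may behave differently.
    """
    if isinstance(length, int):
        return length, length
    parts = str(length).split("-")
    if len(parts) == 1:
        lo = hi = int(parts[0])
    elif len(parts) == 2:
        lo, hi = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"Cannot parse length: {length!r}")
    if lo < 1 or lo > hi:
        raise ValueError(f"Invalid length range: {length!r}")
    return lo, hi


print(parse_length("150-180"))
print(parse_length(150))
```

Normalising both forms to a pair means downstream code only has to handle one representation.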
Quick start
Minimal JSON example
{
  "": {
    "input": "path/to/template.pdb",
    "contig": "A1-80",
    "length": "150-180",
    "select_fixed_atoms": true,
    "select_unfixed_sequence": "A20-35",
    "ligand": "HAX,OAA",
    "dialect": 2
  }
}
Minimal YAML example
input: path/to/template.pdb
contig: A1-80
length: 150-180
select_fixed_atoms: true
select_unfixed_sequence: A20-35
ligand: HAX,OAA
dialect: 2
Python API
from projects.aa_design.inference.input_parsing import create_atom_array_from_design_specification

atom_array, metadata = create_atom_array_from_design_specification(
    input="path/to/template.pdb",
    contig="A1-80",
    length="150-180",
    select_fixed_atoms=True,
    select_unfixed_sequence="A20-35",
    dialect=2,
)
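The JSON example above nests the specification under a top-level key. Assuming each top-level key names one example (so that a single file can hold several specifications), a file could be walked as in the sketch below. The example names `scaffold_a` and `scaffold_b` and the helper `iter_examples` are hypothetical, and only the standard-library `json` module is used.

```python
import json

# Hypothetical file contents: two named examples in one JSON document.
EXAMPLE_FILE = """
{
  "scaffold_a": {"input": "path/to/template.pdb", "contig": "A1-80", "length": "150-180"},
  "scaffold_b": {"input": "path/to/template.pdb", "contig": "A1-80", "length": "120"}
}
"""


def iter_examples(text):
    """Yield (name, spec) pairs from a JSON string of named specifications."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object mapping example names to specs")
    for name, spec in data.items():
        if not isinstance(spec, dict):
            raise ValueError(f"Specification for {name!r} must be a JSON object")
        yield name, spec


for name, spec in iter_examples(EXAMPLE_FILE):
    print(name, spec["contig"], spec["length"])
```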
Demo examples
Enzyme
TODO
Symmetry
TODO.
See symmetry.md for more details
Partial diffusion
TODO
Binder design
TODO
The InputSelection mini-language
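Fields typed as InputSelection accept one of three forms: a boolean, a list of components (for example the contig string "A1-10,B5-8"), or a dictionary.
Dictionaries are the most expressive form and can also take the special shorthand keywords BKBN, TIP, and ALL: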
select_fixed_atoms:
  A1-2: BKBN
  A3: N,CA,C,O,CB  # specific atoms by atom name
  B5-7: ALL        # selects all atoms within B5, B6 and B7
  B10: TIP         # selects the common tip atom for the residue (constants.py)
  LIG: ''          # selects no atoms (i.e. unfixes the atoms for ligands named `LIG`)
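Dictionary keys such as A1-2 or B5-7 cover a run of residues on one chain, while a key such as LIG names a ligand. The sketch below shows one way such keys could be expanded into (chain, residue index) pairs. `expand_component` is a hypothetical helper that assumes single-letter chain IDs and positive residue numbers; the real parser may accept more forms.

```python
import re

# Assumption: one chain letter, a start index, and an optional "-end" index.
_COMPONENT = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def expand_component(key):
    """Expand "B5-7" into [("B", 5), ("B", 6), ("B", 7)].

    Returns None when the key is not a chain/residue component,
    for example a ligand name such as "LIG".
    """
    match = _COMPONENT.match(key)
    if match is None:
        return None
    chain = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3)) if match.group(3) is not None else start
    if end < start:
        raise ValueError(f"Descending range in component: {key!r}")
    return [(chain, index) for index in range(start, end + 1)]


for key in ["A1-2", "A3", "B5-7", "LIG"]:
    print(key, expand_component(key))
```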
[Diagram]
Unindexing specifics
unindex marks motif tokens whose relative sequence placement is unknown to the model (useful for scaffolding around active sites, etc.).
Use a string to list the unindexed components and where breaks occur.
Use a dictionary if you want to fix specific atoms of those residues; atoms not fixed are not copied from the input (they will be diffused).
Breaks between unindexed components follow the contig conventions you’re used to. For example:
"A244,A274,A320,A329,A375"
lists multiple unindexed components; internal “breakpoints” are inferred and logged. (Offset syntax like A11-12 or A11,0,A12 still ties residues.)
Appendix
FAQ / gotchas
Q: Do I need select_fixed_atoms & select_unfixed_sequence every time? A: No. Defaults apply when an input is present.
Q: What does "ALL" vs "TIP" in unindex mean? A: "ALL" → copy full residue; "TIP" → fix only sidechain tip atoms.
Q: Can selections overlap? A: Only certain ones (fixed vs. unfixed) may; RASA and donor/acceptor selections cannot.
Q: How do I fix the backbone but redesign sidechains? A: Set redesign_motif_sidechains: true.
Q: Why do I see "Input provided but unused"? A: You gave input but no contig, unindex, or partial_t.
Shorthand atoms
| Keyword | Expands to |
|---|---|
| BKBN | N, CA, C, O |
| TIP | Residue-specific "tip" atoms |
| ALL | All atoms of each residue |
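As an illustration of the shorthand expansion, the sketch below resolves a dictionary value into a list of atom names. Only the BKBN expansion (N, CA, C, O) comes from this document. The helper `expand_atoms` is hypothetical, and the serine entries in `EXAMPLE_ALL_ATOMS` and `EXAMPLE_TIP_ATOMS` are placeholders: the project's real tip-atom definitions live in constants.py and may differ.

```python
BACKBONE = ["N", "CA", "C", "O"]  # BKBN expansion as documented above

# Placeholder lookup tables for illustration only (not the project's constants).
EXAMPLE_ALL_ATOMS = {"SER": ["N", "CA", "C", "O", "CB", "OG"]}
EXAMPLE_TIP_ATOMS = {"SER": ["CB", "OG"]}


def expand_atoms(value, res_name, all_atoms, tip_atoms):
    """Resolve a selection value into atom names for one residue type."""
    value = value.strip()
    if value == "":
        return []  # empty string selects no atoms
    if value == "ALL":
        return list(all_atoms[res_name])
    if value == "BKBN":
        return list(BACKBONE)
    if value == "TIP":
        return list(tip_atoms[res_name])
    # Otherwise treat the value as an explicit comma-separated atom list.
    return [name.strip() for name in value.split(",")]


for value in ["BKBN", "ALL", "TIP", "N,CA,C,O,CB", ""]:
    print(repr(value), expand_atoms(value, "SER", EXAMPLE_ALL_ATOMS, EXAMPLE_TIP_ATOMS))
```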