104 Commits

Author SHA1 Message Date
Rachel Clune
2d0c003df4 Adding files for the soon-to-be-released RFdiffusion video tutorial (#452)
The materials were created by Diego Lopez Mateos, Matthew Hvasta, and
Kush Narang for the Tutorial Hackathon track of the 2026 Megathon event.
2026-04-24 11:47:30 -07:00
Rachel Clune
d1e7386992 Changed video_tutorials directory to just tutorials 2026-04-24 11:46:58 -07:00
Leonardo Marino-Ramirez
529b756796 feat: add inference.empty_cache_per_design flag to reduce CUDA allocator fragmentation (#451)
## Problem

When running RFdiffusion with variable-length contigs (e.g.
`contigmap.contigs=[A1-469/0 1-50]`) over hundreds or thousands of
designs, per-worker VRAM grows steadily from ~7 GB to 10–13 GB per
process. This limits how many workers can run in parallel on a single
GPU before exhausting VRAM.

Root cause: PyTorch's CUDA caching allocator accumulates fragmented
memory blocks across designs. With variable-length contigs each design
allocates differently-sized tensors; freed blocks are cached but cannot
be reused for different-sized allocations, causing steady VRAM growth.

## Fix

Add an optional `inference.empty_cache_per_design` flag (default
`False`, opt-in) that calls `torch.cuda.empty_cache()` at the end of
each design iteration. This releases all unused cached CUDA memory
blocks back to the CUDA memory manager, keeping each worker near its
initial VRAM footprint for the full run.

### Changes

**`config/inference/base.yaml`**
```yaml
  write_trajectory: True
  empty_cache_per_design: False   # NEW
```

**`scripts/run_inference.py`** — after the trajectory/PDB write block,
before `log.info`:
```python
        if conf.inference.empty_cache_per_design and torch.cuda.is_available():
            torch.cuda.empty_cache()

        log.info(f"Finished design in {(time.time()-start_time)/60:.2f} minutes")
```

## Measured impact

Tested on NVIDIA RTX 5090 32 GB running a long PPI campaign with
variable-length contigs:

| Setting | Per-worker VRAM (steady-state) |
|---------|-------------------------------|
| Without fix | 8–13 GB (grows over run) |
| With `empty_cache_per_design=True` | ~5.2 GB (stable) |

This allowed raising the number of parallel workers from 3 to 5 on a 32
GB GPU.

## Why opt-in

`torch.cuda.empty_cache()` adds a small per-design overhead (~1–2 ms)
and is only beneficial for long runs with variable-length contigs. For
short runs or fixed-length designs there is no fragmentation issue, so
the default remains `False` to preserve existing behavior.
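
To judge whether a given run is affected, one option is to watch the gap between memory reserved by the caching allocator and memory held by live tensors. The sketch below is not part of this PR; the helper name `cuda_cache_overhead_bytes` is hypothetical, and it assumes only the standard `torch.cuda.memory_reserved()` and `torch.cuda.memory_allocated()` calls.

```python
def cuda_cache_overhead_bytes():
    """Return reserved-minus-allocated CUDA bytes, or None if CUDA is unusable.

    Hypothetical helper for illustration; not part of RFdiffusion.
    """
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # no CUDA device to query
    # Reserved = held by the caching allocator; allocated = held by live tensors.
    return torch.cuda.memory_reserved() - torch.cuda.memory_allocated()


overhead = cuda_cache_overhead_bytes()
if overhead is None:
    print("CUDA not available; nothing to measure")
else:
    print(f"cached but unused: {overhead / 1e9:.2f} GB")
```

If this value keeps climbing from one design to the next, the run is probably a candidate for `inference.empty_cache_per_design=True`.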

## Testing

All 20 applicable tests in `tests/test_diffusion.py` pass with this
change. The one skipped test (`design_ppi_scaffolded`) fails due to a
missing `ppi_scaffolds/` directory in the test fixture — a pre-existing
issue unrelated to this PR.

## Notes

- Placement is after both the PDB write (`writepdb`) and the optional
trajectory block — every consumer of `denoised_xyz_stack` /
`px0_xyz_stack` has already finished before the cache is cleared.
- This does not affect memory held by live tensors — only frees
cached-but-unused blocks.
- Compatible with all existing RFdiffusion design modes (PPI, motif
scaffolding, unconditional).
2026-04-24 10:41:07 -06:00
Rachel Clune
92f3c4ca27 Updated README
Added details about:
- the origins of the example used in the tutorial
- how the input file was generated
- how to install STRIDE
2026-04-23 14:39:47 -07:00
Rachel Clune
2f2a301575 Update README.md 2026-04-22 12:14:14 -07:00
Rachel Clune
122a2157c1 Adding files for the soon-to-be-released RFdiffusion video tutorial
The materials were created by Diego Lopez Mateos, Matthew Hvasta, and Kush Narang for the Tutorial Hackathon track of the 2026 Megathon event.
2026-04-22 11:30:54 -07:00
Hope Woods
9535f19382 Updated "Fixed issues with designing in scaffoldguided mode" original PR 386 (#426)
The original PR for this was #386 from
[OrangeCatzhang](https://github.com/OrangeCatzhang). This PR is to fix
the error "AttributeError: 'bool' object has no attribute
'scaffold_list'" when running in scaffoldguided mode.

The first error is fixed by passing the full composed config object
(conf) into BlockAdjacency instead of the scaffoldguided sub-node.
BlockAdjacency expects the full config and accesses
conf.scaffoldguided.<fields> internally, so passing the sub-node caused
self.conf.scaffoldguided to resolve to the nested boolean field
(scaffoldguided.scaffoldguided), which produced the AttributeError when
the code tried to read .scaffold_list. Passing the full conf fixes that
mismatch.
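
The mismatch can be reproduced in isolation. This is a minimal sketch, not the RFdiffusion code: `BlockAdjacencySketch` is a toy stand-in, the config is modelled with `SimpleNamespace` rather than the real config object, and the `scaffold_list` value is made up.

```python
from types import SimpleNamespace

# Toy config: only the two fields named in the description are modelled.
conf = SimpleNamespace(
    scaffoldguided=SimpleNamespace(
        scaffoldguided=True,            # the nested boolean flag
        scaffold_list="scaffolds.txt",  # made-up value for illustration
    )
)


class BlockAdjacencySketch:
    """Toy consumer that expects the full config, as BlockAdjacency does."""

    def __init__(self, conf):
        self.conf = conf

    def scaffold_list(self):
        # Reads through the scaffoldguided node of whatever it was given.
        return self.conf.scaffoldguided.scaffold_list


def try_read(config_arg):
    """Return the scaffold list, or the error text if the lookup fails."""
    try:
        return BlockAdjacencySketch(config_arg).scaffold_list()
    except AttributeError as err:
        return f"AttributeError: {err}"


broken = try_read(conf.scaffoldguided)  # sub-node passed: resolves to the bool
fixed = try_read(conf)                  # full config passed: lookup succeeds
print(broken)
print(fixed)
```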

The other fix is to add initialization of cyclic_reses to
ScaffoldedSampler. I have slightly updated what was in the original PR
to avoid code duplication. I added a helper function to the Sampler
class and then call it in both Sampler and ScaffoldedSampler to
initialize cyclic_reses. I also removed the changes to the
scaffoldguided flag from the original PR, so the CLI stays the same.
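
The shared-helper pattern looks roughly like the sketch below. It is illustrative only: the class names carry a `Sketch` suffix to mark them as stand-ins, the helper name and the empty-list default are placeholders, and the subclass setting itself up without the base constructor is an assumption made for the example.

```python
class SamplerSketch:
    """Stand-in for the base sampler."""

    def __init__(self, conf):
        self.conf = conf
        self._init_cyclic_reses()

    def _init_cyclic_reses(self):
        # Placeholder default; the real initialization logic is not shown here.
        self.cyclic_reses = []


class ScaffoldedSamplerSketch(SamplerSketch):
    """Stand-in for a subclass with its own setup path (assumed)."""

    def __init__(self, conf):
        # Assumption: setup does not go through SamplerSketch.__init__, so the
        # helper is called explicitly to make sure the attribute always exists.
        self.conf = conf
        self._init_cyclic_reses()


sampler = ScaffoldedSamplerSketch(conf={})
print(sampler.cyclic_reses)
```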
2025-11-20 10:02:33 -06:00
woodsh17
ecf161b4e2 Move cyclic_reses initialization to a helper function and call it for Sampler and ScaffoldedSampler 2025-11-18 15:43:09 -06:00
woodsh17
723a66408c Reverting changes to flag name, so you still use scaffoldguided.scaffoldguided=True instead of scaffoldguided_enabled 2025-11-18 12:26:26 -06:00
woodsh17
1ba79929d3 Merge remote-tracking branch 'origin/main' into fix_scaffoldguided_design 2025-11-18 10:46:28 -06:00
Hope Woods
ff20fbafef Fix workflow tests that are failing (#410)
This PR updates the tests so that all the examples run and a failing
example causes the test to fail as well. Changes include:

- Reformatting design_macrocyclic_binder.sh and
design_macrocyclic_monomer.sh to be submitted correctly by
test_diffusion.py
- Reducing the total length in design_tetrahedral_oligos.sh to reduce
run time of this test
- Changes to test_diffusion.py and main.yml so the examples run in
separate chunks, which lets them run in parallel, and so that a test
does not pass if an example errors out.

Currently design_ppi_scaffolded, design_timbarrel, and
design_ppi_flexible_peptide_with_secondarystructure_specification are
failing; these should be addressed in future PRs.
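
The two ideas above (chunking the examples and failing on a non-zero exit) can be sketched as follows. This is not the code in test_diffusion.py; `chunked` and `run_commands` are hypothetical helpers, and the "examples" are stand-in Python one-liners rather than the real shell scripts.

```python
import subprocess
import sys


def chunked(items, n_chunks):
    """Split items into n_chunks round-robin slices."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def run_commands(commands):
    """Run each command and return the ones that exited non-zero."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(cmd)
    return failed


# Stand-in examples: the first exits 0, the second exits 3.
commands = [
    [sys.executable, "-c", "raise SystemExit(0)"],
    [sys.executable, "-c", "raise SystemExit(3)"],
]
failed = run_commands(commands)
print(f"{len(failed)} of {len(commands)} examples failed")
```

A test would then assert that `failed` is empty, so a single broken example fails the whole chunk.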
2025-11-13 15:33:05 -06:00
woodsh17
dd7643d640 Revert "Remove try block with except FileExistsError that isn't needed"
After running the test, I found that this was needed
This reverts commit ead721f326.
2025-11-12 13:45:07 -06:00
woodsh17
ead721f326 Remove try block with except FileExistsError that isn't needed 2025-10-24 09:48:03 -05:00
woodsh17
4aea4fd65a Change design_nickel.sh num_designs back to 15 2025-10-23 12:26:00 -05:00
woodsh17
681874e6ea Reduce length for design_tetrahedral_oligos to 600 to reduce run time 2025-10-22 15:16:41 -05:00
woodsh17
be5b7b6b89 Preseed DGL to avoid failures in examples that are run at the same time 2025-10-22 15:13:35 -05:00
woodsh17
4d870c07da In setUpClass check that directory doesn't already exist, so tests do not fail before running examples 2025-10-22 10:03:07 -05:00
woodsh17
dc1867dadf Edit main.yml so workflow runs 2025-10-21 16:29:06 -05:00
woodsh17
56feb3bb4a Remove set pipefail 2025-10-21 10:09:12 -05:00
woodsh17
4dd3a5f8c0 Remove -o option from pipefail 2025-10-21 10:02:56 -05:00
woodsh17
a570b2b8bd Edit main.yml workflow so if an example fails, the test doesn't pass 2025-10-21 09:57:15 -05:00
woodsh17
260692d6c2 Edit test_diffusion to change setUp to setUpClass, making it a class method so examples run only once 2025-10-20 08:48:02 -05:00
woodsh17
17f315b2de Add index of chunk to test directory name 2025-10-17 12:45:28 -05:00
woodsh17
a629d43e92 Add step to cd into tests directory in main.yml 2025-10-17 11:04:32 -05:00
woodsh17
6d0036b480 Fix typo in main.yml in test_diffusion options 2025-10-17 10:57:53 -05:00
woodsh17
e2378c7358 Change num_designs in examples back to 10, since this is reset in test_diffusion.py 2025-10-17 10:51:44 -05:00
woodsh17
5dff3eb188 These changes split the examples up into chunks and run the different chunks to speed up the workflow. 2025-10-17 10:39:26 -05:00
woodsh17
a9b748be37 Add print statement to list failed examples 2025-10-14 11:44:50 -05:00
woodsh17
778f8d4b28 Fix input_pdb path in design_macrocyclic_binder and fix incorrect call to res in test_diffusion.py 2025-10-10 15:21:04 -05:00
woodsh17
f43a81d59d Changing number of designs for examples from 10 to 2 to speed up tests 2025-10-10 13:53:59 -05:00
woodsh17
a45c3f9da7 Fail tests when example scripts exit non-zero 2025-10-10 13:41:05 -05:00
woodsh17
2d157302d3 Adjusting command format for design_macrocyclic_binder.sh and design_macrocyclic_monomer.sh 2025-10-10 09:36:45 -05:00
woodsh17
a18f99614a Address 'command' referenced before assignment error in tests 2025-10-10 09:35:57 -05:00
Hope Woods
e220924202 Retain chain and residue numbering in RFdiffusion (#348)
A number of issues (e.g., #103, #171,
https://github.com/RosettaCommons/RFdiffusion/issues/312, #315) have
mentioned that RFdiffusion will change the chain IDs and residue
numbering of the input structure. The designed chain ends up as chain
"A", and the fixed chain(s) end up as chain "B". The numbering is also
reset to start at 1. This can be particularly problematic in cases where
comparisons to structures are needed, as well as multi-chain situations
where all of the chains get fused.

Inspired by @GCS-ZHN's comment and solution referenced in Issue #103,
I've modified the code to maintain chain and residue numbering. In
particular:

- Chains that are not "designable" will retain their original chain ID
letters and residue numbers.
- Chains that are partially fixed (e.g., motif re-scaffolding) will retain
their original chain ID letters. Residues will be re-numbered from 1 to
length of chain. (It was not clear to me what the "correct" behaviour of
chain residue numbering should be, given that the length of the chain
and the position of any fixed residues might change.)
- Chains that are being fully generated de novo will be assigned the first
available chain ID in the alphabet not used by any other chain. Residues
will be numbered from 1 to length of chain.
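
The three cases can be summarised in a small sketch. It is illustrative only and not the code in this PR: the function name, the category labels ("fixed", "partial", "de_novo") and the `free_id` argument are all hypothetical.

```python
def output_chain(category, length, orig_chain=None, orig_resnums=None, free_id=None):
    """Return (chain_id, residue_numbers) for one output chain.

    category is a hypothetical label: "fixed", "partial", or "de_novo".
    free_id is the unused chain ID chosen for a de novo chain.
    """
    renumbered = list(range(1, length + 1))
    if category == "fixed":
        # Not designable: keep both the chain ID and the input numbering.
        return orig_chain, list(orig_resnums)
    if category == "partial":
        # Partially fixed: keep the chain ID, renumber from 1.
        return orig_chain, renumbered
    if category == "de_novo":
        # Fully generated: take an unused chain ID, number from 1.
        return free_id, renumbered
    raise ValueError(f"unknown category: {category!r}")


print(output_chain("fixed", 3, orig_chain="B", orig_resnums=[10, 11, 12]))
print(output_chain("partial", 4, orig_chain="C"))
print(output_chain("de_novo", 2, free_id="D"))
```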
2025-09-24 09:28:03 -05:00
Hope Woods
735de5edb1 Update README.md (#365)
Update README to point to new [documentation
resource](https://sites.google.com/omsf.io/rfdiffusion/overview).

I also added some text to the Installation section to specify that
1) Sergey O.'s colab notebook contains only some of the features of
RFdiffusion and 2) there is now a Rosetta Commons-maintained docker
image.
2025-09-23 11:38:58 -05:00
Orange
7c30fee0ab Fixed issues with designing in scaffoldguided mode, for example: design_ppi_scaffold.sh. The solutions to issues 272 and 273 did not fully address the issue. 2025-08-11 09:24:52 +00:00
Rachel Clune
25b908256a Update README.md 2025-07-16 09:37:32 -07:00
Rachel Clune
35c3af8a0a Update README.md 2025-07-14 15:58:06 -07:00
Sergey Lyskov
fa340147b9 add GitHub CI workflow for running ppi-scaffolds tests (#1) 2025-06-23 12:49:31 -06:00
David Juergens
b254828f07 Update README.md 2025-06-22 17:56:19 -07:00
David Juergens
b1ba556871 add .sh examples, gabarap pdb, import numpy 2025-06-22 17:56:19 -07:00
David Juergens
cd64c5ae72 bugfix readme 2025-06-22 17:56:19 -07:00
David Juergens
37e1d4da61 add rfpeptides img 2025-06-22 17:56:19 -07:00
David Juergens
ae036e30c6 Add RFpeptides code 2025-06-22 17:56:19 -07:00
Brahm Yachnin
d32205a17f Extend available new chains to include lowercase letters
Also print a warning if the user exceeds 52 chains (using up all upper-
and lower-case chain ids).
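
A minimal sketch of the idea, not the committed code: `next_chain_id` is a hypothetical helper, and returning `None` once all 52 IDs are used is an assumption (the commit only states that a warning is printed).

```python
import string
import warnings

# Upper-case letters first, then lower-case: 52 candidate chain IDs in total.
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase


def next_chain_id(used_ids):
    """Return the first chain ID not in used_ids, warning if none is left."""
    for candidate in CHAIN_IDS:
        if candidate not in used_ids:
            return candidate
    warnings.warn("More than 52 chains requested; no unused chain ID is left.")
    return None  # assumed fallback; the real behaviour is not described


print(next_chain_id({"A", "B"}))
print(next_chain_id(set(string.ascii_uppercase)))
```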
2025-06-17 13:24:16 -04:00
Brahm Yachnin
63e270f715 For fixed chains, retain residue numbering
For chains that are completely fixed, retain the residue numbering from
the input rather than renumbering.  For chains that are partially or
fully designed by RFdiffusion, it isn't clear to me what the 'correct'
behaviour should be, so these chains will be re-numbered starting at
residue 1.
2025-05-20 10:38:19 -04:00
Brahm Yachnin
909fc01c03 Maintains the input chain ids in RFdiffusion output
The output was previously renumbering all of the chains, making
comparisons to the input structures and handling of multi-chain inputs
challenging.  This commit maintains the input chain ids in the
output.
2025-05-16 10:31:01 -04:00
Joseph Watson
b44206a2a7 correct example 2024-08-26 10:35:40 -07:00
Joseph Watson/Watchwell
31f8091be2 Update README.md 2024-08-26 10:35:40 -07:00
Joseph Watson
f6ac51ee84 Added example 2024-08-26 10:35:40 -07:00