RFdiffusion

mirror of https://github.com/RosettaCommons/RFdiffusion.git synced 2026-06-03 18:24:22 +08:00

Author	SHA1	Message	Date
Rachel Clune	2d0c003df4	Adding files for the soon-to-be-released RFdiffusion video tutorial (#452). The materials were created by Diego Lopez Mateos, Matthew Hvasta, and Kush Narang for the Tutorial Hackathon track of the 2026 Megathon event.	2026-04-24 11:47:30 -07:00
Rachel Clune	d1e7386992	Changed video_tutorials directory to just tutorials	2026-04-24 11:46:58 -07:00
Leonardo Marino-Ramirez	529b756796	feat: add inference.empty_cache_per_design flag to reduce CUDA allocator fragmentation (#451)

## Problem

When running RFdiffusion with variable-length contigs (e.g. `contigmap.contigs=[A1-469/0 1-50]`) over hundreds or thousands of designs, per-worker VRAM grows steadily from ~7 GB to 10–13 GB per process. This limits how many workers can run in parallel on a single GPU before exhausting VRAM.

Root cause: PyTorch's CUDA caching allocator accumulates fragmented memory blocks across designs. With variable-length contigs each design allocates differently-sized tensors; freed blocks are cached but cannot be reused for different-sized allocations, causing steady VRAM growth.

## Fix

Add an optional `inference.empty_cache_per_design` flag (default `False`, opt-in) that calls `torch.cuda.empty_cache()` at the end of each design iteration. This releases all unused cached CUDA memory blocks back to the CUDA memory manager, keeping each worker near its initial VRAM footprint for the full run.

### Changes

`config/inference/base.yaml`

```yaml
write_trajectory: True
empty_cache_per_design: False  # NEW
```

`scripts/run_inference.py`, after the trajectory/PDB write block, before `log.info`:

```python
if conf.inference.empty_cache_per_design and torch.cuda.is_available():
    torch.cuda.empty_cache()
log.info(f"Finished design in {(time.time()-start_time)/60:.2f} minutes")
```

## Measured impact

Tested on NVIDIA RTX 5090 32 GB running a long PPI campaign with variable-length contigs:

| Setting | Per-worker VRAM (steady-state) |
|---------|-------------------------------|
| Without fix | 8–13 GB (grows over run) |
| With `empty_cache_per_design=True` | ~5.2 GB (stable) |

This allowed raising the number of parallel workers from 3 to 5 on a 32 GB GPU.

## Why opt-in

`torch.cuda.empty_cache()` adds a small per-design overhead (~1–2 ms) and is only beneficial for long runs with variable-length contigs. For short runs or fixed-length designs there is no fragmentation issue, so the default remains `False` to preserve existing behavior.

## Testing

All 20 applicable tests in `tests/test_diffusion.py` pass with this change. The one skipped test (`design_ppi_scaffolded`) fails due to a missing `ppi_scaffolds/` directory in the test fixture, a pre-existing issue unrelated to this PR.

## Notes

- Placement is after both the PDB write (`writepdb`) and the optional trajectory block, so every consumer of `denoised_xyz_stack` / `px0_xyz_stack` has already finished before the cache is cleared.
- This does not affect memory held by live tensors; it only frees cached-but-unused blocks.
- Compatible with all existing RFdiffusion design modes (PPI, motif scaffolding, unconditional).	2026-04-24 10:41:07 -06:00
Rachel Clune	92f3c4ca27	Updated README. Added details about: the origins of the example used in the tutorial, how the input file was generated, and how to install STRIDE.	2026-04-23 14:39:47 -07:00
Rachel Clune	2f2a301575	Update README.md	2026-04-22 12:14:14 -07:00
Rachel Clune	122a2157c1	Adding files for the soon-to-be-released RFdiffusion video tutorial. The materials were created by Diego Lopez Mateos, Matthew Hvasta, and Kush Narang for the Tutorial Hackathon track of the 2026 Megathon event.	2026-04-22 11:30:54 -07:00
Hope Woods	9535f19382	Updated "Fixed issues with designing in scaffoldguided mode" original PR 386 (#426). The original PR for this was #386 from [OrangeCatzhang](https://github.com/OrangeCatzhang). This PR fixes the error "AttributeError: 'bool' object has no attribute 'scaffold_list'" when running in scaffoldguided mode. The first error is fixed by passing the full composed config object (conf) into BlockAdjacency instead of passing the scaffoldguided sub-node. BlockAdjacency expects the full config and accesses conf.scaffoldguided.<fields> internally, so passing the sub-node caused self.conf.scaffoldguided to resolve to the nested boolean field (scaffoldguided.scaffoldguided), which produced the AttributeError when code tried to read .scaffold_list. Passing the full conf fixes that mismatch. The other fix is to add initialization of cyclic_reses to ScaffoldedSampler. I have slightly updated what was in the original PR to avoid code duplication: I added a helper function to the Sampler class and call it in both Sampler and ScaffoldedSampler to initialize cyclic_reses. I also removed the changes to the scaffoldguided flag from the original PR, so the CLI stays the same.	2025-11-20 10:02:33 -06:00
woodsh17	ecf161b4e2	Move cyclic_reses initialization to a helper function and call it for Sampler and ScaffoldedSampler	2025-11-18 15:43:09 -06:00
woodsh17	723a66408c	Reverting changes to flag name, so you still use scaffoldguided.scaffoldguided=True instead of scaffoldguided_enabled	2025-11-18 12:26:26 -06:00
woodsh17	1ba79929d3	Merge remote-tracking branch 'origin/main' into fix_scaffoldguided_design	2025-11-18 10:46:28 -06:00
Hope Woods	ff20fbafef	Fix workflow tests that are failing (#410). This PR updates the tests so that all the examples run and a failing example causes the test to fail as well. Changes include: (1) reformatting design_macrocyclic_binder.sh and design_macrocyclic_monomer.sh so that test_diffusion.py submits them correctly; (2) reducing the total length in design_tetrahedral_oligos.sh to shorten this test's run time; (3) changing test_diffusion.py and main.yml so that the examples run in separate chunks, which lets them run in parallel, and so that the test does not pass if an example errors out. Currently design_ppi_scaffolded, design_timbarrel, and design_ppi_flexible_peptide_with_secondarystructure_specification are failing, which should be addressed in future PRs.	2025-11-13 15:33:05 -06:00
woodsh17	dd7643d640	Revert "Remove try block with except FileExistsError that isn't needed". After running the test, I found that this was needed. This reverts commit `ead721f326`.	2025-11-12 13:45:07 -06:00
woodsh17	ead721f326	Remove try block with except FileExistsError that isn't needed	2025-10-24 09:48:03 -05:00
woodsh17	4aea4fd65a	Change design_nickel.sh num_designs back to 15	2025-10-23 12:26:00 -05:00
woodsh17	681874e6ea	Reduce length for design_tetrahedral_oligos to 600 to reduce run time	2025-10-22 15:16:41 -05:00
woodsh17	be5b7b6b89	Preseed DGL to avoid failures in examples that are run at the same time	2025-10-22 15:13:35 -05:00
woodsh17	4d870c07da	In setUpClass check that directory doesn't already exist, so tests do not fail before running examples	2025-10-22 10:03:07 -05:00
woodsh17	dc1867dadf	Edit main.yml so workflow runs	2025-10-21 16:29:06 -05:00
woodsh17	56feb3bb4a	Remove set pipefail	2025-10-21 10:09:12 -05:00
woodsh17	4dd3a5f8c0	Remove -o option from pipefail	2025-10-21 10:02:56 -05:00
woodsh17	a570b2b8bd	Edit main.yml workflow so if an example fails, the test doesn't pass	2025-10-21 09:57:15 -05:00
woodsh17	260692d6c2	Edit test_diffusion to change setUp to setUpClass, making it a class method so examples run only once	2025-10-20 08:48:02 -05:00
woodsh17	17f315b2de	Add index of chunk to test directory name	2025-10-17 12:45:28 -05:00
woodsh17	a629d43e92	Add step to cd into tests directory in main.yml	2025-10-17 11:04:32 -05:00
woodsh17	6d0036b480	Fix typo in main.yml in test_diffusion options	2025-10-17 10:57:53 -05:00
woodsh17	e2378c7358	Change num_designs in examples back to 10, since this is reset in test_diffusion.py	2025-10-17 10:51:44 -05:00
woodsh17	5dff3eb188	These changes split the examples up into chunks and run the different chunks to speed up the workflow.	2025-10-17 10:39:26 -05:00
woodsh17	a9b748be37	Add print statement to list failed examples	2025-10-14 11:44:50 -05:00
woodsh17	778f8d4b28	Fix input_pdb path in design_macrocyclic_binder and fix incorrect call to res in test_diffusion.py	2025-10-10 15:21:04 -05:00
woodsh17	f43a81d59d	Changing number of designs for examples from 10 to 2 to speed up tests	2025-10-10 13:53:59 -05:00
woodsh17	a45c3f9da7	Fail tests when example scripts exit non-zero	2025-10-10 13:41:05 -05:00
woodsh17	2d157302d3	Adjusting command format for design_macrocyclic_binder.sh and design_macrocyclic_monomer.sh	2025-10-10 09:36:45 -05:00
woodsh17	a18f99614a	Address 'command' referenced before assignment error in tests	2025-10-10 09:35:57 -05:00
Hope Woods	e220924202	Retain chain and residue numbering in RFdiffusion (#348). A number of issues (e.g., #103, #171, https://github.com/RosettaCommons/RFdiffusion/issues/312, #315) have mentioned that RFdiffusion will change the chain IDs and residue numbering of the input structure. The designed chain ends up as chain "A", and the fixed chain(s) end up as chain "B". The numbering is also reset to start at 1. This can be particularly problematic in cases where comparisons to structures are needed, as well as multi-chain situations where all of the chains get fused. Inspired by @GCS-ZHN's comment and solution referenced in Issue #103, I've modified the code to maintain chain and residue numbering. In particular: (1) Chains that are not "designable" will retain their original chain ID letters and residue numbers. (2) Chains that are partially fixed (e.g., motif re-scaffolding) will retain their original chain ID letters. Residues will be re-numbered from 1 to length of chain. (It was not clear to me what the "correct" behaviour of chain residue numbering should be, given that the length of the chain and the position of any fixed residues might change.) (3) Chains that are being fully generated de novo will be assigned the first available chain ID in the alphabet not used by any other chain. Residues will be numbered from 1 to length of chain.	2025-09-24 09:28:03 -05:00
Hope Woods	735de5edb1	Update README.md (#365). Update README to point to new [documentation resource](https://sites.google.com/omsf.io/rfdiffusion/overview). I also added some text in the Installation section to 1) specify that Sergey O.'s colab notebook only contains some of the features of RFdiffusion and 2) note that there is now a Rosetta Commons-maintained docker image.	2025-09-23 11:38:58 -05:00
Orange	7c30fee0ab	Fixed issues with designing in scaffoldguided mode, for example: design_ppi_scaffold.sh. The solutions to issues 272 and 273 did not fully address the issue.	2025-08-11 09:24:52 +00:00
Rachel Clune	25b908256a	Update README.md	2025-07-16 09:37:32 -07:00
Rachel Clune	35c3af8a0a	Update README.md	2025-07-14 15:58:06 -07:00
Sergey Lyskov	fa340147b9	add GitHub CI workflow for running ppi-scaffolds tests (#1)	2025-06-23 12:49:31 -06:00
David Juergens	b254828f07	Update README.md	2025-06-22 17:56:19 -07:00
David Juergens	b1ba556871	add .sh examples, gabarap pdb, import numpy	2025-06-22 17:56:19 -07:00
David Juergens	cd64c5ae72	bugfix readme	2025-06-22 17:56:19 -07:00
David Juergens	37e1d4da61	add rfpeptides img	2025-06-22 17:56:19 -07:00
David Juergens	ae036e30c6	Add RFpeptides code	2025-06-22 17:56:19 -07:00
Brahm Yachnin	d32205a17f	Extend available new chains to include lowercase letters. Also print a warning if the user exceeds 52 chains (using up all upper- and lower-case chain ids).	2025-06-17 13:24:16 -04:00
Brahm Yachnin	63e270f715	For fixed chains, retain residue numbering. For chains that are completely fixed, retain the residue numbering from the input rather than renumbering. For chains that are partially or fully designed by RFdiffusion, it isn't clear to me what the 'correct' behaviour should be, so these chains will be re-numbered starting at residue 1.	2025-05-20 10:38:19 -04:00
Brahm Yachnin	909fc01c03	Maintains the input chain ids in RFdiffusion output. The output was previously renumbering all of the chains, making comparisons to the input structures and handling of multi-chain inputs challenging. This commit maintains the input chain ids in the output.	2025-05-16 10:31:01 -04:00
Joseph Watson	b44206a2a7	correct example	2024-08-26 10:35:40 -07:00
Joseph Watson/Watchwell	31f8091be2	Update README.md	2024-08-26 10:35:40 -07:00
Joseph Watson	f6ac51ee84	Added example	2024-08-26 10:35:40 -07:00
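
Commit 529b756796 above describes an opt-in flag that clears PyTorch's CUDA cache after each design. The following is a minimal sketch of that pattern, not the code in `scripts/run_inference.py`: the helper `maybe_empty_cuda_cache` and the loop `run_designs` are hypothetical names, and the sampling step is left as a placeholder. Only `torch.cuda.is_available()` and `torch.cuda.empty_cache()` are real PyTorch calls; the sketch degrades to a no-op when PyTorch or CUDA is absent.

```python
import time

try:
    import torch  # optional for this sketch; without it the helper is a no-op
except ImportError:
    torch = None


def maybe_empty_cuda_cache(enabled: bool) -> bool:
    """Release cached-but-unused CUDA blocks if the flag is on and CUDA is usable.

    Returns True if the cache was cleared, False otherwise.
    Hypothetical helper, named here for illustration only.
    """
    if not enabled or torch is None or not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # does not touch memory held by live tensors
    return True


def run_designs(num_designs: int, empty_cache_per_design: bool = False) -> int:
    """Hypothetical design loop; returns how many times the cache was cleared."""
    cleared = 0
    for i in range(num_designs):
        start_time = time.time()
        # ... sample one design and write its PDB/trajectory here (placeholder) ...
        if maybe_empty_cuda_cache(empty_cache_per_design):
            cleared += 1
        print(f"Finished design {i} in {(time.time() - start_time) / 60:.2f} minutes")
    return cleared
```

Clearing the cache only after every consumer of the design's tensors has finished mirrors the placement note in the commit message; with the flag left at its default of `False` the loop behaves exactly as before.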

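Commits e220924202 and d32205a17f above describe assigning each de novo chain the first chain ID not used by any other chain, drawing on uppercase and then lowercase letters, with a warning once all 52 IDs are used up. A minimal sketch of that rule follows. The function names are hypothetical, and what the real code does after warning is not stated in the log, so stopping the assignment is an assumption of this sketch.

```python
import string
import warnings

# 26 uppercase letters followed by 26 lowercase letters: 52 candidate chain IDs.
CHAIN_ALPHABET = string.ascii_uppercase + string.ascii_lowercase


def next_available_chain_id(used):
    """Return the first letter in CHAIN_ALPHABET not in `used`, or None if all are taken."""
    for letter in CHAIN_ALPHABET:
        if letter not in used:
            return letter
    return None


def assign_new_chain_ids(existing, n_new):
    """Assign IDs to `n_new` de novo chains without clashing with `existing` IDs.

    Assumption: once the 52 IDs run out, warn and stop assigning.
    """
    used = set(existing)
    assigned = []
    for _ in range(n_new):
        chain_id = next_available_chain_id(used)
        if chain_id is None:
            warnings.warn("More than 52 chains requested; no unused chain ID is left.")
            break
        used.add(chain_id)
        assigned.append(chain_id)
    return assigned


print(assign_new_chain_ids(["A", "C"], 2))  # ['B', 'D']
```

Fixed chains keep their input IDs (here `A` and `C`), so the new chains fill the gaps rather than displacing them.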

104 Commits
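
Commit 9535f19382 above traces the error "AttributeError: 'bool' object has no attribute 'scaffold_list'" to passing the `scaffoldguided` sub-node where the full config was expected. The toy reproduction below illustrates the mechanism under stated assumptions: `SimpleNamespace` stands in for the composed config, `BlockAdjacencySketch` is a hypothetical stand-in rather than the real `BlockAdjacency` class, and the `scaffold_list` value is invented.

```python
from types import SimpleNamespace

# Stand-in for the composed config: a `scaffoldguided` section that itself
# contains a boolean field also named `scaffoldguided`.
conf = SimpleNamespace(
    scaffoldguided=SimpleNamespace(
        scaffoldguided=True,
        scaffold_list="scaffolds.txt",  # hypothetical value
    )
)


class BlockAdjacencySketch:
    """Toy class that, like the one in the commit message, expects the full config."""

    def __init__(self, conf):
        self.conf = conf

    def scaffold_list(self):
        # With the full config this reaches the section; with the sub-node,
        # `self.conf.scaffoldguided` is the nested boolean instead.
        return self.conf.scaffoldguided.scaffold_list


def describe(cfg):
    """Return the scaffold list, or the error text if the lookup fails."""
    try:
        return BlockAdjacencySketch(cfg).scaffold_list()
    except AttributeError as err:
        return f"AttributeError: {err}"


print(describe(conf))                 # full config: lookup succeeds
print(describe(conf.scaffoldguided))  # sub-node: reproduces the 'bool' error
```

Because the section and the flag share a name, the wrong argument fails only at the attribute read, which is why passing the full config resolves it without renaming the flag.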