* refactor: delete files
* fix: add back files that included imported functions in chiral code
* chore: add back files to archive, update pyproject.yaml and gitignore
* Initial commit of chiral changes
Initial checkin of chiral feature code
Add chiral metric
* Update the way chiral features are incorporated into the model
Move initialization to new func
use default pytorch reset parameters
fix initialization for chirals
config
rename argument of confidence head
fix initialization for chirals
* refactor: src nest, rename rf2aa to modelhub
* refactor: initial commit without projects
* Initial commit of chiral changes
* Initial checkin of chiral feature code
* Add chiral metric
* Remove option for double residual connection. Add kq_norm options to base (20250125) config.
* Restoring flag
* config
* rename argument of confidence head
* Update the way chiral features are incorporated into the model
* refactor: new modelhub
---------
Co-authored-by: fdimaio <dimaio@uw.edu>
Co-authored-by: HaotianZhangAI4Science <haotianzhang@zju.edu.cn>
changes
dry_run.py which runs through examples to check for loader errors
fixed conflicts
changed dataloaders to use assembly with num_chain=1
fixed fape and lddt
assembly loader for trainer
conflict
merge conflicts
added covalent capabilities to the dataloader
added in modified residues to training
tweaked sm compl assembly cropping to be more permissive with keeping protein stubs
bug fix to reindex_protein_features_after_atomize, added FD3 phase 3 curation code, training now starts on digs w/o errors
training tweaks
limit number of ligands & protein chains
added additional curation scripts
added column to sm compl dataset csvs with LIGATOMS_RESOLVED
added code to get_train_valid_set to filter out partially resolved ligands
added a boolean flag `diffusion_training` to get_train_valid_set to use different loading logic and datapickles in diffusion training
refactored 'find_residues_to_atomize_covale'
fixed bug with truncating lig/prot partners at a set value, needs to be done before adding atomized residues
removed DATAPKL_DIFFUSION from params variable in data_loader.py because this is being handled upstream in diffusion training scripts
fixed bug in find_residues_to_atomize_covale
minor changes
fixed cropping of partially masked or partially cropped ligands
added argument `use_partial_ligands` to `crop_sm_compl_assembly` to control whether partially masked ligands are included in crop
added p_modres
added p_atomize_modres to distilled and validation sets
changed numpy arrays to tensors in get_assembly_msas to avoid type errors
moved p_atomize_modres into data loader params / arguments.py, apply it only in loader_sm_compl_assembly
limit total templates to MAXTPLT after combining templates across different protein chains
minor bug fix in loader_rna
allatom lddt calculation can be "striped" to save GPU memory
bug fix to striped lddt calculation
bug fix lddt
only compute lj between close residues to save GPU mem
bug fix atomize residue
moved try up
bug fix
added checkpointing on call to structure refinement layer
added MAXMASKEDLIGATOMS parameter and code to remove masked ligand atoms if they exceed a certain number within a given ligand
modified assembly cropping to only choose query atoms that are not masked
added new multi-MSA loading method
fixed small bug with removing masked atoms from ligands
minor refactor to handling of chiral index offsets in featurize_single_chain
bug fixes
bug fixes
training tweaks
bug fix
expand_multi_msa bug fix
tweak calc_BB_bond_geom to work with all-atom codebase
minor bug fix
bug fix
training changes
renamed load_multi_msa() to load_minimal_multi_msa() and added a new load_multi_msa() to take some code out of loader_sm_compl_assembly()
recurated data to get rid of some buggy examples
tweaked .gitignore
training tweaks
consolidated sm_compl csvs into same file, labeled different datasets in the SUBSET column, updated get_train_valid_sets() to parse this
tweak to get_train_valid_set
Adds negative datasets to data_loader.py
Also some .gitignore improvements, __init__.py files for importing from
non-relative paths.
Added Binder Network to RF model
Added binder loss calculation, weighting
Added dataloading of negatives, new main
The new main script can submit things via submitit, and makes
it slightly more convenient to submit running jobs over multiple
gpus and nodes.
Bug fixes
Fixes to same chain calculations, cropping, arguments, and realignment
added smiles loading
Bug fixes, dude validation, intra chain FAPE fixes
minor fixes
changed multi-MSA pairing algorithm.
Instead of adding back "filler" sequences, unpaired sequences are stacked at the bottom of the MSA during pairing. The "overlap" region between the two input MSAs is tracked so that it is included in any unpaired sequences. This method retains more pairing information than the previous one.
better comments for multi-MSA stuff
training fixes
bug fix
tweaks
tweak
bug fixes to multi MSAs
fixed frame bug
tweak DistributedWeightedSampler to correct rounding error on num examples in last dataset, which can lead to DDP hangs at end of epoch
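A minimal sketch of the rounding issue behind the DistributedWeightedSampler tweak above. This is an illustration, not the repository's code: the helper name `allocate_samples` and its arguments are hypothetical, and it assumes the fix amounts to letting the last dataset absorb the remainder so that the per-dataset counts sum exactly to the epoch size. If the counts do not sum to the expected total, ranks can end up with different numbers of examples, and DDP ranks then wait on each other at the end of the epoch.

```python
import math


def allocate_samples(weights, num_total):
    """Hypothetical helper: split num_total examples across datasets by weight.

    Every dataset except the last gets the floor of its proportional share;
    the last dataset takes whatever is left, so the counts always sum to
    exactly num_total and no count is negative.
    """
    if not weights or num_total < 0 or any(w < 0 for w in weights):
        raise ValueError("need non-negative weights and a non-negative total")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("at least one weight must be positive")

    # Flooring guarantees the partial sum never exceeds num_total.
    counts = [math.floor(num_total * w / total_w) for w in weights[:-1]]
    # The last dataset absorbs the rounding remainder.
    counts.append(num_total - sum(counts))
    return counts


# Rounding each share independently can miss the requested total ...
naive = [round(10 * w / 3) for w in (1, 1, 1)]
# ... while assigning the remainder to the last dataset cannot.
exact = allocate_samples([1, 1, 1], 10)
```

In this example the naive counts sum to 9 rather than 10, while `allocate_samples` returns counts that sum to 10. Choosing `num_total` as a multiple of the world size then lets every rank draw the same number of examples.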
minor bug fix
minor tweak
bug fix to sampling negative ligand examples
updated datasets to exclude covalent bonds to H
updated curation to remove covalent bonds to H's
curation tweaks
curation tweaks
removed all bad examples at curation time, got rid of filters in get_train_valid_set
minor bug fix after merge