* RGD code cleanup
- made an effort to give more meaningful names to variables (e.g., renamed most instances of attachment (point) to avoid ambiguity as attachment may be interpreted as either the R-group atom or its neighbor atom on the core, which are two different things)
- replaced the old school removeAtom() method with begin/commitBatchEdit()
- added std::move and std::make_move_iterator where relevant to avoid potential unintended copying
- replaced instances of container.size() == 0 and !container.size() with container.empty() for better clarity
- replaced std::map::find() with std::map::at() where the key was not needed
- replaced expensive std::find_if with a more efficient alternative
- added some missing const keywords and added references to avoid copying where appropriate
- replaced for loops with modern implicit looping alternatives where convenient
- avoid calling MolToSmiles when VERBOSE is not defined, since the result is not used anyway
- removed "oops, exponential is a pain" code snippet as I believe 1. it is never executed, 2. it is not tested, and 3. it is not correct
- removed check for data->matches.size() > 1 as I do not believe it is correct
- Use std::unique_ptr::reset instead of defining a new std::unique_ptr and moving it to the original one
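For illustration only, a minimal C++ sketch of some of the idioms listed above. The helper names (labelFor, appendByMove, replaceValue, hasMatches) are made up for this example and are not part of the RGD code:

```cpp
#include <iterator>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: look up a value whose key is known to exist.
// std::map::at() returns the mapped value directly and throws
// std::out_of_range on a missing key, so no iterator handling is needed.
inline const std::string &labelFor(const std::map<int, std::string> &labels,
                                   int idx) {
  return labels.at(idx);
}

// Hypothetical helper: append all of src to dst, moving instead of copying.
inline void appendByMove(std::vector<std::string> &dst,
                         std::vector<std::string> &&src) {
  dst.insert(dst.end(), std::make_move_iterator(src.begin()),
             std::make_move_iterator(src.end()));
}

// Hypothetical helper: replace the owned object in place with reset()
// rather than building a second unique_ptr and move-assigning it.
inline void replaceValue(std::unique_ptr<int> &owner, int value) {
  owner.reset(new int(value));
}

// empty() states the intent more clearly than size() == 0 or !size().
inline bool hasMatches(const std::vector<int> &matches) {
  return !matches.empty();
}
```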
* changes in response to review
* change in response to review
* replaced std::set with boost::dynamic_bitset to save time on std::set::insert and std::set::find
* make sure we do not go out of bounds
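A rough sketch of the idea behind the two changes above: index membership is tracked with one flag per index instead of a std::set, and every access is bounds-checked. std::vector<bool> is used here as a stand-in for boost::dynamic_bitset so the example needs no Boost headers, and the IndexFlags class is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a set of atom indices. The actual change uses
// boost::dynamic_bitset; std::vector<bool> is swapped in for this sketch.
class IndexFlags {
 public:
  explicit IndexFlags(std::size_t numAtoms) : d_flags(numAtoms, false) {}

  // Marks idx as present; returns false instead of writing out of bounds.
  bool set(std::size_t idx) {
    if (idx >= d_flags.size()) {
      return false;
    }
    d_flags[idx] = true;
    return true;
  }

  // Constant-time lookup; out-of-range indices are reported as absent.
  bool test(std::size_t idx) const {
    return idx < d_flags.size() && d_flags[idx];
  }

 private:
  std::vector<bool> d_flags;
};
```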
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* - extracted the core matching logic into a separate function
- added relevant C++ and Python tests
* Update Code/GraphMol/RGroupDecomposition/RGroupDecomp.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Support tautomer queries in RGD
* Continuing RGD and tautomer development
* Python and C# tests
* C# test
* Typo fix
* For core tautomer query update properties instead of full sanitization
* Added query comment
* Code review change
* Support Enumeration of input cores
* Mol enumeration test
* Remove useNormalMatch from RGroupDecomp
* Added comments for handling tautomeric core
* Undo change to master
* Fixed typo in tests
* Initial development and test
* Sort of working tests
* Copy coordinates to new core
* Clear stereochemistry on core atoms with unlabelled rgroups
* Continuing development
* Updated development
* Fixed Chirality Issues
* All tests working
* Remove some unused code
* Working tests
* Tidy test code
* Adjust catch_rgd for stereochemistry in output cores
* Build ring info in output cores
* Fix Mac OS bug
* Fix for MCS and onlyMatchAtRGroups
* Brian's optimization suggestion
* Fix core group coordinate bug
* Test for replaceCore and multiple core bonds to chiral atom
* Update Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Update Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Changes in response to Greg's code review
* R group stereo bond attachment fix
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Recalculate core dummy positions for hydrogen only r groups
* Greg's code review changes
* Fix for RGD dummy atom bug
* Also fix labelling issues in the R group containing input dummy atom
* minor tweaks to the proposed fix
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* Later cores with more R-groups should only be chosen when they are structurally related, i.e. when they are superstructures of earlier cores
* Update Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Update Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
Co-authored-by: Tosco, Paolo <paolo.tosco@novartis.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Support wildcard in input structures
* Fix typos
* Handle R groups containing a single wildcard and wildcards with group numbers
* Reorder tests
* Use property instead of isotope to mark input dummy
* fix the windows DLL builds
* Added constexpr for dummy input atom property
* Omitted to replace one instance of string 'INPUT_DUMMY'
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* speed up scoring of permutations by clever caching of already settled
matches rather than recomputing scores for all matches every time
process() is called
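A minimal sketch of the caching idea described above, not the actual RGD scoring code. It assumes that the number of settled molecules only grows between calls; the CachedScorer class and its members are hypothetical:

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: scores of already settled molecules are accumulated
// once and reused, so each call only scores the newly added molecules.
class CachedScorer {
 public:
  explicit CachedScorer(std::function<double(std::size_t)> scoreFn)
      : d_scoreFn(std::move(scoreFn)) {}

  // Total score over molecules [0, numMols). Assumes numMols never shrinks
  // between calls (an assumption of this sketch).
  double total(std::size_t numMols) {
    for (; d_numCached < numMols; ++d_numCached) {
      d_cachedTotal += d_scoreFn(d_numCached);
      ++d_numEvaluations;
    }
    return d_cachedTotal;
  }

  // Number of times the (expensive) score function was actually called.
  std::size_t numEvaluations() const { return d_numEvaluations; }

 private:
  std::function<double(std::size_t)> d_scoreFn;
  std::size_t d_numCached = 0;
  std::size_t d_numEvaluations = 0;
  double d_cachedTotal = 0.0;
};
```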
* changes in response to review
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* support substructure search parameters in RGD.
Still needs testing/verification of the enhanced stereo stuff
* test enhanced stereo
* add support to python wrapper
unfortunately some python reformatting got mixed in there.
* Most tests working
* All tests working
* Fixed tests after merge with master
* Create header and implementations for RCore
* Updated comments
* Removed old code
* DLL export for MolMatchFinalCheckFunctor
* Information line for failing Mac test
* Log replace core behaviour
* Ordering fix for OSX
* Possible fuzzer fix
* Removed debug output
* Fix unmatched user R group bug
* Code review changes
* Bug fix and ChemTransforms test
* - do not add unnecessary R-labels
- use a boost::dynamic_bitset rather than a std::set for lookups
* - R group labels can be >0 or <0, not 0, so no need to check for >=0 when looking for user labels
- as soon as a core is found that requires no additional labels to accommodate a molecule, bail out from the loop as no better core can be found
- add a test to better describe the use case for this change
- remove a signed/unsigned warning
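The early bail-out described above can be sketched as follows. This is an illustration under simplified assumptions (each candidate core is reduced to the number of extra labels it would need); pickCore is a hypothetical name:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: extraLabels[i] is the number of additional R labels
// core i would need to accommodate the molecule. Returns the index of the
// core needing the fewest (extraLabels.size() if there are no cores) and
// stops as soon as a core needing none is found, since nothing can beat it.
inline std::size_t pickCore(const std::vector<unsigned> &extraLabels,
                            std::size_t *numVisited = nullptr) {
  std::size_t best = extraLabels.size();
  std::size_t visited = 0;
  for (std::size_t i = 0; i < extraLabels.size(); ++i) {
    ++visited;
    if (best == extraLabels.size() || extraLabels[i] < extraLabels[best]) {
      best = i;
    }
    if (extraLabels[best] == 0) {
      break;  // no better core can be found
    }
  }
  if (numVisited) {
    *numVisited = visited;
  }
  return best;
}
```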
* - added an entry to Release Notes to describe the impact of #3969
* avoid French expressions in Release Notes
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* 1) fixed residual kekulization issues when R-labels are removed from aromatic atoms on the core
2) added removeAllHydrogenRGroupsAndLabels flag, which defaults to True. If set to False,
unused user-defined labels on the core will be retained. If also removeAllHydrogenRGroups
is set to False, all-hydrogen user-defined R-groups will be included in the output.
Dynamically-added R-groups (when onlyMatchAtRGroups=false) that are
all-hydrogen are never included in the output, as that is not useful (change
compared to previous behavior)
3) added tests for (1), (2) and for removeAllHydrogenRGroups, that had no tests so far
4) fixed a few documentation typos in the Python code and added some documentation
to the C++ sources
* added missing test files
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* make sure that unlabelled R-groups on aromatic nitrogens have correct valence and formal charge to avoid kekulization failures
* change in response to review
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* backup
* simple first pass, passes all tests
* cleanup a bunch of existing uses
* ensure that we can safely add atoms/bonds while in edit mode
* add context manager on python side
* handle exceptions properly in those
* changes in response to review
* RGD modifications for any atom and index labels
* Continued development
* All tests working
* Added comment
* CR changes suggested by PTosco
* Fix catch_rgd for autocrlf
* Core dummy matches on output. RGroups on heavy atom. Dummy atoms User rgroups only when they are degree 1.
* Start work on test fixes
* testRGroupDecomp test working
* CPP and Python tests working
* Removed options for matching core query atoms on sidechains
* Windows build fix
* R groups off ring. User group matches single heavy substituent. Remove extraneous hydrogens
* Updated fingerprint variance score and tie selection
* Refactor fingerprint variance score functions to class
* Removed fingerprint distance score
* Boost::trim fix
* Updated RGD test notebook
* Fixed AddHs.cpp
* - fixes the kekulization issue
- avoids that empty R-group labels are included in cores
- makes sure that SMILES cores are always canonical
- adds a few missing const declarations and avoids unintentional copying
* Support for allowNonTerminalRGroups parameter. Remove R groups that contain H or Nothing. Ignore R group labels on non-dummy atoms
* Fixed tests for Paolo's changes. Rebuilt test notebook. Increased weighting of rgroup penalty in fingerprint variance score
* remove some debug output
Co-authored-by: Brian Kelley <fustigator@gmail.com>
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* Exploration
* Initial work on GA for Rgroup Symmetry
* GA for rgroup decomp and fingerprint rgroup symmetry scoring
* Continuing development
* Further development
* Continued tweaks
* Function rename
* Continued tweaks
* Bug fix for variance calculation
* Copyright notices. Remove Eigen dependency. RDKit logging. Clock fix.
* Changes to fix build failures
* Fixes for Windows dynamic DLL build
* Included GA export.h file
* Fixed RGroupDecomp CMakeLists.txt
* Notebooks working, RGroup labelling bug fixed
* Fix windows build. More options for example GA program
* More bugs found and tests adjusted
* Fixed Python rgroup test
* Trivial change to trigger CI
* OSX java and windows build fixes
* Windows DLL fix
* Fix segmentation error
* proposed change
* Possible fix for segmentation fault
* CR fixes
* Recreates molecules from rgroups where possible
Co-authored-by: greg landrum <greg.landrum@gmail.com>
Co-authored-by: Brian Kelley <fustigator@gmail.com>
* - replaced set with vector for SMILES-based R-group equivalence
- the first GreedyChunk is constituted by chunkSize+1 mols
- labeled R-groups may not be extracted when onlyMatchAtRGroups==false
- labeled geminal R-groups are incorrectly scored
- my attempt to introduce consistency in R-group labeling was buggy
- added a DEBUG pre-processor directive to the tests to make debugging easier
- added a unit test
- fixed unit test results which were inconsistent with the expected behavior
* changes in response to review
* - Fixes three bugs in the R-group decomposition code
* - delete iterator properly during loop so the Mac does not complain
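One common way to erase safely while iterating, shown as a generic sketch (the removeEmpty helper and the map contents are hypothetical, not the RGD data structures):

```cpp
#include <map>
#include <string>

// Hypothetical helper: drop entries with empty values while iterating.
// erase() invalidates the erased iterator, so the loop continues from the
// iterator that erase() returns instead of incrementing the stale one.
inline void removeEmpty(std::map<int, std::string> &groups) {
  for (auto it = groups.begin(); it != groups.end();) {
    if (it->second.empty()) {
      it = groups.erase(it);
    } else {
      ++it;
    }
  }
}
```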
* added more tests
Co-authored-by: user173873 <user173873@FF026.local>
* refactor RGroupDecompositionParameters to directly initialize data members
add params getter for the RGroupDecomposition object
* initial implementation of timeout
* refactor the timeout check
We could move the function that checks and throws the exception somewhere else
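A minimal sketch of a standalone check-and-throw function of the kind mentioned above. The name checkForTimeout, the exception type, and the convention that a non-positive timeout means "no timeout" are all assumptions made for this example:

```cpp
#include <chrono>
#include <stdexcept>

// Hypothetical sketch: throws if more than timeoutSeconds have elapsed
// since start. A timeout <= 0 disables the check (assumed convention).
inline void checkForTimeout(std::chrono::steady_clock::time_point start,
                            double timeoutSeconds) {
  if (timeoutSeconds <= 0.0) {
    return;
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  if (elapsed.count() >= timeoutSeconds) {
    throw std::runtime_error("operation timed out");
  }
}
```

Calling such a function at the top of each expensive loop iteration keeps the timeout logic in one place.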
* re-enable the tests I stupidly left disabled
* Fixes #3224
add option to skip symmetrization entirely
* test #3224
* stupid mistake
* run clang-tidy with modernize-use-default-member-init
* results from modernize-use-emplace
* one uniform initialization per line
otherwise SWIG is unhappy
Co-authored-by: Brian Kelley <fustigator@gmail.com>
* run clang-tidy with readability-braces-around-statements
clang-format the results
clean up all the parts that clang-tidy-8 broke
* fix problem on windows
* Better resolve ties in rgroup matches
* Break ties by choosing the permutation that adds the fewest rgroups
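The tie-breaking rule above, sketched with hypothetical types (Candidate and pickBest are illustrative names; the real scoring is more involved). Scores are compared exactly here, which real code may want to relax with a tolerance:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: one scored permutation.
struct Candidate {
  double score;
  unsigned numAddedRGroups;
};

// Highest score wins; among equal scores, prefer the permutation that adds
// the fewest R groups. Returns 0 for an empty or single-element input.
inline std::size_t pickBest(const std::vector<Candidate> &cands) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < cands.size(); ++i) {
    const bool higherScore = cands[i].score > cands[best].score;
    const bool tieWithFewerGroups =
        cands[i].score == cands[best].score &&
        cands[i].numAddedRGroups < cands[best].numAddedRGroups;
    if (higherScore || tieWithFewerGroups) {
      best = i;
    }
  }
  return best;
}
```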
* Update aligned cores after tie breaking
I think this result is more optimal
* Remove unused code
* Replace count with UsedLabels
* Remove whitespace
* Remove redundant code and error message
* Revert test
* remove debug prints
* Fix core labelling and multi-core hydrogen removals
* Add comment that explains confusing bit about indexlabels for multicores
* Multi core fixes: hydrogens properly removed, fixed labelling
* clang-format
* update version of japanese docs
* Remove external labels from cores
* Fix syntax errors
* Add better autodetection of labels, add dummyatom label, don't fall back to indexes when onlyMatchAtRgroups is set
* Move autodetection before alignment, fix final core labelling
* Fix stupid bit twiddling mistake
* None of the original mols should actually match the cores with onlyMatchAtRgroups
* Convert PRECONDITION to CHECK_INVARIANT
* Run clang-format
* use nullptr instead of 0 for pointers
* Handle cases where molecules don't have anything for an R-group properly.
Here's the python demo of the bug:
```
from rdkit import Chem
from rdkit.Chem import rdRGroupDecomposition

scaffold = Chem.MolFromSmiles('c1c([*:1])cccn1')
scaffold2 = Chem.MolFromSmiles('c1c([*:1])cncn1')
mols2 = [Chem.MolFromSmiles(smi) for smi in 'c1c(F)cc(O)cn1 c1c(F)cncn1 c1c(Cl)cc(O)cn1'.split()]
print(rdRGroupDecomposition.RGroupDecompose([scaffold, scaffold2], mols2, asSmiles=True, asRows=False))
# output from the original session:
# ({'Core': ['c1ncc([*:2])cc1[*:1]', 'c1ncc([*:1])cn1', 'c1ncc([*:2])cc1[*:1]'], 'R1': ['F[*:1]', 'F[*:1]', 'Cl[*:1]'], 'R2': ['[H]O[*:2]', '[H]O[*:2]', '']}, [])
```
* Fixes #2471
* Tweak the scoring function to penalize non-H matches considerably. Only full H rgroups get a one. Might need to tweak in the future
* Scale the hydrogens as 1/# mols unless they are a full group
* handle the heavy-atom degree queries differently
* Fixes #1563
* add a test for the heavy atom degree option
* Support (and test) adjustHeavyDegree in the cartridge too.
* test results
* Adds RGroupDecomp free function and python wrapper
* Fixes subtle bug, adds new RGroupDecomp API
* Updates results for the subtle bug fix. Verified results were correct.
* Removes smilesCaching.
* Changes RGroupDecompose ordering, adds docstrings and more tests
* Fixes #1550
* might as well update properties on the r groups too
* add option to remove Hs from sidechains;
expose a few more parameters to python;
expose ctor with parameter object to python