* Allow the same core to match more than once in a molecule.
* Update annotation.
* Changes after review.
---------
Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
* restore "oops, exponential is a pain" code snippet in the RGD algorithm
* - added MAX_PERMUTATIONS constant
- added comments about what the restored snippet does
- removed unused permutationProduct variable
* changes in response to review
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* Code/GraphMol/Depictor/RDDepictor.h
- fixed typo in docstring
Code/GraphMol/RGroupDecomposition/RGroupCore.cpp
- added a missing const; formatting changes
Code/GraphMol/RGroupDecomposition/RGroupData.cpp, Code/GraphMol/RGroupDecomposition/RGroupData.h
- moved the code which merges disconnected R groups sharing the same attachment point into a single combined molecule to a private method, RGroupData::mergeIntoCombinedMol(). The method also includes logic to merge atom and bond highlights, if present.
- modernized a for loop
- isMolHydrogen is now a static function since it does not actually require any instance data
- implemented three static function to return the R group, Core and Mol labels, respectively
Code/GraphMol/RGroupDecomposition/RGroupDecomp.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecomp.h
- implemented two private methods, RGroupDecomposition::labelAtomBondIndices() and RGroupDecomposition::setTargetAtomBondIndices(). The first method tags all atoms and bonds in the target molecule such that they can be tracked following core removal by RDKit::replaceCore(). The second method sets common_properties::_rgroupTargetAtoms and common_properties::_rgroupTargetBonds properties on core and R groups. These are vectors of atom and bond indices in the target molecule corresponding to core and R group atom/bonds, respectively, and can be used for color-coding the target molecule according to the R group decomposition it was subjected to, similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/GraphMol/RGroupDecomposition/RGroupDecompData.cpp
- formatting changes and for loop modernization
Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecompParams.h
- implemented updateRGroupDecompositionParametersFromJSON()
- added includeTargetMolInResults boolean parameter
Code/GraphMol/RGroupDecomposition/RGroupMatch.h
- implemented RGroupMatch::setTargetMoleculeForHighlights() and RGroupMatch::getTargetMoleculeForHighlights() methods to, respectively set and get the target molecule where R group decomposition can be color-coded with highlights. This molecule includes the explicit H atoms corresponding to extracted R groups, if any.
Code/GraphMol/RGroupDecomposition/Wrap/rdRGroupComposition.cpp
- use a std::unique_ptr to store the pointer to the C++ RGroupDecomposition instance
- fixed typos in docstrings
Code/GraphMol/RGroupDecomposition/Wrap/test_rgroups.py
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/catch_rgd.cpp
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
- formatting changes
Code/GraphMol/RGroupDecomposition/testRGroupInternals.cpp
- do not use deprecated constant
Code/MinimalLib/CMakeLists.txt
- added RDK_BUILD_MINIMAL_LIB_RGROUPDECOMP CMake flag to optionally expose R group decomposition functionality into MinimalLib
Code/MinimalLib/common.h
- added makeDummiesQueries flag to mol_from_input() (defaults to false)
- implemented parse_highlight_multi_colors() function to parse multi-color atom and bond highlights
- enable multi-color atom and bond highlighting
Code/MinimalLib/demo/rgd_demo.html
- added HTML page showcasing the multi-color highlights similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/MinimalLib/jswrapper.cpp
- removed checks for non-nullness of d_mol as d_mol cannot be directly accessed anymore
- replaced all instances of d_mol with get()
- implemented support for multi-color atom and bond highlights
- implemented optional support for R group decomposition
- added JSMol::copy() convenience method with same functionality as get_mol_copy() to duplicate a molecule
Code/MinimalLib/minilib.cpp, Code/MinimalLib/minilib.h
- replaced all occurrences of d_mol with get(), as d_mol is now private
- removed all occurrences of assert(d_mol) as non-nullness is checked at construction time and whenever get() is called
- JSMol is now split into two subbclasses, JSMolUnique and JSMolShared, which both inherit from the JSMol base class. JSMolUnique can be constructed from a RWMol* (as the old JSMol), while JSMolShared can be constructed from a ROMOL_SPTR. This avoids unnecessary copies when wrapping a ROMOL_SPTR (e.g., from subtructure library, JSMolList or R group decomposition) into a JSMol to pass it to JS. This also avoids that modifications done in the JS layer on a molecule stored in a MolList (e.g., adding a property) are not persisted because they are carried out on a volatile copy of the molecule rather than on the actual molecule.
Code/MinimalLib/tests/tests.js
- added a test for pesistence of modifications made to JSSharedMol
- added tests for RGD
- added test for JSMol::copy()
Code/RDGeneral/RDValue.h
- removed trailing comma from vector properties such that they can be deserialized as syntactically correct JSON
Code/RDGeneral/types.cpp, Code/RDGeneral/types.h
- added _rgroupTargetAtoms and _rgroupTargetBonds common_properties
* Code/GraphMol/Depictor/RDDepictor.h
- fixed typo in docstring
Code/GraphMol/RGroupDecomposition/RGroupCore.cpp
- added a missing const; formatting changes
Code/GraphMol/RGroupDecomposition/RGroupData.cpp, Code/GraphMol/RGroupDecomposition/RGroupData.h
- moved the code which merges disconnected R groups sharing the same attachment point into a single combined molecule to a private method, RGroupData::mergeIntoCombinedMol(). The method also includes logic to merge atom and bond highlights, if present.
- modernized a for loop
- isMolHydrogen is now a static function since it does not actually require any instance data
- implemented three static function to return the R group, Core and Mol labels, respectively
Code/GraphMol/RGroupDecomposition/RGroupDecomp.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecomp.h
- implemented two private methods, RGroupDecomposition::labelAtomBondIndices() and RGroupDecomposition::setTargetAtomBondIndices(). The first method tags all atoms and bonds in the target molecule such that they can be tracked following core removal by RDKit::replaceCore(). The second method sets common_properties::_rgroupTargetAtoms and common_properties::_rgroupTargetBonds properties on core and R groups. These are vectors of atom and bond indices in the target molecule corresponding to core and R group atom/bonds, respectively, and can be used for color-coding the target molecule according to the R group decomposition it was subjected to, similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/GraphMol/RGroupDecomposition/RGroupDecompData.cpp
- formatting changes and for loop modernization
Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecompParams.h
- implemented updateRGroupDecompositionParametersFromJSON()
- added includeTargetMolInResults boolean parameter
Code/GraphMol/RGroupDecomposition/RGroupMatch.h
- implemented RGroupMatch::setTargetMoleculeForHighlights() and RGroupMatch::getTargetMoleculeForHighlights() methods to, respectively set and get the target molecule where R group decomposition can be color-coded with highlights. This molecule includes the explicit H atoms corresponding to extracted R groups, if any.
Code/GraphMol/RGroupDecomposition/Wrap/rdRGroupComposition.cpp
- use a std::unique_ptr to store the pointer to the C++ RGroupDecomposition instance
- fixed typos in docstrings
Code/GraphMol/RGroupDecomposition/Wrap/test_rgroups.py
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/catch_rgd.cpp
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
- formatting changes
Code/GraphMol/RGroupDecomposition/testRGroupInternals.cpp
- do not use deprecated constant
Code/MinimalLib/CMakeLists.txt
- added RDK_BUILD_MINIMAL_LIB_RGROUPDECOMP CMake flag to optionally expose R group decomposition functionality into MinimalLib
Code/MinimalLib/common.h
- added makeDummiesQueries flag to mol_from_input() (defaults to false)
- implemented parse_highlight_multi_colors() function to parse multi-color atom and bond highlights
- enable multi-color atom and bond highlighting
Code/MinimalLib/demo/rgd_demo.html
- added HTML page showcasing the multi-color highlights similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/MinimalLib/jswrapper.cpp
- removed checks for non-nullness of d_mol as d_mol cannot be directly accessed anymore
- replaced all instances of d_mol with get()
- implemented support for multi-color atom and bond highlights
- implemented optional support for R group decomposition
- JSMol is now split into two subbclasses, JSMol and JSMolShared, which both inherit from the JSMolBase class. While JSMol can be constructed from a RWMol* as usual, JSMolShared can be constructed from a ROMOL_SPTR. This avoids unnecessary copies when wrapping a ROMOL_SPTR (e.g., from subtructure library, JSMolList or R group decomposition) into a JSMol to pass it to JS. This also avoids that modifications done in the JS layer on a molecule stored in a MolList (e.g., adding a property) are not persisted because they are carried out on a volatile copy of the molecule rather than on the actual molecule.
- added JSMolBase::copy() convenience method with same functionality as get_mol_copy() to duplicate a molecule
Code/MinimalLib/minilib.cpp, Code/MinimalLib/minilib.h
- replaced all occurrences of d_mol with get(), as d_mol is now private
- removed all occurrences of assert(d_mol) as non-nullness is checked at construction time and whenever get() is called
Code/MinimalLib/tests/tests.js
- added a test for pesistence of modifications made to JSMolShared
- added tests for RGD
- added test for JSMolBase::copy()
Code/RDGeneral/RDValue.h
- removed trailing comma from vector properties such that they can be deserialized as syntactically correct JSON
Code/RDGeneral/types.cpp, Code/RDGeneral/types.h
- added _rgroupTargetAtoms and _rgroupTargetBonds common_properties
* added assignChiralTypesFromMolParity flag
* added test for makeDummiesQueries
* added CFFI tests
* reordered tests
* re-added piece of code that had gone accidentally lost while merging conflicts
* Removed CHECK_INVARIANT following code review
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* RGD code cleanup
- made an effort to give more meaningful names to variables (e.g., renamed most instances of attachment (point) to avoid ambiguity as attachment may be interpreted as either the R-group atom or its neighbor atom on the core, which are two different things)
- replaced the old school removeAtom() method with begin/commitBatchEdit()
- added std::move and std::make_move_iterator where relevant to avoid potential unintended copying
- replaced instances of container.size() == 0 and !container.size() with container.empty() for better clarity
- replaced std::map::find() with std::map::at() where the key was not needed
- replaced expensive std::find_if with more efficient alternative
- added some missing const keywords and added references to avoid copying where appropriate
- replaced for loops with modern implicit looping alternatives where convenient
- avoid calling MolToSmiles when VERBOSE is not defined as the result is anyway not used
- removed "oops, exponential is a pain" code snippet as I believe 1. it is never executed 2. it is not tested 3. I do not think it is correct
- removed check for data->matches.size() > 1 as I do not believe it is correct
- Use std::unique_ptr::reset instead of defining a new std::unique_ptr and moving it to the original one
* changes in response to review
* change in response to review
* replaced std::set with boost::dynamic_bitset to save time on std::set::insert and std::set::find
* make sure we do not go out of bounds
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* - extracted the core matching logic into a separate function
- added relevant C++ and Python tests
* Update Code/GraphMol/RGroupDecomposition/RGroupDecomp.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Support tautomer queries in RGD
* Continuing RGD and tautomer development
* Python and C# tests
* Python and C# tests
* C# test
* Typo fix
* For cire tautomer query update properties instead of full sanitization
* Added query comment
* Code review change
* Support Enumeration of input cores
* Mol enumeration test
* Remove useNormalMatch from RGroupDecomp
* Added comments for handling tautomeric core
* Added comments for handling tautomeric core
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Initial development and test
* Sort of working tests
* Copy corodinates to new core
* Clear stereochemistry on core atoms with unlabelled rgroups
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Continuing development
* Updated development
* Fixed Chirality Issues
* All tests working
* Remove some unused code
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Working tests
* Tidy test code
* Adjust catch_rgd for stereochemistry in output cores
* Build ring info in output cores
* Fix Mac OS bug
* Fix for MCS and onlyMatchAtRGroups
* Brian's optimization suggestion
* Fix core group coordinate bug
* Test for replaceCore and multiple core bonds to chiral atom
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Update Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Update Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Changes in response to Greg's code review
* R group stereo bond attachment fix
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Fixed typo in tests
* Undo change to master
* Undo change to master
* Recalculate core dummy positions for hydrogen only r groups
* Undo change to master
* Fixed typo in tests
* Undo change to master
* Greg's code review changes
* Fix for RGD dummy atom bug
* Also fix labelling issues in the R group containing input dummy atom
* minor tweaks to the proposed fix
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* Later cores with more R-groups should only be chosen when they are structurally related, i.e. when they are superstructures of earlier cores
* Update Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Update Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
Co-authored-by: Tosco, Paolo <paolo.tosco@novartis.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Support wildcard in input structures
* Fix typos
* Handle R groups containing a single wildcard and wildcards with group numbers
* Reorder tests
* Use propety instead of isotope to mark input dummy
* fix the windows DLL builds
* Added constexpr for dummy input atom property
* Omitted to replace one instance of string 'INPUT_DUMMY'
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* speed up scoring of permutations by clever caching of already settled
matches rather than recomputing scores for all matches every time
process() is called
* changes in response to review
* changes in response to review
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* support substructure search parameters in RGD.
Still needs testing/verification of the enhanced stereo stuff
* test enhanced stereo
* add support to python wrapper
unfortunately some python reformatting got mixed in there.
* Most tests working
* All tests working
* Fixed tests after merge with master
* Create header and implementations for RCore
* Updated comments
* Removed old code
* DLL export for MolMatchFinalCheckFunctor
* Information line for failing Mac test
* Log replace core behaviour
* Ordering fix for OSX
* Possible fuzzer fix
* Removed debug output
* Fix unmatched user R group bug
* Code review changes
* Bug fix and ChemTransforms test
* - do not add unncessary R-labels
- use a boost::dynamic_bitset rather than a std::set for lookups
* - R group labels can be >0 or <0, not 0, so no need to check for >=0 when looking for user labels
- as soon as a core is found that requires no additional labels to accommodate a molecule, bail out from the loop as no better core can be found
- add a test to better describe the use case for this change
- remove a signed/unsigned warning
* - added an entry to Release Notes to describe the impact of #3969
* avoid French expressions in Release Notes
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* 1) fixed residual kekulization issues when R-labels are removed from aromatic atoms on the core
2) added removeAllHydrogenRGroupsAndLabels flag, which defaults to True. If set to False,
unused user-defined labels on the core will be retained. If also removeAllHydrogenRGroups
is set to False, all-hydrogen user-defined R-groups will be included in the output.
Under no circumstance dynamically-added R-groups (when onlyMatchAtRGroups=false)
that are all-hydrogen will be included in the output, as that's not useful (change
compared to previous behavior)
3) added tests for (1), (2) and for removeAllHydrogenRGroups, that had no tests so far
4) fixed a few documentation typos in the Python code and added some documentation
to the C++ sources
* added missing test files
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* make sure that unlabelled R-groups on aromatic nitrogens have correct valence and formal charge to avoid kekulization failures
* change in response to review
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* backup
* simple first pass, passes all tests
* cleanup a bunch of existing uses
* ensure that we can safely add atoms/bonds while in edit mode
* add context manager on python side
* handle exceptions properly in those
* changes in response to review
* RGD modifications for any atom and index labels
* Continued development
* All tests working
* Added comment
* CR changes suggested by PTosco
* Fix catch_rgd for autocrlf
* Core dummy matches on output. RGroups on heavy atom. Dummy atoms User rgroups only when they are degree 1.
* Start work on test fixes
* testRGroupDecomp test working
* CPP and Python tests working
* Removed options for matching core query atoms on sidechains
* Windows build fix
* R groups off ring. User group matches single heavy substituent. Remove extraneous hydrogens
* Updated fingerprint variance score and tie selection
* Refactor fingerprint variance score functions to class
* Removed fingerprint distance score
* Boost::trim fix
* Updated RGD test notebook
* Fixed AddHs.cpp
* - fixes the kekulization issue
- avoids that empty R-group labels are included in cores
- makes sure that SMILES cores are always canonical
- adds a few missing const declarations and avoids unintentional copying
* Support for allowNonTerminalRGroups parameter. Remove R groups that contain H or Nothing. Ignore R group labels on non-dummy atoms
* Fixed tests for Paolo's changes. Rebuilt test notebook. Increased weighting of rgroup penalty in fingerprint variance score
* remove some debug output
Co-authored-by: Brian Kelley <fustigator@gmail.com>
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* Exploration
* Initial work on GA fro Rgroup Symmetry
* GA for rgroup decomp and fingerprint rgroup symmetry scoring
* Continuing development
* Exploration
* Initial work on GA fro Rgroup Symmetry
* GA for rgroup decomp and fingerprint rgroup symmetry scoring
* Continuing development
* Further development
* Continued tweaks
* Function rename
* Continued tweaks
* Bug fix for variance calculation
* Copyright notices. Remove Eigen dependency. RdKit logging. Clock fix.
* Changes to fix build failures
* Fixes for Windows dynamic DLL build
* Included GA export.h file
* Fixed RGroupDecomp CMakeLists.txt
* Notebooks working, GGroup labelling bug fixed
* Fix windows build. More options for example GA program
* More bugs found and tests adjusted
* Fixed Python rgroup test
* Trivial change to trigger CI
* OSX java and windows build fixes
* Windows DLL fix
* Fix segmentation error
* proposed change
* Possible fix for segmentation fault
* CR fixes
* CR fixes
* CR fixes
* Recreates molecules from rgroups where possible
Co-authored-by: greg landrum <greg.landrum@gmail.com>
Co-authored-by: Brian Kelley <fustigator@gmail.com>
* - replaced set with vector for SMILES-based R-group equivalence
- the first GreedyChunk is constituted by chunkSize+1 mols
- labeled R-groups may not be extracted when onlyMatchAtRGroups==false
- labeled geminal R-groups are incorrectly scored
- my attempt to introduce consistency in R-group labeling was buggy
- added a DEBUG pre-processor directive to the tests to make debugging easier
- added a unit test
- fixed unit test results which were inconsistent with the expected behavior
* changes in response to review
* - Fixes three bugs in the R-group decomposition code
* - delete iterator properly during loop so the Mac does not complain
* added more tests
Co-authored-by: user173873 <user173873@FF026.local>
* refactor RGroupDecompositionParameters to directly initialize data members
add params getter for the RGroupDecomposition object
* initial implementation of timeout
* refactor the timeout check
We could move the function that checks and throws the exception somewhere else
* re-enable the tests I stupidly left disabled
* Fixes#3224
add option to skip symmetrization entirely
* test #3224
* stupid mistake
* run clang-tidy with modernize-use-default-member-init
* results from modernize-use-emplace
* one uniform initialization per line
otherwise SWIG is unhappy
Co-authored-by: Brian Kelley <fustigator@gmail.com>
* run clang-tidy with readability-braces-around-statements
clang-format the results
clean up all the parts that clang-tidy-8 broke
* fix problem on windows
* Better resolve ties in rgroup matches
* Break ties by choosing the permutation that adds the fewest rgroups
* Update aligned cores after tie breaking\nI think this results is more optimal
* Remove unused code
* Replace count with UsedLabels
* Remove whitespace
* Remove whitespace
* Remove redundant code and error message
* Revert test
* remove debug prints
* Fix core labelling and multi-core hydrogen removals
* Add comment that explains confusing bit about indexlabels for multicores
* Multi core fixes: hydrogens properly removed, fixed labelling
* clang-format
* update version of japanese docs
* Remove external labels from cores
* Fix syntax errors
* Add better autodetection of labels, add dummyatom label, don't fall back to indexes when onlyMatchAtRgroups are set
* Add better autodetection of labels, add dummyatom label, don't fall back to indexes when onlyMatchAtRgroups are set
* Move autodetection before alignment, fix final core labelling
* Fix stupid bit twiddling mistake
* None of the original mol's should actually match the cores with onlyMatchAtRgroups
* Convert PRECONDITION to CHECK_INVARIANT
* Run clang-format
* use nullptr instead of 0 for pointers
* Handle cases where molecules don't have anything for an R-group properly.
Here's the python demo of the bug:
```
In [14]: scaffold2 = Chem.MolFromSmiles('c1c([*:1])cncn1')
In [15]: scaffold = Chem.MolFromSmiles('c1c([*:1])cccn1')
In [19]: mols2 = [Chem.MolFromSmiles(smi) for smi in 'c1c(F)cc(O)cn1 c1c(F)cncn1 c1c(Cl)cc(O)cn1'.split()]
In [20]: print(rdRGroupDecomposition.RGroupDecompose([scaffold,scaffold2],mols2,asSmiles=True,asRows=False))
({'Core': ['c1ncc([*:2])cc1[*:1]', 'c1ncc([*:1])cn1', 'c1ncc([*:2])cc1[*:1]'], 'R1': ['F[*:1]', 'F[*:1]', 'Cl[*:1]'], 'R2': ['[H]O[*:2]', '[H]O[*:2]', '']}, [])
```
* Fixes#2471
* Tweak the scoring function to penalize non h matches considerably. Only full H rgroups get a one. Might need to tweak int the future
* Scale the hydrogens as 1/# mols unless they are a full group