42 Commits

Author SHA1 Message Date
Gareth Jones
5a06ba567a Fix for RGD dummy atom bug in RDKit::replaceCore (#5154)
* Fix for RGD dummy atom bug

* Also fix labelling issues in the R group containing input dummy atom

* minor tweaks to the proposed fix

Co-authored-by: greg landrum <greg.landrum@gmail.com>
2022-05-02 14:01:57 +02:00
Greg Landrum
2489b6cdfb Make the RGD code work when rgroupLabelling is Isotope (#5088)
Also fixes an off-by-one problem with the assigned isotope labels
2022-03-14 04:40:22 +01:00
Paolo Tosco
1872ea5a47 Fixes a bug with the choice of RGD cores (#4890)
* Later cores with more R-groups should only be chosen when they are structurally related, i.e. when they are superstructures of earlier cores

* Update Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Update Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

Co-authored-by: Tosco, Paolo <paolo.tosco@novartis.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2022-01-26 08:55:23 +01:00
Gareth Jones
a9bc39e0d5 RGD: dummy atom in input structure is mishandled (#4863)
* Support wildcard in input structures

* Fix typos

* Handle R groups containing a single wildcard and wildcards with group numbers

* Reorder tests

* Use property instead of isotope to mark input dummy

* fix the windows DLL builds

* Added constexpr for dummy input atom property

* Omitted to replace one instance of string 'INPUT_DUMMY'

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2022-01-10 13:55:54 +01:00
Paolo Tosco
823bc93d04 Major speed-up of RGD scoring (#4544)
* speed up scoring of permutations by clever caching of already settled
matches rather than recomputing scores for all matches every time
process() is called

* changes in response to review

* changes in response to review

Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
2021-09-30 09:31:49 -04:00
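
The caching idea described in #4544 can be sketched roughly as below. This is a toy in plain Python, not RDKit's implementation: `IncrementalScorer`, `score_match`, and the tuple-shaped matches are hypothetical names and shapes chosen only for illustration. The point is that a match whose score was computed on an earlier pass is looked up rather than rescored.

```python
from typing import Callable, Dict, Hashable, Iterable


class IncrementalScorer:
    """Toy scorer that remembers the score of every match it has seen.

    Hypothetical illustration only; not RDKit's actual data structure.
    """

    def __init__(self, score_match: Callable[[Hashable], float]) -> None:
        self._score_match = score_match
        self._cache: Dict[Hashable, float] = {}
        self.evaluations = 0  # number of real (uncached) scoring calls

    def total(self, matches: Iterable[Hashable]) -> float:
        """Sum the scores of all matches, scoring only the unseen ones."""
        total = 0.0
        for match in matches:
            if match not in self._cache:
                self._cache[match] = self._score_match(match)
                self.evaluations += 1
            total += self._cache[match]
        return total


# The stand-in score function just returns the tuple length.
scorer = IncrementalScorer(lambda match: float(len(match)))
first = scorer.total([("R1", "F"), ("R1", "Cl")])
second = scorer.total([("R1", "F"), ("R1", "Cl"), ("R2", "OH", "x")])
```

On the second call only the new match is scored; the two earlier ones come from the cache.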
Greg Landrum
c55182a8e0 Support using SubstructMatchParameters in RGD (#4318)
* support substructure search parameters in RGD.
Still needs testing/verification of the enhanced stereo stuff

* test enhanced stereo

* add support to python wrapper

unfortunately some python reformatting got mixed in there.
2021-07-09 15:07:43 +02:00
Gareth Jones
9622580660 Comments added to RGD core matching (#4189)
* Comments added to RGD core matching

* Reformatted code
2021-06-01 14:49:18 +02:00
Gareth Jones
c2fb57c19f RGD - a fix for the cubane issue (single target atom matches 2 user R group attachments) (#4002)
* Most tests working

* All tests working

* Fixed tests after merge with master

* Create header and implementations for RCore

* Updated comments

* Removed old code

* DLL export for MolMatchFinalCheckFunctor

* Information line for failing Mac test

* Log replace core behaviour

* Ordering fix for OSX

* Possible fuzzer fix

* Removed debug output

* Fix unmatched user R group bug

* Code review changes

* Bug fix and ChemTransforms test
2021-05-23 15:16:03 -04:00
Brian Kelley
2bab9e6125 Add return codes and make RGroupDecomp less verbose (#3971)
* Add return codes and make RGroupDecomp less verbose

* Changes in response to review

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2021-03-26 08:29:54 +01:00
Paolo Tosco
29599cf6b1 Do not add unnecessary R-labels (and an optimization) (#3969)
* - do not add unnecessary R-labels
- use a boost::dynamic_bitset rather than a std::set for lookups

* - R group labels can be >0 or <0, not 0, so no need to check for >=0 when looking for user labels
- as soon as a core is found that requires no additional labels to accommodate a molecule, bail out from the loop as no better core can be found
- add a test to better describe the use case for this change
- remove a signed/unsigned warning

* - added an entry to Release Notes to describe the impact of #3969

* avoid French expressions in Release Notes

Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
2021-03-25 04:37:06 +01:00
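
Two of the points in #3969, the bitset lookup and the early bail-out once a core needs no extra labels, can be sketched as follows. This is an assumption-laden toy in plain Python rather than the C++ code: Python integers stand in for `boost::dynamic_bitset`, and `to_bitmask` and `pick_core` are hypothetical helpers.

```python
from typing import Iterable, List, Optional, Tuple


def to_bitmask(positions: Iterable[int]) -> int:
    """Encode a collection of small non-negative integers as a bitmask."""
    mask = 0
    for pos in positions:
        mask |= 1 << pos
    return mask


def pick_core(core_masks: List[int], needed_mask: int) -> Tuple[Optional[int], int]:
    """Return (core index, number of extra labels) for the cheapest core.

    Stops at the first core that needs no extra labels, since no later
    core can improve on zero.
    """
    best_idx: Optional[int] = None
    best_extra = 0
    for idx, mask in enumerate(core_masks):
        # Positions the molecule needs that this core does not label.
        extra = bin(needed_mask & ~mask).count("1")
        if best_idx is None or extra < best_extra:
            best_idx, best_extra = idx, extra
        if extra == 0:
            break
    return best_idx, best_extra


cores = [to_bitmask([1]), to_bitmask([1, 2]), to_bitmask([1, 2, 3])]
choice = pick_core(cores, to_bitmask([1, 2]))
```

Here the second core already covers both positions, so the third core is never examined.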
Paolo Tosco
6b50f6528c Adds removeAllHydrogenRGroupsAndLabels and fixes kekulization issues (#3944)
* 1) fixed residual kekulization issues when R-labels are removed from aromatic atoms on the core
2) added removeAllHydrogenRGroupsAndLabels flag, which defaults to True. If set to False,
   unused user-defined labels on the core will be retained. If also removeAllHydrogenRGroups
   is set to False, all-hydrogen user-defined R-groups will be included in the output.
   Under no circumstances will dynamically added R-groups (when onlyMatchAtRGroups=false)
   that are all-hydrogen be included in the output, as that is not useful (change
   compared to previous behavior)
3) added tests for (1), (2) and for removeAllHydrogenRGroups, that had no tests so far
4) fixed a few documentation typos in the Python code and added some documentation
   to the C++ sources

* added missing test files

Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
2021-03-19 04:57:50 +01:00
Paolo Tosco
d6ac27e2ea Fixes issues with unlabelled groups on aromatic nitrogens (#3908)
* make sure that unlabelled R-groups on aromatic nitrogens have correct valence and formal charge to avoid kekulization failures

* change in response to review

Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
2021-03-12 14:39:52 +01:00
Greg Landrum
2e3f31990d Allow batch editing of molecules: removal only (#3875)
* backup

* simple first pass, passes all tests

* cleanup a bunch of existing uses

* ensure that we can safely add atoms/bonds while in edit mode

* add context manager on python side

* handle exceptions properly in those

* changes in response to review
2021-03-11 05:10:43 +01:00
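
The batch-editing pattern from #3875 (queue removals, apply them together, and discard them if an exception occurs inside the context manager) can be illustrated with a toy container. This is only a sketch in plain Python: `BatchEdit` is a hypothetical class operating on a list of strings, not the RDKit molecule API.

```python
from typing import List, Set


class BatchEdit:
    """Toy container that defers removals until the batch is committed."""

    def __init__(self, items: List[str]) -> None:
        self.items = list(items)
        self._pending: Set[int] = set()

    def remove(self, index: int) -> None:
        """Queue an item for removal; indices stay valid until commit."""
        if not 0 <= index < len(self.items):
            raise IndexError(index)
        self._pending.add(index)

    def __enter__(self) -> "BatchEdit":
        self._pending.clear()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            # Commit: drop every queued index in a single pass.
            self.items = [x for i, x in enumerate(self.items) if i not in self._pending]
        # On an exception the queued removals are simply discarded.
        self._pending.clear()
        return False  # never swallow the exception


atoms = BatchEdit(["C", "H", "O", "H"])
with atoms as edit:
    edit.remove(1)
    edit.remove(3)
```

Because nothing is deleted until the block exits, the second removal can still use the original index 3.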
Gareth Jones
81d3705358 R group match any issue (#3767)
* RGD modifications for any atom and index labels

* Continued development

* All tests working

* Added comment

* CR changes suggested by PTosco

* Fix catch_rgd for autocrlf

* Core dummy matches on output. RGroups on heavy atom. Dummy atoms User rgroups only when they are degree 1.

* Start work on test fixes

* testRGroupDecomp test working

* CPP and Python tests working

* Removed options for matching core query atoms on sidechains

* Windows build fix

* R groups off ring. User group matches single heavy substituent. Remove extraneous hydrogens

* Updated fingerprint variance score and tie selection

* Refactor fingerprint variance score functions to class

* Removed fingerprint distance score

* Boost::trim fix

* Updated RGD test notebook

* Fixed AddHs.cpp

* - fixes the kekulization issue
- prevents empty R-group labels from being included in cores
- makes sure that SMILES cores are always canonical
- adds a few missing const declarations and avoids unintentional copying

* Support for allowNonTerminalRGroups parameter. Remove R groups that contain H or Nothing.  Ignore R group labels on non-dummy atoms

* Fixed tests for Paolo's changes. Rebuilt test notebook.  Increased weighting of rgroup penalty in fingerprint variance score

* remove some debug output

Co-authored-by: Brian Kelley <fustigator@gmail.com>
Co-authored-by: greg landrum <greg.landrum@gmail.com>
2021-03-10 12:56:42 +01:00
Gareth Jones
ec5a172886 R group symmetry (#3565)
* Exploration

* Initial work on GA for Rgroup Symmetry

* GA for rgroup decomp and fingerprint rgroup symmetry scoring

* Continuing development

* Exploration

* Initial work on GA for Rgroup Symmetry

* GA for rgroup decomp and fingerprint rgroup symmetry scoring

* Continuing development

* Further development

* Continued tweaks

* Function rename

* Continued tweaks

* Bug fix for variance calculation

* Copyright notices. Remove Eigen dependency. RDKit logging.  Clock fix.

* Changes to fix build failures

* Fixes for Windows dynamic DLL build

* Included GA export.h file

* Fixed RGroupDecomp CMakeLists.txt

* Notebooks working, RGroup labelling bug fixed

* Fix windows build.  More options for example GA program

* More bugs found and tests adjusted

* Fixed Python rgroup test

* Trivial change to trigger CI

* OSX java and windows build fixes

* Windows DLL fix

* Fix segmentation error

* proposed change

* Possible fix for segmentation fault

* CR fixes

* CR fixes

* CR fixes

* Recreates molecules from rgroups where possible

Co-authored-by: greg landrum <greg.landrum@gmail.com>
Co-authored-by: Brian Kelley <fustigator@gmail.com>
2021-01-05 09:27:33 -05:00
Paolo Tosco
5e31c975a2 Fixes a few residual issues with the RGD code (#3606)
* - replaced set with vector for SMILES-based R-group equivalence
- the first GreedyChunk is constituted by chunkSize+1 mols
- labeled R-groups may not be extracted when onlyMatchAtRGroups==false
- labeled geminal R-groups are incorrectly scored
- my attempt to introduce consistency in R-group labeling was buggy
- added a DEBUG pre-processor directive to the tests to make debugging easier
- added a unit test
- fixed unit test results which were inconsistent with the expected behavior

* changes in response to review
2020-12-12 08:52:00 -05:00
Paolo Tosco
4b978f0c58 Fixes an RGD issue with cores having dummies adjacent to R-group labels (#3551)
* - fixes an issue with cores having dummy atoms adjacent to R-group labels

* changes in response to review
2020-11-09 08:07:43 -05:00
Paolo Tosco
b74a4aeb4c Fixes a few bugs in the R-group decomposition code (#3497)
* - Fixes three bugs in the R-group decomposition code

* - delete iterator properly during loop so the Mac does not complain

* added more tests

Co-authored-by: user173873 <user173873@FF026.local>
2020-10-19 07:46:06 -04:00
Paolo Tosco
1775eee644 Code modernization and an optimization (#3437)
* - code modernization
- removed unnecessary O(n) generation of const set

* - minor constructor tweak
2020-09-28 17:09:18 -04:00
Brian Kelley
635cfbfcaf Enable recursive smarts in the rgroup decomposition (#3404) 2020-09-11 16:05:04 +02:00
Brian Kelley
015fed1e67 rgroup speedup (#3279)
* First part of restructuring rgroup decomposition classes

* Add docs

* Cache often calculated values and simplify lookups

* Fix accidentally deleted code

* Merge doc changes

* Remove unused timing code

* Remove unused header

* Remove redundant doc string

* Fix chrono issues

* Response to review

Co-authored-by: Brian Kelley <bkelley@relaytx.com>
2020-07-12 06:56:15 +02:00
Brian Kelley
adf19f7517 RGroupDecomposition restructuring (#3270)
* First part of restructuring rgroup decomposition classes

* Add docs

* Move doc strings before structs
2020-07-08 11:47:00 -04:00
Greg Landrum
90585c0f0d Add optional timeout to RGroupDecomposition (#3223)
* refactor RGroupDecompositionParameters to directly initialize data members
add params getter for the RGroupDecomposition object

* initial implementation of timeout

* refactor the timeout check
We could move the function that checks and throws the exception somewhere else

* re-enable the tests I stupidly left disabled

* Fixes #3224
add option to skip symmetrization entirely

* test #3224

* stupid mistake
2020-06-15 09:14:20 -04:00
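
The "check and throw" helper mentioned in #3223 might look roughly like this. It is a sketch in plain Python under stated assumptions, not the C++ implementation: `DecompositionTimeout`, `check_for_timeout`, and `process_all` are hypothetical names, and treating a non-positive timeout as "disabled" is an assumption made for the example.

```python
import time


class DecompositionTimeout(RuntimeError):
    """Hypothetical exception type used only in this sketch."""


def check_for_timeout(start: float, timeout: float) -> None:
    """Raise if more than `timeout` seconds have passed since `start`.

    A non-positive timeout disables the check (an assumption here).
    """
    if timeout > 0 and time.monotonic() - start > timeout:
        raise DecompositionTimeout(f"operation exceeded {timeout} s")


def process_all(items, timeout: float = -1.0):
    """Process items one by one, checking the clock before each step."""
    start = time.monotonic()
    done = []
    for item in items:
        check_for_timeout(start, timeout)
        done.append(item)
    return done
```

Keeping the check in one small function makes it easy to call from several loops, which is the refactoring the commit message describes.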
Greg Landrum
45b9aef28b clang-tidy modernize-use-default-member-init and modernize-use-emplace (#3190)
* run clang-tidy with modernize-use-default-member-init

* results from modernize-use-emplace

* one uniform initialization per line
otherwise SWIG is unhappy

Co-authored-by: Brian Kelley <fustigator@gmail.com>
2020-05-28 09:07:58 +02:00
Greg Landrum
d41752d558 run clang-tidy with readability-braces-around-statements (#2899)
* run clang-tidy with readability-braces-around-statements
clang-format the results
clean up all the parts that clang-tidy-8 broke

* fix problem on windows
2020-01-25 14:19:32 +01:00
Eisuke Kawashima
5cd27a242f Fix typo (#2862)
* Fix typo

* Reflect the comments

* Fix more typos
2019-12-31 06:43:27 +01:00
Eisuke Kawashima
dc7cc84a0c Fix typo [ci skip] 2019-10-17 17:45:50 +09:00
Brian Kelley
47264fe727 Allow Java to see RGroup labels in the std::map wrapper. (#2681) 2019-10-03 16:52:01 +02:00
Brian Kelley
e5a57fae02 Dev/rgroup handle ties in symmetry matching (#2628)
* Better resolve ties in rgroup matches

* Break ties by choosing the permutation that adds the fewest rgroups

* Update aligned cores after tie breaking. I think this result is more optimal

* Remove unused code

* Replace count with UsedLabels

* Remove whitespace

* Remove whitespace

* Remove redundant code and error message

* Revert test

* remove debug prints
2019-09-11 16:15:14 -04:00
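
The tie-breaking rule from #2628 (among equally scored permutations, prefer the one that adds the fewest R-groups) can be sketched as below. This is a toy in plain Python, not the RDKit code: `Candidate`, `best_candidate`, and the tolerance `tol` are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    score: float
    added_rgroups: int


def best_candidate(candidates: List[Candidate], tol: float = 1e-9) -> Candidate:
    """Pick the top-scoring candidate; break ties by fewest added R-groups.

    Raises ValueError if `candidates` is empty.
    """
    top = max(c.score for c in candidates)
    tied = [c for c in candidates if top - c.score <= tol]
    return min(tied, key=lambda c: c.added_rgroups)


picked = best_candidate([
    Candidate("a", 3.0, 2),
    Candidate("b", 3.0, 1),
    Candidate("c", 2.5, 0),
])
```

Candidate "c" adds the fewest R-groups overall but is not considered, because only candidates tied for the best score take part in the tie-break.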
Brian Kelley
dfc79c98fa Fix/rgroup multiple labels (#2481)
* Fix core labelling and multi-core hydrogen removals

* Add comment that explains confusing bit about indexlabels for multicores

* Multi core fixes:  hydrogens properly removed, fixed labelling

* clang-format
2019-06-07 04:43:20 +02:00
Brian Kelley
b6e5bdd111 Fix/rgroup sdf isotope (#2449)
* update version of japanese docs

* Remove external labels from cores

* Fix syntax errors

* Add better autodetection of labels, add dummyatom label, don't fall back to indexes when onlyMatchAtRgroups are set

* Add better autodetection of labels, add dummyatom label, don't fall back to indexes when onlyMatchAtRgroups are set

* Move autodetection before alignment, fix final core labelling

* Fix stupid bit twiddling mistake

* None of the original mols should actually match the cores with onlyMatchAtRgroups

* Convert PRECONDITION to CHECK_INVARIANT

* Run clang-format

* use nullptr instead of 0 for pointers

* Handle cases where molecules don't have anything for an R-group properly.

Here's the python demo of the bug:

```
from rdkit import Chem
from rdkit.Chem import rdRGroupDecomposition

scaffold2 = Chem.MolFromSmiles('c1c([*:1])cncn1')
scaffold = Chem.MolFromSmiles('c1c([*:1])cccn1')
mols2 = [Chem.MolFromSmiles(smi) for smi in 'c1c(F)cc(O)cn1 c1c(F)cncn1 c1c(Cl)cc(O)cn1'.split()]

print(rdRGroupDecomposition.RGroupDecompose([scaffold,scaffold2],mols2,asSmiles=True,asRows=False))
# Output pasted in the original commit message (it shows the bug):
# ({'Core': ['c1ncc([*:2])cc1[*:1]', 'c1ncc([*:1])cn1', 'c1ncc([*:2])cc1[*:1]'], 'R1': ['F[*:1]', 'F[*:1]', 'Cl[*:1]'], 'R2': ['[H]O[*:2]', '[H]O[*:2]', '']}, [])
```

* Fixes #2471
2019-06-04 15:41:20 +02:00
Greg Landrum
255b254690 Fixes #2332 (#2378) 2019-03-30 08:43:07 -04:00
Greg Landrum
f23bde46d3 fixes an r-group symmetrization problem (#2324)
* fixes an r-group symmetrization problem

* clang-tidy

* changes in response to review

* typo
2019-03-08 09:11:15 -05:00
Brian Kelley
73e6b751ce RGroupDecomposition fixes, keep userLabels more robust onlyMatchAtRGroups (#2202)
* Fix onlyMatchAtRGroups
 adjust queries wasn’t working

* Keep user RLabels if present in the core

* Fix tests

* Fix for review comments
2019-01-02 17:57:34 +00:00
Brian Kelley
a661489226 Fix/rgroup prefer matching nonhs over hs (#1707)
* Tweak the scoring function to penalize non h matches considerably.  Only full H rgroups get a one.  Might need to tweak in the future

* Scale the hydrogens as 1/# mols unless they are a full group
2018-01-09 05:56:37 +01:00
Greg Landrum
1efa8e696e another clang-format run 2017-10-12 06:42:15 +02:00
Greg Landrum
f94e277856 another pass of clang modernize 2017-10-12 06:35:51 +02:00
Greg Landrum
c0d3842df1 get the new code building with the old compiler (#1601) 2017-10-02 09:37:18 +02:00
Greg Landrum
db89172bf8 handle the heavy-atom degree queries differently (#1560)
* handle the heavy-atom degree queries differently

* Fixes #1563

* add a test for the heavy atom degree option

* Support (and test) adjustHeavyDegree in the cartridge too.

* test results
2017-09-12 16:10:15 -04:00
Brian Kelley
58ede0f81b Dev/rgroup decomp freefunction (#1557)
* Adds RGroupDecomp free function and python wrapper

* Fixes subtle bug, adds new RGroupDecomp API

* Updates results for the subtle bug fix.  Verified results were correct.

* Removes smilesCaching.

* Changes RGroupDecompose ordering, adds docstrings and more tests
2017-09-12 17:41:21 +02:00
Greg Landrum
5a06022704 R group improvements (#1552)
* Fixes #1550

* might as well update properties on the r groups too

* add option to remove Hs from sidechains;
expose a few more parameters to python;
expose ctor with parameter object to python
2017-09-08 07:49:05 -04:00
Greg Landrum
62150f7d80 Squashed commit of the following:
commit 7f7b5268a62eecd260027e0918abbdf62b100034
Merge: 90e9fd3 6dd173d
Author: Greg Landrum <greg.landrum@gmail.com>
Date:   Tue Aug 8 01:19:15 2017 +0200

    merge back to master

commit 6dd173dec6
Merge: 45a94bd e11ad49
Author: Brian Kelley <fustigator@gmail.com>
Date:   Thu Aug 3 07:54:26 2017 -0400

    Merge pull request #8 from greglandrum/dev/rgroup-decomposition

    support using generic iterators in ctor;

commit e11ad49068
Author: Greg Landrum <greg.landrum@gmail.com>
Date:   Tue Aug 1 06:29:52 2017 +0200

    move notebooks to docs

commit 606c03c28f
Author: Greg Landrum <greg.landrum@gmail.com>
Date:   Thu Jul 27 05:13:13 2017 +0200

    support using generic iterators in ctor;
    general comment: this is a useful pattern that we could use elsewhere

commit 45a94bd663
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Wed Jul 26 13:30:23 2017 -0400

    Updates notebooks

commit 8f78ba97d3
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Wed Jul 26 09:00:42 2017 -0400

    Updates notebooks

commit 44728803ae
Merge: d67409d 4d0b00d
Author: Brian Kelley <fustigator@gmail.com>
Date:   Wed Jul 26 08:53:11 2017 -0400

    Merge pull request #7 from greglandrum/dev/rgroup-decomposition

    clean up a couple leaks and some compiler warnings

commit 4d0b00dd2e
Author: Greg Landrum <greg.landrum@gmail.com>
Date:   Wed Jul 26 07:47:56 2017 +0200

    clean up a couple leaks and some compiler warnings

commit d67409da0c
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Tue Jul 25 11:43:14 2017 -0400

    Makes the scoring system more sane

commit 1b5181dc2f
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Tue Jul 25 10:55:33 2017 -0400

    Finalizes enums

commit 7e9ee61556
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Tue Jul 25 10:55:18 2017 -0400

    Fixes constructor botched in the last commit

commit aed2a201bf
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 24 18:22:42 2017 -0400

    Cleans up some code

commit 95e82a1398
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 24 18:14:31 2017 -0400

    Removes unused variable

commit 0b1ed09316
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 24 18:14:23 2017 -0400

    Slight optimization by combining two loops

commit ed3340a516
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 24 13:53:06 2017 -0400

    Fixes post increments in for loops

commit 25b1678a58
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 24 13:52:53 2017 -0400

    Fixes memory leak and doesn’t call SmartsToMol twice

commit 86c8c42688
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 24 13:52:32 2017 -0400

    Adds header guards

commit b043e38d3a
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 24 13:52:25 2017 -0400

    Removes unused variable

commit 631aa77153
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Tue Jul 11 08:10:39 2017 -0400

    Fixes typo in filename

commit d6e0dc753a
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 10 14:48:04 2017 -0400

    Fixes c++11 style enums

commit b9a31eae9a
Author: Brian Kelley <brian.kelley@novartis.com>
Date:   Mon Jul 10 14:00:04 2017 -0400

    Adds RGroupDecomposition attempt
2017-08-08 01:23:49 +02:00