28 Commits

Author SHA1 Message Date
Yakov Pechersky
c6cabf4153 Speed-up tautomer canonicalization, no API changes (#9134)
* Speed up tautomer canonicalization by deferring on SSSR calc

* Lazy kekulization for tautomer enumeration

Defer kekulization of tautomers until they are actually needed for
transform matching. This avoids creating kekulized copies for:
1. The initial tautomer (until first iteration)
2. New tautomers that may never be processed (if enumeration ends early)

The Tautomer class now supports lazy initialization of the kekulized
form via getKekulized() method.

Performance improvement: ~7% additional speedup (total ~22-24% from baseline)

* Use count-only substructure matching in tautomer scoring

* Add SubstructMatchCount regression test

* MolStandardize: reduce enumerate overhead

* MolStandardize: avoid per-tautomer ring recomputation

* Atom: cache PeriodicTable pointer in valence calcs

* Atom: reuse PeriodicTable in getEffectiveAtomicNum

* PeriodicTable: add atomic fast path for getTable

* GraphMol: reduce ROMol copy reallocations

* MolStandardize: use quickCopy for per-match product copies

Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.

* MolStandardize: pre-filter scoring patterns by element/connectivity

For tautomer scoring, pre-compute which SubstructTerms are relevant for
a given input molecule. Since tautomerization only moves H atoms and
changes bond orders (never creates/destroys heavy-atom bonds), patterns
requiring missing elements or connectivity can be skipped for all
tautomers of that molecule.

Two-stage filtering:
1. Element check: skip patterns requiring atoms not in the molecule
2. Connectivity check: skip patterns whose bond-order-agnostic structure
   doesn't match the input molecule's connectivity

This reduces the number of VF2 substructure calls per tautomer from 12
to typically 3-5, depending on the molecule's composition.

* MolStandardize: preserve molecule properties for canonical tautomer

Copy molecule properties from the original input to the canonical tautomer
result. Since quickCopy during enumeration skips d_props to avoid overhead,
extended SMILES data like link nodes (LN) was lost. This restores them
on the final result.

* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers

TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses
quickCopy for performance. This doesn't copy molecule properties like
_molLinkNodes. Without this fix, XQMol output would lose link node
extensions in the SMILES.

Copy properties from the original query molecule to all enumerated
tautomers before constructing the TautomerQuery. This preserves extended
SMILES data without impacting enumeration performance.

* MolStandardize: use parallel iteration and cache bond lookups

Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration
over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond
lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.

* perf: add specialized matchers for simple tautomer scoring patterns

Replace VF2 graph matching with O(n) loops for 6 simple patterns:
- countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
- countMethyls: [CX4H3] methyl groups
- countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero
- countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N
Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.
Combined with the pre-filtering optimization, this achieves ~3.7x speedup
(~2500ms vs ~9300ms original) for tautomer canonicalization.

* Fix tautomer canonicalize dropping conformers from quickCopy

quickCopy (RWMol(*mol, true)) skips conformers, so tautomer
enumeration products lose 2D/3D coordinates. This causes InChI
generation to omit the /b (double bond E/Z stereo) layer, since
E/Z is derived from atomic coordinates.

Fix: copy conformers from the original molecule onto the canonical
tautomer after pickCanonical in TautomerEnumerator::canonicalize().

Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based
conformer preservation check in catch_tests.cpp.

* add test on canonicalize losing stereo

* add regression test for exocyclic C=C tautomer canonicalization

The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely
deduplicate distinct tautomers when their atom-index-ordered state
patterns happen to match, leading canonicalize() to pick the wrong
canonical form for molecules with STEREOTRANS-pinned exocyclic C=C
bonds after RemoveHs.

Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the
exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form
O=C1C=C(C=C2CC=COC2)C(=O)N1.

Currently expected to FAIL until the state key dedup bug is fixed.

* MolStandardize: expand tautomer connectivity SMARTS

* MolStandardize: scope tautomer pattern enum

* MolStandardize: trim tautomer pattern enum

* MolStandardize: use symmetric ring scoring
2026-03-31 06:42:40 +02:00
Yakov Pechersky
0986d22c58 Deterministic kekulize, independent of atom and bond order (#9125)
* Make kekulization deterministic

* Add tautomer order-independence regression (python)

* Adjust tautomer tests for deterministic kekulization

* Update graphmol wedged-bond kekulization checks

* SmilesParse: update aromatic bond index expectations

* SmilesParse: refresh cxsmilesTest expected files

* Depictor: update testDepictor expected MolBlocks

* Depictor: update depictorCatch expectations

* Depictor Wrap: update expected MolBlock for pyDepictor

* MarvinParse: update testMrvToMol expected outputs

* FileParsers: refresh testAtropisomers expected outputs

* FileParsers: update tests for deterministic kekulization

* MolDraw2D: refresh brittle bond assertions

* RascalMCES: update expected cluster size

* MinimalLib: make cffi wedging check order-independent

* documentation fix

* MinimalLib: update Kekulé bond table in aligned-coords test

* Hoist duplicated lambdas to TEST_CASE scope

* Remove unused originalWedges variable

* Remove redundant bounds check; clarify wedge-end preference

* Pre-sort allAtms by wedge-end + rank

* Use mol.atomNeighbors() for neighbor iteration

* Check inAllAtms before linear-scanning done

* Drop redundant optsV/wedgedOptsV sorts

* Remove unused Canon.h include

* Add canonical parameter to Kekulize; skip ranking during sanitization

* Test canonical re-kekulization preserves stereo across atom orderings

* MinimalLib: update Kekulé bond orders in invertedWedges

* Change Kekulize canonical default to false, expose in Python wrappers

* keep rank order, push_back

* Revert "RascalMCES: update expected cluster size"

This reverts commit a81bb39495.

* docstring change

* expose new flag to python wrapper

* document changes in ReleaseNotes.md

* revert minimallib test changes again

* canonical = true defaults

* Revert "revert minimallib test changes again"

This reverts commit 039e1d84da.

* Reapply "RascalMCES: update expected cluster size"

This reverts commit 7b83a7a3e8.

---------

Co-authored-by: greg landrum <greg.landrum@gmail.com>
2026-03-19 08:43:13 +01:00
Yakov Pechersky
67b73acba4 when shifting double bonds in tautomerization, set double bond stereo to STEREOANY (#9119)
* when shifting double bonds in tautomerization, set double bond stereo to STEREOANY

fixes #9102

notably, do this only to non-ring bonds
move tests over to assert this
avoid index-based bond lookup in test assertions
since bond indexing can move in tautomers

* inchi unittest check

* fast rings
2026-02-19 19:29:17 +01:00
Brian Kelley
9495dd5413 Expose tautomer scoring functions to python (#7994)
* Expose tautomer scoring functions to python

* Add more tests/documentation

* Rename getDefaultTautomerSubstructs to getDefaultTautomerScoreSubstructs

* Remove ROMOL_SPTR

* Add full custom scoring function example

* Run clang format

* Use proper BOOST_PYTHON_FUNCTION_OVERLOADS

* Use default copy constructor
2024-11-15 05:37:35 +01:00
tadhurst-cdd
d5d4d194ec atropisomer handling added (#6903)
* atropisomer handling added

* fixed non-used variables,  linking directives

* BOOST LIB start/stop fixes, linking fix

* Fixes for RDKIT CI errors

* minimalLib fix

* changed vector<enum> for java builds

* check for extra chars in CIP labeling

* removed wrong deprecated message

* fix ostrstream output error?

* restored _ChiralAtomRank to lowercase first letter

* changes for merged master

* Fixed catch label for new Catch package

* update expected psql results

* get swig wrappers building

* restore MolFileStereochem to FileParsers

* fix java wrapper for reapplyMolBlockWedging

* some suggestions

* move a couple functions out of Bond

* Merge branch 'master' into pr/atropisomers2

* merged master

* Renamed setStereoanyFromSquiggleBond

* atropisomers in cdxml, rationalize atrop wedging, stereoGroups in drawMol

* fix for CI build

* attempt to fix java build in CI

* attempt to fix java build in CI #2

* New routine to remove non-explicit  3D-geneated chirality

* changed to use pair for atrop atoms and related bonds

* Changes as per PR reviews

* PR review respnses

* PR review reponse - more

* Fix merge from master

* fixing java ci after merge

* Updated the help doc for atripisomers

* update the atropisomer docs

* improve the images

* add the source CXSMILES

---------

Co-authored-by: greg landrum <greg.landrum@gmail.com>
2023-12-22 04:58:18 +01:00
Greg Landrum
c7c9ad3328 Add in place and multithread support for more of the MolStandardize code (#6970) 2023-12-12 17:21:18 +01:00
Ric
2bd6481280 Fix some build warnings (#6618)
* fixes a copy constructor issue:

In copy constructor ‘boost::shared_ptr<T>::shared_ptr(const boost::shared_ptr<T>&) [with T = RDKit::ROMol]’,
    inlined from ‘PyObject* RDKit::RunReactant(ChemicalReaction*, T, unsigned int) [with T = boost::python::api::object]’ at /tmp/rdkit_builder/rdkit/Code/GraphMol/ChemReactions/Wrap/rdChemReactions.cpp:129:14:
/usr/include/boost/smart_ptr/shared_ptr.hpp:416:66: warning: ‘*(const boost::shared_ptr<RDKit::ROMol>*)((char*)&<unnamed> + offsetof(boost::python::extract<boost::shared_ptr<RDKit::ROMol> >,boost::python::extract<boost::shared_ptr<RDKit::ROMol> >::<unnamed>.boost::python::converter::extract_rvalue<boost::shared_ptr<RDKit::ROMol> >::m_data.boost::python::converter::rvalue_from_python_data<boost::shared_ptr<RDKit::ROMol> >::<unnamed>.boost::python::converter::rvalue_from_python_storage<boost::shared_ptr<RDKit::ROMol> >::storage)).boost::shared_ptr<RDKit::ROMol>::px’ may be used uninitialized [-Wmaybe-uninitialized]
  416 |     shared_ptr( shared_ptr const & r ) BOOST_SP_NOEXCEPT : px( r.px ), pn( r.pn )
      |                                                                ~~^~
/tmp/rdkit_builder/rdkit/Code/GraphMol/ChemReactions/Wrap/rdChemReactions.cpp: In function ‘PyObject* RDKit::RunReactant(ChemicalReaction*, T, unsigned int) [with T = boost::python::api::object]’:
/tmp/rdkit_builder/rdkit/Code/GraphMol/ChemReactions/Wrap/rdChemReactions.cpp:129:30: note: ‘<anonymous>’ declared here
  129 |   ROMOL_SPTR react = python::extract<ROMOL_SPTR>(reactant);
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/boost/smart_ptr/shared_ptr.hpp:17:
In copy constructor ‘boost::detail::shared_count::shared_count(const boost::detail::shared_count&)’,
    inlined from ‘boost::shared_ptr<T>::shared_ptr(const boost::shared_ptr<T>&) [with T = RDKit::ROMol]’ at /usr/include/boost/smart_ptr/shared_ptr.hpp:416:72,
    inlined from ‘PyObject* RDKit::RunReactant(ChemicalReaction*, T, unsigned int) [with T = boost::python::api::object]’ at /tmp/rdkit_builder/rdkit/Code/GraphMol/ChemReactions/Wrap/rdChemReactions.cpp:129:14:
/usr/include/boost/smart_ptr/detail/shared_count.hpp:438:67: warning: ‘((const boost::detail::shared_count*)((char*)&<unnamed> + offsetof(boost::python::extract<boost::shared_ptr<RDKit::ROMol> >,boost::python::extract<boost::shared_ptr<RDKit::ROMol> >::<unnamed>.boost::python::converter::extract_rvalue<boost::shared_ptr<RDKit::ROMol> >::m_data.boost::python::converter::rvalue_from_python_data<boost::shared_ptr<RDKit::ROMol> >::<unnamed>.boost::python::converter::rvalue_from_python_storage<boost::shared_ptr<RDKit::ROMol> >::storage)))[1].boost::detail::shared_count::pi_’ may be used uninitialized [-Wmaybe-uninitialized]
  438 |     shared_count(shared_count const & r) BOOST_SP_NOEXCEPT: pi_(r.pi_)
      |                                                                 ~~^~~
/tmp/rdkit_builder/rdkit/Code/GraphMol/ChemReactions/Wrap/rdChemReactions.cpp: In function ‘PyObject* RDKit::RunReactant(ChemicalReaction*, T, unsigned int) [with T = boost::python::api::object]’:
/tmp/rdkit_builder/rdkit/Code/GraphMol/ChemReactions/Wrap/rdChemReactions.cpp:129:30: note: ‘<anonymous>’ declared here
  129 |   ROMOL_SPTR react = python::extract<ROMOL_SPTR>(reactant);
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Fixes a (weird) allocation warning in GCC 12

In file included from /usr/include/c++/12/bits/alloc_traits.h:33,
                 from /usr/include/c++/12/ext/alloc_traits.h:34,
                 from /usr/include/c++/12/bits/basic_string.h:39,
                 from /usr/include/c++/12/string:53,
                 from /usr/include/c++/12/bits/locale_classes.h:40,
                 from /usr/include/c++/12/bits/ios_base.h:41,
                 from /usr/include/c++/12/streambuf:41,
                 from /usr/include/c++/12/bits/streambuf_iterator.h:35,
                 from /usr/include/c++/12/iterator:66,
                 from /usr/include/boost/iterator/iterator_traits.hpp:10,
                 from /usr/include/boost/range/iterator_range_core.hpp:26,
                 from /usr/include/boost/range/iterator_range.hpp:13,
                 from /usr/include/boost/iostreams/traits.hpp:38,
                 from /usr/include/boost/iostreams/detail/call_traits.hpp:15,
                 from /usr/include/boost/iostreams/detail/adapter/device_adapter.hpp:22,
                 from /usr/include/boost/iostreams/tee.hpp:18,
                 from /tmp/rdkit_builder/rdkit/Code/RDGeneral/RDLog.h:17,
                 from /tmp/rdkit_builder/rdkit/Code/GraphMol/ChemTransforms/testChemTransforms.cpp:11:
In function ‘void std::_Construct(_Tp*, _Args&& ...) [with _Tp = pair<int, int>; _Args = {const pair<int, int>&}]’,
    inlined from ‘_ForwardIterator std::__do_uninit_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = const pair<int, int>*; _ForwardIterator = pair<int, int>*]’ at /usr/include/c++/12/bits/stl_uninitialized.h:120:21,
    inlined from ‘static _ForwardIterator std::__uninitialized_copy<_TrivialValueTypes>::__uninit_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = const std::pair<int, int>*; _ForwardIterator = std::pair<int, int>*; bool _TrivialValueTypes = false]’ at /usr/include/c++/12/bits/stl_uninitialized.h:137:32,
    inlined from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = const pair<int, int>*; _ForwardIterator = pair<int, int>*]’ at /usr/include/c++/12/bits/stl_uninitialized.h:185:15,
    inlined from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, allocator<_Tp>&) [with _InputIterator = const pair<int, int>*; _ForwardIterator = pair<int, int>*; _Tp = pair<int, int>]’ at /usr/include/c++/12/bits/stl_uninitialized.h:372:37,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_insert(iterator, _ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const std::pair<int, int>*; _Tp = std::pair<int, int>; _Alloc = std::allocator<std::pair<int, int> >]’ at /usr/include/c++/12/bits/vector.tcc:808:38,
    inlined from ‘std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::insert(const_iterator, std::initializer_list<_Tp>) [with _Tp = std::pair<int, int>; _Alloc = std::allocator<std::pair<int, int> >]’ at /usr/include/c++/12/bits/stl_vector.h:1409:17,
    inlined from ‘void testReplaceCoreMatchVectMultipleMappingToCore()’ at /tmp/rdkit_builder/rdkit/Code/GraphMol/ChemTransforms/testChemTransforms.cpp:974:17:
/usr/include/c++/12/bits/stl_construct.h:119:7: warning: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ writing 56 bytes into a region of size 48 overflows the destination [-Wstringop-overflow=]
  119 |       ::new((void*)__p) _Tp(std::forward<_Args>(__args)...);
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/x86_64-linux-gnu/c++/12/bits/c++allocator.h:33,
                 from /usr/include/c++/12/bits/allocator.h:46,
                 from /usr/include/c++/12/string:41:
In member function ‘_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = std::pair<int, int>]’,
    inlined from ‘static _Tp* std::allocator_traits<std::allocator<_CharT> >::allocate(allocator_type&, size_type) [with _Tp = std::pair<int, int>]’ at /usr/include/c++/12/bits/alloc_traits.h:464:28,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = std::pair<int, int>; _Alloc = std::allocator<std::pair<int, int> >]’ at /usr/include/c++/12/bits/stl_vector.h:378:33,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_insert(iterator, _ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const std::pair<int, int>*; _Tp = std::pair<int, int>; _Alloc = std::allocator<std::pair<int, int> >]’ at /usr/include/c++/12/bits/vector.tcc:799:40,
    inlined from ‘std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::insert(const_iterator, std::initializer_list<_Tp>) [with _Tp = std::pair<int, int>; _Alloc = std::allocator<std::pair<int, int> >]’ at /usr/include/c++/12/bits/stl_vector.h:1409:17,
    inlined from ‘void testReplaceCoreMatchVectMultipleMappingToCore()’ at /tmp/rdkit_builder/rdkit/Code/GraphMol/ChemTransforms/testChemTransforms.cpp:974:17:
/usr/include/c++/12/bits/new_allocator.h:137:55: note: at offset [8, 56] into destination object of size 56 allocated by ‘operator new’
  137 |         return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
      |                                                       ^

* finalForce maybe uninitialized in inlined copy constructor

* failure is being used uninitialized

* MaeWriter::write overrides a method of the base class without being marked 'override'

This one is annoying because it happens in an exported header, so it propagates
to any code using the header!

* otq is never used

* fix implicitly declared assignment operator warning

* variables in catch statements that are never used

* fix type (sign) mismatch warning

* drop duplicate export macro
2023-08-11 06:04:55 +02:00
Greg Landrum
ff6451447a Fixes #5784 (#5817)
catch kekulization errors during the tautomer enumeration
I have tested this on ~100K ChEMBL molecules and encountered
no further problems.
2022-12-01 17:11:50 +01:00
Greg Landrum
522811b8d4 Fixes #5402 (#5542)
* support transforms with branches

* improve output when doing verbose canonicalization

* Fixes #5402
2022-09-09 05:06:42 +02:00
Greg Landrum
fb49a33b5a Fix a couple of problems with MolStandardize (#5319)
* Fixes #5317

* Fixes #5318

* Fixes #5320

* Update Code/GraphMol/MolStandardize/Charge.cpp

Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com>

Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com>
2022-05-30 06:00:09 +02:00
Greg Landrum
1431c48d50 Fixes #4700: changes default tautomer enumeration rules (#4727)
* backup

* all tautomer defs which involve z or R{} queries in the original paper have been updated
passes all tests

* add support for the original version of the parameters

* fix win dll builds
2021-11-27 05:02:39 +01:00
Greg Landrum
f5a54af475 A collection of MolStandardize improvements (#4118)
* Swap to using a data structure for default normalization parameters

* bring the default fragment data into the code too

* cleanup

* add reionizer parameters via data

change fragment parse failures to ValueErrorExceptions

* tautomer parameters in the code

* got a little over-enthusiastic in that last cleanup

* use boost::flyweight to cache normalization and charge data params

* a bit more cleanup

* support reading params from JSON

* fragments from JSON
single-call for fragment removal

* add a one-liner for the canonical tautomer

* quick refactor

* Fixes #4115

* complete the parents

* docs

* move the definitions to a namespace and make them const

* see if switching to c++14 fixes the CI compile problems with g++ 5.5

* somewhat uglier way of solving the initalizer list problem
2021-05-19 09:11:23 +02:00
Paolo Tosco
6373d78744 Fixes #3755 (#3758)
* - fixed VERBOSE_ENUMERATION build
- code cleanup

* - fixes #3755

* changes in response to review
2021-01-27 08:20:25 +01:00
Paolo Tosco
6053f62453 Added support for isotopic Hs to TautomerEnumerator (#3502)
* - added support for isotopic Hs to TautomerEnumerator

* review suggestions

Co-authored-by: greg landrum <greg.landrum@gmail.com>
2020-10-19 13:42:51 +02:00
Paolo Tosco
7d0d7df5f0 Fixes a number of issues flagged by clang (#3498)
* - fixes a number of issues flagged by clang

* - removed commented line
2020-10-15 15:03:34 +02:00
Paolo Tosco
f6e5d7a823 fixes #3430 (#3436) 2020-09-26 05:24:44 +02:00
Paolo Tosco
527a7adf99 Some work on TautomerEnumerator (#3327)
* - Added a TautomerEnumerator constructor which allows passing CleanupParameters
- Added three configurable parameters to CleanupParameters
- Added a callback to TautomerEnumerator
- Fixed a bug where the same tautomer could be mapped by both isomeric and non-isomeric SMILES
- TautomerEnumerator::enumerate() now returns a TautomerEnumeratorResult and does not take
  dynamic_bitset pointers as optional parameters
- Added a missing transform from the Sitzmann paper
- General code cleanup and optimization

* - TautomerEnumeratorResult is now iterable in both C++ and Python
- further optimizations
- implemented a TautomerEnumerator.PickCanonical() Python wrapper
- added C++ and Python accessors to SMILES and SmilesTautomerMap

* - make sure the number of tautomers reported by rdLogger is correct and definitive

* make sure that if N maxTautomers are requested, N tautomers are returned if the theoretical number of tautomers is M>N

* avoid that sulfonic acids hit the formamidinesulfinic acid tautomerisation rule

* offer an option to allow the old API to still be used.

* Changes in response to review and following discussion with Gareth and Greg

* - made TautomerEnumeratorResult an enum class (was a plain C enum)
- made TautomerEnumeratorResult::const_iterator a bidirectional_iterator
- added tests to fully probe the TautomerEnumeratorResult::const_iterator functionality

* - change the difference_type definition
- added tests for the above

* - cosmetic change to improve code readability

Co-authored-by: greg landrum <greg.landrum@gmail.com>
2020-09-24 17:00:03 -04:00
jones-gareth
21a8a263bd Tautomer search (#3205)
* TautomerQuery class

* working test

* Comment header

* Merge with master. Greg's suggestions. More tests. Python wrapper

* Updated Pattern Fingerprints to merge with master. Reset email

* Java/C# wrappers. Java test

* Java/C# wrappers. Java test

* Java/C# wrappers. Java test

* Greg suggestions of 6_2_2020

* Explicit types in Java TautomerQueryTests class

* Update Code/GraphMol/QueryOps.h

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* get windows dll builds working

* Removed tautomer query wrappers from RDKit namespace

* Fixes from evaluation

* Template molecule identification fix. Greg's suggestion

* Final check search functor for evaluating template matches as they are found

Co-authored-by: Gareth Jones <gjones@glysade.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2020-06-24 17:27:40 +02:00
Greg Landrum
edd922c99c Cleanup warnings from clang-10 (#3238)
* stop returning local memory in exceptions

* remove a couple unnecessary copies in loops

* fix a bug in the way the default MMFF aromatic parameters are constructed

* remove a bunch of loop-variable warnings

* remove a bunch of clang warnings

* disable clang warnings in python wrappers

* remove some warnings when building the python wrappers
2020-06-19 17:16:22 -04:00
Greg Landrum
c7e7d88940 Fixes #2990 (#3016)
* add the test

* Fixes #2990

Needs more testing!

* additional testing
2020-03-22 04:58:01 +01:00
Greg Landrum
ace523b1b5 allow retrieval of the atoms/bonds modified by the tautomerization (#3013)
* add modifiedBonds and modifiedAtoms to C++ API

* add modifiedAtoms/modifiedBonds interface to Python too

* update documentation
2020-03-17 14:13:54 +01:00
Greg Landrum
6f9ba35826 Tune the tautomer scoring (#2959)
* tautomer scoring tweaks
doc updates
expose tautomer score to Python

* fix leaks in tests
2020-02-19 07:38:04 -05:00
Greg Landrum
915471a079 Fix a problem with aromatic heteroatom tautomer enumeration (#2952)
* update transforms to enforce neutral nitrogens when doing the aromatic neteroatom transforms

* formatting, add a test

* remove unused #include
2020-02-13 06:35:01 +01:00
Greg Landrum
d41752d558 run clang-tidy with readability-braces-around-statements (#2899)
* run clang-tidy with readability-braces-around-statements
clang-format the results
clean up all the parts that clang-tidy-8 broke

* fix problem on windows
2020-01-25 14:19:32 +01:00
Greg Landrum
f8a4020789 Add MolVS tautomer canonicalization (#2886)
* first pass at implementing molvs-style tautomer scoring
This isn't optimal in terms of performance, but all the MolVS tests pass.

* clang format

* A bit of refactoring of the tautomer stuff

* first pass at python wrappers

* allow specifying the tautomer scoring function from C++

* EFF: use boost::flyweight so SMARTS is only parsed once

* improve the python API

* switch to boost::function instead of using function pointers

* allow user-provided tautomer scoring functions

* documentation and scorer version

* change in response to review
2020-01-17 15:25:17 -05:00
Eisuke Kawashima
5cd27a242f Fix typo (#2862)
* Fix typo

* Reflect the comments

* Fix more typos
2019-12-31 06:43:27 +01:00
Ric
91008ff11d Address compile warnings & trivial improvements (#2097)
* Address compile warnings & trivial improvements

* revert unwanted initializers; use RDUNUSED_PARAM for unused params

* revert fix in testRDFcustom; marked with 'TO DO' comment
2018-10-12 06:39:32 -04:00
Susan Leung
956fdf268c Dev/GSOC2018_MolVS_Integration (#2002)
* short test file for MolVS standardize_sm

* short test file for MolVS fragment

* short test file for MolVS metals

* short test file for MolVS normalize

* short test file for MolVS reionize

* short test file for MolVS tautomer

* short test file for MolVS validate

* long test file for MolVS standardize smiles

* long test file for MolVS fragment

* long test file for MolVS metals

* long test file for MolVS normalize

* long test file for MolVS reionize

* long test file for MolVS tautomer

* long test file for MolVS validate

* Unit tests for MolVS steps

* dropping support for Python2

* molvs/__init__.py

* molvs/charge.py

* molvs/errors.py

* molvs/fragment.py

* molvs/metal.py

* molvs/normalize.py

* molvs/resonance.py

* molvs/standardize.py

* molvs/tautomer.py

* molvs/utils.py

* molvs/validate.py

* molvs/validations.py

* molvs/cli.py

* adapted and renamed molvs/cli.py to work within $RDBASE/Contrib/MolVS/

* setup MolStandardize directories, source with empty cleanup function, header, CMake files

* corrections to empty source, header and test1.cpp

* adding empty functions and initializers to MolStandardize

* empty Metal source, header and added test

* added most of Metal.cpp functionality and made some more tests

* empty functions and initializers to Normalize

* empty functions and initializers to Validate

* added most code for RDKitDefault mode, along with some tests

* restructure for abstract base class ValidateMethod

* written in isNoneValidation for MolVSValidation

* took out isNoneValidation, put in noAtomValidation, neutralValidation, isotopeValidation for MolVSValidation

* added in AllowedAtoms

* added in disallowedAtoms

* corrections to Validate

* added code for FragmentRemover

* extended fragment functionality to include choose largest fragment, added in tests for fragment catalog, fragment remover. Also added fragmentValidation method in MolStandardize

* added another test to testValidate test_fragment

* corrections to fragment

* corrections to Metal

* added code for Normalize

* added normalize member function to MolStandardize and added tests

* added multi fragment functionality to Normalize.cpp and additional tests

* TransformCatalog

* tests for Normalize.cpp

* first bit of cleanup

* added most of Charge functionality and some tests

* some corrections to Charge.cpp and some more tests to testCharge.cpp

* corrections to Charge.cpp

* start of Tautomer Enumerate with some tests

* added BondType option to Tautomer Enumeration

* correcting for some memory leakage

* a few alterations to formatting

* sorting out some memory leaks

* sorting out some memory leaks

* some corrections for PCS test set

* redo tests with updated RDKit

* fixing memory leak

* more fixes after 100kPCS set testing

* using tab as delimiter in CSVs rather than comma

* tutorial for MolStandardize

* still working on Tautomer enumeration

* deleted some empty tests

* starting writing tautomer canonicalize

* rename test_data -> data (the source still needs to be updated)

* automatic source reformatting

* adjust to directory rename

* move the fragment catalog test into the MolStandardize directory
do not create separate library for FragmentCatalog

* stop building separate libraries for the catalogs

* move the CleanupParameters into the MolStandardize namespace

* first pass at python wrapper

* move the py module to the correct dir;
add some python tests;
add standardizeSmiles to python wrapper

* disabling the compareMolVSTest since that requires command line arguments to run

* get this building on windows

* put the python lib in the right place

* further work on python wrapper for rdMolStandardize

* added get and set functions to Metal and wrapped them

* added get and set functions to Metal and wrapped them

* changed construstor of Reionizer class and input args for reionize, wrapped this default

* overload Reionizer constructor so user can input own AcidBaseFile from python

* added Uncharger class to Charge and added test for Uncharger

* wrapped Fragment, fixed some memory leakage, changed some args and return types, added some tests

* wrapped Normalized and changed how Normalizer class is initiated

* changing MolVSValidation structure so user can choose which MolVS submethod they want

* starting to write Wrap for Validate

* now it compiles with Wrap/Validate.cpp

* a couple refactorings around validate

* move the validate code into the rdMolStandardize module

* make sure a valid pointer is returned for standardizeSmiles

* rdMolStandardize.MolVSValidation done and tests added

* half way through AllowedAtomsValidation

* finished AllowedAtomsValidation and DisallowedAtomsValidation

* moved charge, fragment, metal, normalize into the rdMolStandardize module

* changed tutorial to use wrapped code

* added copyrights

* added copyrights

* move the data files

* modify source files to adjust to the move

* added validateSmiles functionality

* removed std::cout

* redid some of the 100k PCS tests

* working on the tutorial

* adding some documentation

* deleting some comment lines

* some changes after pull review

* More changes after pull review

* start of trying to make java wrap

* remove some warnings, add some questions

* additional warning removals, a bit more reporting

* some test cleanups

* enable testing of the java code
2018-09-28 11:24:25 +02:00