* Speed up tautomer canonicalization by deferring on SSSR calc
* Lazy kekulization for tautomer enumeration
Defer kekulization of tautomers until they are actually needed for
transform matching. This avoids creating kekulized copies for:
1. The initial tautomer (until first iteration)
2. New tautomers that may never be processed (if enumeration ends early)
The Tautomer class now supports lazy initialization of the kekulized
form via getKekulized() method.
Performance improvement: ~7% additional speedup (total ~22-24% from baseline)
* Use count-only substructure matching in tautomer scoring
* Add SubstructMatchCount regression test
* MolStandardize: reduce enumerate overhead
* MolStandardize: avoid per-tautomer ring recomputation
* Atom: cache PeriodicTable pointer in valence calcs
* Atom: reuse PeriodicTable in getEffectiveAtomicNum
* PeriodicTable: add atomic fast path for getTable
* GraphMol: reduce ROMol copy reallocations
* MolStandardize: use quickCopy for per-match product copies
Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.
* MolStandardize: pre-filter scoring patterns by element/connectivity
For tautomer scoring, pre-compute which SubstructTerms are relevant for
a given input molecule. Since tautomerization only moves H atoms and
changes bond orders (never creates/destroys heavy-atom bonds), patterns
requiring missing elements or connectivity can be skipped for all
tautomers of that molecule.
Two-stage filtering:
1. Element check: skip patterns requiring atoms not in the molecule
2. Connectivity check: skip patterns whose bond-order-agnostic structure
doesn't match the input molecule's connectivity
This reduces the number of VF2 substructure calls per tautomer from 12
to typically 3-5, depending on the molecule's composition.
* MolStandardize: preserve molecule properties for canonical tautomer
Copy molecule properties from the original input to the canonical tautomer
result. Since quickCopy during enumeration skips d_props to avoid overhead,
extended SMILES data like link nodes (LN) was lost. This restores them
on the final result.
* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers
TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses
quickCopy for performance. This doesn't copy molecule properties like
_molLinkNodes. Without this fix, XQMol output would lose link node
extensions in the SMILES.
Copy properties from the original query molecule to all enumerated
tautomers before constructing the TautomerQuery. This preserves extended
SMILES data without impacting enumeration performance.
* MolStandardize: use parallel iteration and cache bond lookups
Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration
over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond
lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.
* perf: add specialized matchers for simple tautomer scoring patterns
Replace VF2 graph matching with O(n) loops for 6 simple patterns:
- countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
- countMethyls: [CX4H3] methyl groups
- countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero
- countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N
Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.
Combined with the pre-filtering optimization, this achieves ~3.7x speedup
(~2500ms vs ~9300ms original) for tautomer canonicalization.
* Fix tautomer canonicalize dropping conformers from quickCopy
quickCopy (RWMol(*mol, true)) skips conformers, so tautomer
enumeration products lose 2D/3D coordinates. This causes InChI
generation to omit the /b (double bond E/Z stereo) layer, since
E/Z is derived from atomic coordinates.
Fix: copy conformers from the original molecule onto the canonical
tautomer after pickCanonical in TautomerEnumerator::canonicalize().
Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based
conformer preservation check in catch_tests.cpp.
* add test on canonicalize losing stereo
* add regression test for exocyclic C=C tautomer canonicalization
The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely
deduplicate distinct tautomers when their atom-index-ordered state
patterns happen to match, leading canonicalize() to pick the wrong
canonical form for molecules with STEREOTRANS-pinned exocyclic C=C
bonds after RemoveHs.
Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the
exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form
O=C1C=C(C=C2CC=COC2)C(=O)N1.
Currently expected to FAIL until the state key dedup bug is fixed.
* MolStandardize: expand tautomer connectivity SMARTS
* MolStandardize: scope tautomer pattern enum
* MolStandardize: trim tautomer pattern enum
* MolStandardize: use symmetric ring scoring
* extra SSS match functions for atoms/bonds
initial implementation and testing
* add baseline to test
* add a functor for matching atom coords
* support the extra checks in python
* refactor the way the python callbacks are handled
* test tolerances
* expose the AtomCoordsMatcher to python
* allow the extra checks to override the default matching
---------
Co-authored-by: = <=>
* use std::span for substruct match callbacks
This removes a copy from every evaluation of potential matches
* some cleanup/modernization
* some modernization
* deprecate chiralAtomCompat
* small optimization
* remove naked pointers
* improve new_timings.py script
* changes suggested in review
* response to review
* response to review
* Fixes#8379
* check in some working tests
* test passes
* test passes
* test passes
* test passes
* test passes
* ensure that the invariants flush the streams on failure
* tests pass
* test passes
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* Fixes#8391
* tests pass
* fix a test with legacy
not clear why this was not causing problems before
* make a test work
* Fixes#8396
* gcc builds work
* fingerprint tests pass
* mention backwards incompatible change
* fix a problem with FindMolChiralCenters
* more testing details
* enable the test status output
* Fixes#8432
fix a bug in double-bond stereo handling for template matching
* all depictor tests pass
* use the new-stereo chiral ranks in the depiction code
* always assign new-stereo chiral ranks
* make _ChiralAtomRank a computed property
This is analogous to _CIPRank
* tweak to the way the atom ordering is computed for 2D coordinate generation
* update two expected results
* backup
* response to review
* tests pass
* tests pass
---------
Co-authored-by: = <=>
* Properly cleanup Dict::Pair when serializing HasPropWithQueryValue
* Make sure pickling doesn't change original molecule
* Fix bad cut and paste
* Add PairHolder utility class for memory management of non Dict Dict::Pairs, fix mem leak in pickler
* Edit comment to force a rebuild
* Ignore PairHolder from Java/Swig builds
* Ignore PairHolder API from swig
* Reponses to review
* Add backward incompatible change
* Make release note a bullet point
* atropisomer handling added
* fixed non-used variables, linking directives
* BOOST LIB start/stop fixes, linking fix
* Fixes for RDKIT CI errors
* minimalLib fix
* changed vector<enum> for java builds
* check for extra chars in CIP labeling
* removed wrong deprecated message
* fix ostrstream output error?
* restored _ChiralAtomRank to lowercase first letter
* changes for merged master
* Fixed catch label for new Catch package
* update expected psql results
* get swig wrappers building
* restore MolFileStereochem to FileParsers
* fix java wrapper for reapplyMolBlockWedging
* some suggestions
* move a couple functions out of Bond
* Merge branch 'master' into pr/atropisomers2
* merged master
* Renamed setStereoanyFromSquiggleBond
* atropisomers in cdxml, rationalize atrop wedging, stereoGroups in drawMol
* fix for CI build
* attempt to fix java build in CI
* attempt to fix java build in CI #2
* New routine to remove non-explicit 3D-geneated chirality
* changed to use pair for atrop atoms and related bonds
* Changes as per PR reviews
* PR review respnses
* PR review reponse - more
* Fix merge from master
* fixing java ci after merge
* Updated the help doc for atripisomers
* update the atropisomer docs
* improve the images
* add the source CXSMILES
---------
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* Fixes#6017
* a bit of cleanup work
* remove unused variable
* change in response to review
switch to using std::max(maxMatches,maxRecursiveMatches)
* test the case where maxSubstructMatches<maxMatches
* Set _SmilesStart when parsing SMARTS.
* SmartsWriter should also invert first atoms, like SMILES.
* Update test cases now these SMILES match themselves as SMARTS.
* rerun bison
* cleanup a possible repeated define
* When an atom moves from the first to second position winding should flip in SMARTS (i.e. same as SMILES).
---------
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* backup
* basic tests pass
* add JSON out to substruct match parameters
* serialize the substruct match parameters in reactions
* add that to the python wrapper
* more testing
In high-symmetry cases where the symmetric SSSR does not find all
possible rings, substructure searches can fail because of this check.
Removing it fixes those cases, but is likely to decrease the performance
of substructure matching.
Also, adds a unit test where the old code fails
Co-authored-by: Franz Waibl <waiblfranz@gmail.com>
* revert duplicate chunk in release notes
* replace deprecated ifdefs
This one gets rid of USE_BUILTIN_POPCNT and RDK_THREADSAFE_SS
use RDK_OPTIMIZE_POPCNT or RDK_BUILD_THREADSAFE_SSS instead
* get rid of BUILD_COORDGEN_SUPPORT from ROMol.i
* fix a stupid typo
* update release notes
* suppress a bunch of warnings from third-party code
get rid of one warning in RDKit code
* corrections
* fix the maeparser flags
* remove some more inchi warnings with clang
* backup commit
This is mabye heading in the right direction and at least passes the basic tests which are there.
* some progress
* more tests and refactoring
* additional aliases
add carboaryl
* add CYC and ACY
* add ABC
* add AHC
* CBC and AOX
* add CHC and HAR
* add CXX
* cleanup: remove a bunch of nullptrs
* initial tagging support
* remove atom labels/sgroups after using them
* docs
* start handing writing
NOTE: this does not currently work: the generic code needs to move out of SubstructSearch
* move the generic groups to their own library
Signed-off-by: greg landrum <greg.landrum@gmail.com>
* make sure the generic groups end up in ctabs
* add forgotten CMakeLists.txt
* fix includes
* expose this stuff to Python
* CYC needs to initialize rings
* renaming
* add docs
* change in response to review
* - fix non-threaded *nix builds that currently fail because
boost flyweight introduces a dependency on pthreads
- make sure that mutexes and futures are only used when
RDK_BUILD_THREADSAFE_SSS is ON
- fix SubstructMatch failing test when RDK_BUILD_THREADSAFE_SSS is OFF
due to misplaced #ifdef's
- rename RDK_TEST_MULTITHREADED to RDK_THREADSAFE_SSS in inchi.cpp
(which is not a test)
* - the limitexternal Linux build is now single-threaded so we make
sure single-threaded builds do not break in the future
(suggestion from Greg)
* reverted unnecessary change to Code/GraphMol/FileParsers/testMultithreadedMolSupplier.cpp
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
* make sure that ResonanceMolSupplier subtructure matches are uniquified consistently
* Fixes github #4311 (#4312)
* a bit of simple refactoring
* Fixes#4311
- adds getValenceContrib() to QueryBond
- adds hasBondTypeQuery() and hasComplexBondTypeQuery() to QueryOps namespace
- atoms with complex bond type queries now have explict and implicit valences of 0
- adds tests for the above
* add a test
* Support using SubstructMatchParameters in RGD (#4318)
* support substructure search parameters in RGD.
Still needs testing/verification of the enhanced stereo stuff
* test enhanced stereo
* add support to python wrapper
unfortunately some python reformatting got mixed in there.
* addressed comments in review
Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>