* Speed up tautomer canonicalization by deferring the SSSR calculation
* Lazy kekulization for tautomer enumeration
Defer kekulization of tautomers until they are actually needed for
transform matching. This avoids creating kekulized copies for:
1. The initial tautomer (until first iteration)
2. New tautomers that may never be processed (if enumeration ends early)
The Tautomer class now supports lazy initialization of the kekulized
form via the getKekulized() method.
Performance improvement: ~7% additional speedup (total ~22-24% from baseline)
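The lazy-initialization idea can be sketched roughly as follows. This is a minimal illustration, not the actual Tautomer class: `ToyMol`, `LazyTautomer`, `kekulizedCopy` and the `g_kekulizeCalls` counter are hypothetical stand-ins, and only the `getKekulized()` name comes from the message above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a molecule; the real code works on ROMol/RWMol.
struct ToyMol {
  std::string smiles;
};

// Counts how often the (expensive) kekulization step actually runs.
static int g_kekulizeCalls = 0;

// Hypothetical stand-in for making a kekulized copy of a molecule.
static std::unique_ptr<ToyMol> kekulizedCopy(const ToyMol &mol) {
  ++g_kekulizeCalls;
  return std::make_unique<ToyMol>(ToyMol{mol.smiles + " (kekulized)"});
}

class LazyTautomer {
 public:
  explicit LazyTautomer(ToyMol mol) : d_mol(std::move(mol)) {}

  const ToyMol &getMol() const { return d_mol; }

  // Builds the kekulized form on first use and returns the cached copy
  // on every later call.
  const ToyMol &getKekulized() const {
    if (!d_kekulized) {
      d_kekulized = kekulizedCopy(d_mol);
    }
    return *d_kekulized;
  }

 private:
  ToyMol d_mol;
  mutable std::unique_ptr<ToyMol> d_kekulized;  // empty until requested
};
```

A tautomer that is created but never reaches transform matching never pays for the kekulized copy, which is where the saving described above would come from.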
* Use count-only substructure matching in tautomer scoring
* Add SubstructMatchCount regression test
* MolStandardize: reduce enumerate overhead
* MolStandardize: avoid per-tautomer ring recomputation
* Atom: cache PeriodicTable pointer in valence calcs
* Atom: reuse PeriodicTable in getEffectiveAtomicNum
* PeriodicTable: add atomic fast path for getTable
* GraphMol: reduce ROMol copy reallocations
* MolStandardize: use quickCopy for per-match product copies
Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.
* MolStandardize: pre-filter scoring patterns by element/connectivity
For tautomer scoring, pre-compute which SubstructTerms are relevant for
a given input molecule. Since tautomerization only moves H atoms and
changes bond orders (never creates/destroys heavy-atom bonds), patterns
requiring missing elements or connectivity can be skipped for all
tautomers of that molecule.
Two-stage filtering:
1. Element check: skip patterns requiring atoms not in the molecule
2. Connectivity check: skip patterns whose bond-order-agnostic structure
doesn't match the input molecule's connectivity
This reduces the number of VF2 substructure calls per tautomer from 12
to typically 3-5, depending on the molecule's composition.
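The element check (stage 1) could look roughly like the sketch below. `ToyTerm` and `relevantTerms` are hypothetical names used only for illustration; the real SubstructTerm holds a SMARTS query rather than an explicit element set, and the connectivity check (stage 2) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a scoring term: a name plus the atomic
// numbers its pattern requires.
struct ToyTerm {
  std::string name;
  std::set<int> requiredElements;
};

// Returns the indices of the terms whose required elements all occur in
// the molecule. All tautomers of a molecule share the same heavy atoms,
// so the result can be computed once and reused for each tautomer.
static std::vector<std::size_t> relevantTerms(
    const std::vector<ToyTerm> &terms, const std::vector<int> &molAtomicNums) {
  const std::set<int> present(molAtomicNums.begin(), molAtomicNums.end());
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < terms.size(); ++i) {
    bool ok = true;
    for (int z : terms[i].requiredElements) {
      if (present.count(z) == 0) {
        ok = false;  // a required element is missing: skip this term
        break;
      }
    }
    if (ok) {
      keep.push_back(i);
    }
  }
  return keep;
}
```

For a molecule containing only C and O, terms that need N or P are dropped before any substructure search is attempted.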
* MolStandardize: preserve molecule properties for canonical tautomer
Copy molecule properties from the original input to the canonical tautomer
result. Since quickCopy during enumeration skips d_props to avoid overhead,
extended SMILES data like link nodes (LN) was lost. This restores them
on the final result.
* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers
TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses
quickCopy for performance. This doesn't copy molecule properties like
_molLinkNodes. Without this fix, XQMol output would lose link node
extensions in the SMILES.
Copy properties from the original query molecule to all enumerated
tautomers before constructing the TautomerQuery. This preserves extended
SMILES data without impacting enumeration performance.
* MolStandardize: use parallel iteration and cache bond lookups
Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration
over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond
lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.
* perf: add specialized matchers for simple tautomer scoring patterns
Replace VF2 graph matching with O(n) loops for 6 simple patterns:
- countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
- countMethyls: [CX4H3] methyl groups
- countCarbonDoubleHetero: aliphatic C=hetero
- countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N
Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.
Combined with the pre-filtering optimization, this achieves ~3.7x speedup
(~2500ms vs ~9300ms original) for tautomer canonicalization.
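As an illustration of replacing a graph match with a linear scan, here is a minimal sketch of a methyl counter for the [CX4H3] pattern. `ToyAtom` is a hypothetical simplified atom record, not the RDKit Atom class, and the function only shares its name with the matcher listed above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified atom record.
struct ToyAtom {
  int atomicNum;
  bool aromatic;
  int heavyDegree;  // number of non-hydrogen neighbors
  int totalHs;      // number of attached hydrogens
};

// Counts atoms matching [CX4H3]: an aliphatic carbon with four total
// connections, exactly three of which are hydrogens. One pass over the
// atoms, no substructure search needed.
static int countMethyls(const std::vector<ToyAtom> &atoms) {
  int count = 0;
  for (const auto &a : atoms) {
    const bool isAliphaticCarbon = (a.atomicNum == 6) && !a.aromatic;
    const bool isX4 = (a.heavyDegree + a.totalHs) == 4;
    if (isAliphaticCarbon && isX4 && a.totalHs == 3) {
      ++count;
    }
  }
  return count;
}
```

Methane (four hydrogens) and the carbonyl carbon of acetone (no hydrogens) are both rejected by the H3 condition, so only true methyl carbons are counted.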
* Fix tautomer canonicalize dropping conformers from quickCopy
quickCopy (RWMol(*mol, true)) skips conformers, so tautomer
enumeration products lose 2D/3D coordinates. This causes InChI
generation to omit the /b (double bond E/Z stereo) layer, since
E/Z is derived from atomic coordinates.
Fix: copy conformers from the original molecule onto the canonical
tautomer after pickCanonical in TautomerEnumerator::canonicalize().
Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based
conformer preservation check in catch_tests.cpp.
* add test on canonicalize losing stereo
* add regression test for exocyclic C=C tautomer canonicalization
The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely
deduplicate distinct tautomers when their atom-index-ordered state
patterns happen to match, leading canonicalize() to pick the wrong
canonical form for molecules with STEREOTRANS-pinned exocyclic C=C
bonds after RemoveHs.
Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the
exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form
O=C1C=C(C=C2CC=COC2)C(=O)N1.
Currently expected to FAIL until the state key dedup bug is fixed.
* MolStandardize: expand tautomer connectivity SMARTS
* MolStandardize: scope tautomer pattern enum
* MolStandardize: trim tautomer pattern enum
* MolStandardize: use symmetric ring scoring
* Fixes #7873
* Resolve MonomerInfo class for deletion
* Add regression test for setMonomerInfo
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* change valence model to use isolobal analogy
Remove support for five-coordinate C+ and, by analogy, five-coordinate N+2
Removes support for charge states that take atoms past the end of the periodic table
i.e. [Lv-4] is no longer supported
* update the tests for that
* remove valence state of 6 for Al
* fix representation of phosphate in the mol2 parser
this is a correction of what was done during #5973
* cleanup the exceptions for P, S, As, and Se
* drop valence states:
Si 6, P 7, As 7
* a couple of additional changes from #7397
* update java tests
* fix an inconsistency: Rb now supports valence -1
* documentation
* - replace operator[] with at() for bounds check
- extract some code into a function to avoid duplication
- use TAB as separator throughout in the periodic table data for consistency
* removing the .at() usage
We know that these vectors aren't empty, so there's no need for the bounds check.
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* simple modernization
* more
* done with RWMol for this pass
* the ROMol.cpp variant
* Atom
* minor change to bond
* simplify Conformer
* monomerinfo, queryatom, querybond
queryatom and querybond cpp files still need to be done
* typos
* revert a dumb change
* suggestion from review
* Refactor Atom.cpp to create a hasValenceViolation method that uses existing valence checking code
* work without exceptions
* get rid of the snake_case
* put free functions in an unnamed namespace
---------
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* move definition of a couple global constants from a .h to a .cpp
* careful removal of some redundant atom PRECONDITIONS
* careful removal of some redundant ROMol PRECONDITIONS
a bit of additional cleanup
* optimization masquerading as modernization
* some more tidying
* a bit more atom cleanup
* change in response to review
In high-symmetry cases where the symmetric SSSR does not find all
possible rings, substructure searches can fail because of this check.
Removing it fixes those cases, but is likely to decrease the performance
of substructure matching.
Also, adds a unit test where the old code fails
Co-authored-by: Franz Waibl <waiblfranz@gmail.com>
* very basics: actually parsing the new atom stereochem features
* add some input verification for the chiral permutations
* fix a typo
add quadruple bond SMILES/SMARTS extension
* add forgotten files
* patch from Roger
* add Roger's parsing examples
* typo
* new tests
* adjusted version of next PR from Roger:
- add SP2D hybridization for square planar (this may change)
- some modernization of Chirality.cpp
- stop using < HybridizationType in Chirality.cpp (should probably do this elsewhere too)
- improved handling of hybridization assignment for new stereochem
- handle new stereo/hybridization in UFF
- tests for the above
* perception of non-tetrahedral stereo from 3D (from Roger S)
Basic testing of SP and TB based on opensmiles docs
* potential fixes for octahedral assignment
more tests
* docs update
need way more!
* map the TH tags directly to @ tags
* very basics of SMILES writing
this does not work with anything that changes the permutation order
like canonicalization or writing things in rings.
* start to support the getChiralAcross API
* more testing
* consistency
* add hasNonTetrahedralStereo() and getIdealAngleBetweenLigands()
* assignStereochemistry should only remove non-tetrahedral stereo
* re-simplify those tests
* cleanup matrix stream output
* initial pass at supporting nontet stereo in distgeom
* backup
* start on the reference docs
* TBP reference
* first pass at Oh finished
* update SP section
* more doc updates
* fix a typo
* add param to not remove Hs connected to non-tetrahedral atoms
* VERY basic coord generation for square planar
* TBP basics
* basic OH depiction
* start testing missing ligands
allow non-tet stereo in rings (ugly, but correct)
* add new TBP functions from Roger
* update depiction code for new API
* backup, the new tests work so far
* Finish the TB tests
* OH tests pass too
* cleanup
* first pass at getting correct SMILES with reordering
need way more testing than this
* ensure permutation 0 is correctly preserved
* some progress towards adding non-tetrahedral stereo to StereoInfo
* doc update
* add non-tet chiral classes to python wrappers
* make sure removeAllHs also gets neighbors of non-tetrahedral centers
more testing
* a bit of depictor cleanup
* make the assignment from 3D more tolerant
more testing
* improve the bulk testing
* cleanup
* remove a bit of redundant code
* ensure we don't write bogus permutation values to SMILES
* fix some rebase problems
* allow assignStereochemistryFrom3D() to be called without sanitization
* allow disabling the non-tetrahedral stereo when it's not explicit
* get that working on windows too
* Move isEarlyAtom to a table format to reduce lock contention in getPeriodicTable
* Fix He early atom status
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* preliminary
* all tests pass
* cleanup
* more testing!
* we do still want to raise errors for aromatic atoms not in rings
fix one missed change for mol blocks
* update expected results for psql test
* add ROMol::atomBonds() and ROMol::atomNeighbors() methods
* remove some warnings
* start using the new code
* add default for those template params
* some more applications
* get the SWIG builds working
* get rid of extraneous ref
* remove extraneous comments
* a bit of simple refactoring
* Fixes #4311
- adds getValenceContrib() to QueryBond
- adds hasBondTypeQuery() and hasComplexBondTypeQuery() to QueryOps namespace
- atoms with complex bond type queries now have explicit and implicit valences of 0
- adds tests for the above
* add a test
* first cleanup
* next round of changes. all tests pass
* Fixes #2909
* Fixes #2910
* further cleanup
* some cleanup/refactoring of the Dict class
* remove now extraneous calls to hasProp() before clearProp()
* minor refactoring of RDProps.h
* Switch from using our own version of round() to std::round()
* replace some boost::math stuff with the equivalents from std::
* cleanups in SmartsWrite
* refactor out a bunch of duplicated code
* fix an instance of undefined behavior
* changes in response to review
* run clang-tidy with readability-braces-around-statements
clang-format the results
clean up all the parts that clang-tidy-8 broke
* fix problem on windows
* add AtomValenceException
* refactor a bit and add KekulizeException
* add copy ctor and copy() method
* add detectChemistryProblems
* add getType() method
we want to be able to get the type of the exception without needing a bunch of dynamic casts
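The motivation for a virtual getType() can be sketched as below. The `Toy*` class names and the returned strings are hypothetical and chosen only for this illustration; they are not claimed to match the real exception hierarchy.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical base class for sanitization problems.
class ToySanitException : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
  virtual std::string getType() const { return "MolSanitizeException"; }
};

class ToyAtomValenceException : public ToySanitException {
 public:
  using ToySanitException::ToySanitException;
  std::string getType() const override { return "AtomValenceException"; }
};

class ToyKekulizeException : public ToySanitException {
 public:
  using ToySanitException::ToySanitException;
  std::string getType() const override { return "KekulizeException"; }
};

// Callers holding a base-class reference ask the object for its type
// instead of trying a dynamic_cast to each derived class in turn.
static std::string classify(const ToySanitException &e) {
  return e.getType();
}
```

Code that collects a list of problems through base-class pointers can then branch on the returned string, which is also convenient when translating exceptions into wrapper languages.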
* first pass at exception inheritance/translation
needs some cleanup and expansion, but this does pass all tests.
* cleanup and finish the python wrappers for the new exceptions
* make sure things are truly polymorphic
* wrap shared_ptrs of the new exception types
* expose DetectChemistryProblems()
* get the java wrappers building again
* transfer those changes to the c# wrapper
* add detectChemistryProblems()
and deal with the fun fun exception inheritance things that ensue
* response to review
* first round of cleanups based on PVS-studio suggestions
* a couple more
* a few more cleanups
* another round of cleanups
* undo one of those cleanups
we want the integer rounding behavior here
* add a comment to make that clear
* Fix for filter catalog PRECONDITION redundancy
* backup
* Fixes #1387
this passes the bug tests, but needs the full tests run
* all tests pass
* remove some droppings left from an earlier attempt at a fix
* remove some additional printing
* cleanup
* Adds Atom atom map and rlabel apis
* Moves RLabels to their own namespace, adds other properties.
* Removes namespaces, liberally adds Atom to function names.
* move detail::computedPropName to RDKit::detail::computedPropName
* Adds RDAny (smaller generic holder). Updates all used dictionaries
This is an API compliant version of the current rdany system,
but uses a lot less memory in practice.
* Removes code duplication
* Converts CHECK_INVARIANT to TEST_ASSERT
* Fixes DoubleTag issue
* Adds Bool to DoubleMagic implementation
* Removes reference to property pickler
o rdkit gains a RDKit::common_properties namespace that contains common string value properties
o Dict.h and below gain getPropIfPresent that attempts to retrieve a property and returns
true/false on success or failure. This is used to optimize access.
o rdkit learns how to pass property keys by reference, not value.
A new namespace has been added to RDKit, common_properties
that contains the std::string values for commonly used
properties. This helps to avoid typos in string values
and also avoids constructing std::strings from character
values. All accessors (has/get/clear and getPropIfPresent) now pass
the key by reference.
Additionally, getPropIfPresent removes the double lookup
of hasProp/getProp which can be a significant speedup
in the smiles and smarts parsers (10-20%)
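A minimal sketch of the single-lookup accessor, assuming a plain map-backed dictionary. `ToyDict`, its method names and the `toy_common_properties` namespace are hypothetical stand-ins for Dict and RDKit::common_properties, not the real signatures.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

namespace toy_common_properties {
// Defined once, so call sites pass an existing std::string by reference
// instead of building a new one from a literal on every call.
const std::string molAtomMapNumber = "molAtomMapNumber";
}  // namespace toy_common_properties

class ToyDict {
 public:
  void setVal(const std::string &key, int val) { d_data[key] = val; }

  bool hasVal(const std::string &key) const { return d_data.count(key) != 0; }

  // Throws std::out_of_range if the key is missing, so careful callers
  // tend to pair it with hasVal(), which costs a second lookup.
  int getVal(const std::string &key) const { return d_data.at(key); }

  // Single lookup: fills res and returns true only when the key exists;
  // res is left untouched otherwise.
  bool getValIfPresent(const std::string &key, int &res) const {
    auto it = d_data.find(key);
    if (it == d_data.end()) {
      return false;
    }
    res = it->second;
    return true;
  }

 private:
  std::unordered_map<std::string, int> d_data;
};
```

The `hasVal()` followed by `getVal()` idiom hashes and searches the key twice, while `getValIfPresent()` does it once, which is the kind of saving the message attributes to the parsers.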