* Speed up tautomer canonicalization by deferring on SSSR calc
* Lazy kekulization for tautomer enumeration
Defer kekulization of tautomers until they are actually needed for
transform matching. This avoids creating kekulized copies for:
1. The initial tautomer (until first iteration)
2. New tautomers that may never be processed (if enumeration ends early)
The Tautomer class now supports lazy initialization of the kekulized
form via getKekulized() method.
Performance improvement: ~7% additional speedup (total ~22-24% from baseline)
* Use count-only substructure matching in tautomer scoring
* Add SubstructMatchCount regression test
* MolStandardize: reduce enumerate overhead
* MolStandardize: avoid per-tautomer ring recomputation
* Atom: cache PeriodicTable pointer in valence calcs
* Atom: reuse PeriodicTable in getEffectiveAtomicNum
* PeriodicTable: add atomic fast path for getTable
* GraphMol: reduce ROMol copy reallocations
* MolStandardize: use quickCopy for per-match product copies
Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.
* MolStandardize: pre-filter scoring patterns by element/connectivity
For tautomer scoring, pre-compute which SubstructTerms are relevant for
a given input molecule. Since tautomerization only moves H atoms and
changes bond orders (never creates/destroys heavy-atom bonds), patterns
requiring missing elements or connectivity can be skipped for all
tautomers of that molecule.
Two-stage filtering:
1. Element check: skip patterns requiring atoms not in the molecule
2. Connectivity check: skip patterns whose bond-order-agnostic structure
doesn't match the input molecule's connectivity
This reduces the number of VF2 substructure calls per tautomer from 12
to typically 3-5, depending on the molecule's composition.
* MolStandardize: preserve molecule properties for canonical tautomer
Copy molecule properties from the original input to the canonical tautomer
result. Since quickCopy during enumeration skips d_props to avoid overhead,
extended SMILES data like link nodes (LN) was lost. This restores them
on the final result.
* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers
TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses
quickCopy for performance. This doesn't copy molecule properties like
_molLinkNodes. Without this fix, XQMol output would lose link node
extensions in the SMILES.
Copy properties from the original query molecule to all enumerated
tautomers before constructing the TautomerQuery. This preserves extended
SMILES data without impacting enumeration performance.
* MolStandardize: use parallel iteration and cache bond lookups
Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration
over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond
lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.
* perf: add specialized matchers for simple tautomer scoring patterns
Replace VF2 graph matching with O(n) loops for 6 simple patterns:
- countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
- countMethyls: [CX4H3] methyl groups
- countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero
- countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N
Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.
Combined with the pre-filtering optimization, this achieves ~3.7x speedup
(~2500ms vs ~9300ms original) for tautomer canonicalization.
* Fix tautomer canonicalize dropping conformers from quickCopy
quickCopy (RWMol(*mol, true)) skips conformers, so tautomer
enumeration products lose 2D/3D coordinates. This causes InChI
generation to omit the /b (double bond E/Z stereo) layer, since
E/Z is derived from atomic coordinates.
Fix: copy conformers from the original molecule onto the canonical
tautomer after pickCanonical in TautomerEnumerator::canonicalize().
Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based
conformer preservation check in catch_tests.cpp.
* add test on canonicalize losing stereo
* add regression test for exocyclic C=C tautomer canonicalization
The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely
deduplicate distinct tautomers when their atom-index-ordered state
patterns happen to match, leading canonicalize() to pick the wrong
canonical form for molecules with STEREOTRANS-pinned exocyclic C=C
bonds after RemoveHs.
Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the
exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form
O=C1C=C(C=C2CC=COC2)C(=O)N1.
Currently expected to FAIL until the state key dedup bug is fixed.
* MolStandardize: expand tautomer connectivity SMARTS
* MolStandardize: scope tautomer pattern enum
* MolStandardize: trim tautomer pattern enum
* MolStandardize: use symmetric ring scoring
* merge ABS groups on setStereoGroups
* warn/fail on multiple ABS groups on strict parsing
* add a test for setStereoGroups
* add a test for multiple ABS groups in mol blocks
* Drop the warning
* simple modernization
* more
* done with RWMol for this pass
* the ROMol.cpp variant
* Atom
* minor change to bond
* simplify Conformer
* monomerinfo, queryatom, querybond
queryatom and querybond cpp files still need to be done
* typos
* revert a dumb change
* suggestion from review
* atropisomer handling added
* fixed non-used variables, linking directives
* BOOST LIB start/stop fixes, linking fix
* Fixes for RDKIT CI errors
* minimalLib fix
* changed vector<enum> for java builds
* check for extra chars in CIP labeling
* removed wrong deprecated message
* fix ostrstream output error?
* restored _ChiralAtomRank to lowercase first letter
* changes for merged master
* Fixed catch label for new Catch package
* update expected psql results
* get swig wrappers building
* restore MolFileStereochem to FileParsers
* fix java wrapper for reapplyMolBlockWedging
* some suggestions
* move a couple functions out of Bond
* Merge branch 'master' into pr/atropisomers2
* merged master
* Renamed setStereoanyFromSquiggleBond
* atropisomers in cdxml, rationalize atrop wedging, stereoGroups in drawMol
* fix for CI build
* attempt to fix java build in CI
* attempt to fix java build in CI #2
* New routine to remove non-explicit 3D-geneated chirality
* changed to use pair for atrop atoms and related bonds
* Changes as per PR reviews
* PR review respnses
* PR review reponse - more
* Fix merge from master
* fixing java ci after merge
* Updated the help doc for atripisomers
* update the atropisomer docs
* improve the images
* add the source CXSMILES
---------
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* move definition of a couple global constants from a .h to a .cpp
* careful removal of some redundant atom PRECONDITIONS
* careful remove of some redundant ROMol PRECONDITIONS
a bit of additional cleanup
* optimization masquerading as modernization
* some more tidying
* a bit more atom cleanup
* change in response to review
* added ROMol::hasQuery
* python bindings for Mol.HasQuery
* at least I checked that my Python tests were running...
* hasQuery use C++11 range iterators
* add boost::serialization support to ROMol
* add RWMol test
* get the windows DLL builds working
* switch FilterCatalog and SubstructLibrary to serialize ROMols
* an actual solution to the windows dll problem
* the FilterMatchers stuff was not working
* backup
* simple first pass, passes all tests
* cleanup a bunch of existing uses
* ensure that we can safely add atoms/bonds while in edit mode
* add context manager on python side
* handle exceptions properly in those
* changes in response to review
* remove include from headers
* update implementation files
* completely remove BOOST_FOREACH (#7)
* convert those changes to use auto
* get rid of all usage of BOOST_FOREACH
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* - eliminate some documentation ambiguity about the role of the strictParsing flag
- fix some inconsistencies between SGroup parsing function prototype declarations and implementations
- add a workaround for accepting malformed V2000 'M SAP' entries affecting older version of MarvinJS (only if strictParsing is set to false)
- if strictParsing is set to false, malformed V2000/V3000 SGroups are ignored rather than causing the parsing to fail
- fix a couple typos in warnings
* changes in response to review
* fix ossfuzz issue 24074
* fix ossfuzz issue 23896
* switch to throw exceptions when reading ints/floats
* remove extraneous benchmarking code
* change type of AH query
* confirm an invariant while finding rings
* no sense in adding these tests to github
* switch to use fail() instead of failbit
switch to acceptSpaces by default
* Progress on #3168
* Fixes#3167
* Fixes#3169
* deal with CBONDS too
* test PATOMS
* Fixes#3175
* a bit of code simplification and test updates
still needs more testing
* more testing
* handle s-group hierarchy
also a couple of other changes in response to the review
* add forgotten test file
* changes in response to review
* run clang-tidy with readability-braces-around-statements
clang-format the results
clean up all the parts that clang-tidy-8 broke
* fix problem on windows
* first round of cleanups based on PVS-studio suggestions
* a couple more
* a few more cleanups
* another round of cleanups
* undo one of those cleanups
we want the integer rounding behavior here
* add a comment to make that clear
* Fix for filter catalog PRECONDITION redundancy
* Implementation of SGroups
* remove sample files test
* update gitignore with test outputs
* fix RevisionModifier
* re-enable tests
* backup commit; things seem to work so far
* some refactoring; obvious s group tests pass now
* more refactoring
* everything now out of the public API
* not sure why this was still in there
* rename functions; all tests now pass
* remove getNextFreeSGroupId; readd comment in copy SGroups
* clang-format
* squash-merge current master
* squash merge master
* Address comments on PR
- Update to current master.
- Move SGroup parse time checks to SGroupChecks namespace.
- Store SGroups in ROMOl as vector<SGroups>.
- SGroup methods return referenes instead of pointers.
- Use atom/bond/sgroup indexes for properties instead of pointers.
- Have SGroups inherit from RDProps; move properties to RDProps.
- Remove trivial/unused methods.
- Add a link to the SD specification atop SGroup.h
* add a couple test files
* backup
* first pass at some theory documentatin
* it's a draft
* Update enhanced stereochemistry documentation
Adds initial target use case and caveats about the tentative
nature of the current implementation.
* Support read/write of molfile enhanced stereochemistry
This includes reading and writing of enhanced stereochemistry
from v3000 molfiles (sdf). Enhanced stereochemistry encodes
the relative configuration of stereocenters, allowing
representation of racemic mixtures and compounds with
unknown absolute stereochemistry.
It does not include:
* Python wrapping
* invalidation of the enhanced stereochemistry
* use of enhanced stereochemistry in search
* depiction of enhanced stereochemistry.
* Update to reflect changes from #1971
* change names of enum elements to allow compilation in VS2017
I think it's also clearer to do things this way
* Addressed most review comments.
* Run missed test "testEnhancedStereoChemistry"
* In tests, added size checks to group equality checks
* Updated copyright statements
* Deleted mol created for a test
* Use perfect forwarding in RWMol::setStereoGroups()
* use references for stereo groups that are checked in write and pickle
* Updated stereogroup.h in hopes of fixing compilation on Windows.
* clang-format
* try allowing a switch to boost regex and requiring it for g++-4.8
* do a better job of that
* typo
* Code review comments. Updated Copyright notice.
* When an atom is deleted, delete stereo groups containing it.
Also updates StereoGroup toUse accessors instead of
constant member attributes. This allows move of StereoGroups.
* RDKit style guide
* Add header required on Windows.
* get the SWIG wrappers to build
* add atoms
* add bonds
* backup
* Fixes#2029
* Get metadata working for drawMolecules()
* add to python wrapper
const correctness
* also connected to #2029: make sure bond direction also ends up in the output
* initial version of an SVG->ROMol parser
this is in the wrong place, but I wanted to make sure it actually works
* move svg parser to a more reasonable location
there is still some work to be done here
* add conformer parsing
* Removes ATOM/BOND_SPTR in boost::graph in favor of raw pointers
* Actually delete atoms and bonds...
* RWMol::clear now calls destroy to handle atom/bond deletion
* Changes broken Atom lookup for windows/gcc
* Adds tests for running with valgrind
* Adds test designed for valgrind and molecule deletions
* Removes RNG, actually tests bond deletions
* update swig wrappers
* deal with most recent changes on the main branch
* Fixes atom documentation
* Fixes#1461
This is a complicated one. Basically URANGE_CHECK when
used on unsigned integers has a problem when the size of
the range it’s checking is 0. The standard operations is
to check
URANGE(num, size-1)
Which (for unsigned integers) obviously rolls over.
This fixes all usage cases to be
URANGE(num+1, size)
And fixes the bugs found. (addBond and the mmff tests)
* Fixes#1461 - Updates URANGE_CHECK to be 0<=x<hi
* Adds Atom atom map and rlabel apis
* Moves RLabels to their own namespace, adds other properties.
* Removes namespaces, liberally adds Atom to function names.
* move detail::computedPropName to RDKit::detail::computedPropName
* Adds RDAny (smaller generic holder) Updates all used dictionaries
This is an API compliant version of the current rdany system,
but uses a lot less memory in practice.
* Removes code duplication
* Converts CHECK_INVARIANT to TEST_ASSERT
* Fixes DoubleTag issue
* Adds Bool to DoubleMagic implementation
* Removes reference to property pickler
a vector of shared_ptr to Snapshots
- The PySnapshot class was removed
- the Trajectory::readAmber and Trajectory::readGromos member functions
were converted into non-member functions
- tests were modified accordingly
o rdkit gains a RDKit::common_properties namespace that contains common string value properties
o Dict.h and below gain getPropIfPresent that attempts to retrieve a property and returns
true/false on success or failure. This is used to optimize access.
o rdkit learns how to pass property keys by reference, not value.
A new namespace has been added to RDKit, common_properties
that contains the std::string values for commonly used
properties. This helps to avoid typos in string values
but also avoids a creation of std::strings from character
values. All accessors (has/get/clear and getPropIfPresent) now pass
the key by reference.
Additionally, getPropIfPresent removes the double lookup
of hasProp/getProp which can be a significant speedup
in the smiles and smarts parsers (10-20%)