* Speed up tautomer canonicalization by deferring the SSSR calculation
* Lazy kekulization for tautomer enumeration
Defer kekulization of tautomers until they are actually needed for
transform matching. This avoids creating kekulized copies for:
1. The initial tautomer (until first iteration)
2. New tautomers that may never be processed (if enumeration ends early)
The Tautomer class now supports lazy initialization of the kekulized
form via getKekulized() method.
Performance improvement: ~7% additional speedup (total ~22-24% from baseline)
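
The deferral described in this commit is ordinary lazy initialization. The following is a minimal Python sketch of the idea, not the RDKit implementation: `LazyTautomer` and `expensive_kekulize` are hypothetical stand-ins, and only the `getKekulized()` name comes from the commit message.

```python
class LazyTautomer:
    """Toy stand-in for a tautomer record; not the RDKit Tautomer class."""

    def __init__(self, smiles, kekulize):
        self.smiles = smiles
        self._kekulize = kekulize   # hypothetical expensive conversion
        self._kekulized = None      # filled in on first request only

    def get_kekulized(self):
        # Mirrors the getKekulized() idea: compute once, then reuse.
        if self._kekulized is None:
            self._kekulized = self._kekulize(self.smiles)
        return self._kekulized


calls = []


def expensive_kekulize(smiles):
    # Placeholder for real kekulization; it only records that it ran.
    calls.append(smiles)
    return "kekulized:" + smiles


tautomers = [LazyTautomer(s, expensive_kekulize) for s in ("A", "B", "C")]
# Enumeration ends early: only the first tautomer is ever processed.
first = tautomers[0].get_kekulized()
again = tautomers[0].get_kekulized()
```

In this sketch the conversion runs once for the tautomer that is actually used and never for the two that are not, which is the saving the commit describes.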
* Use count-only substructure matching in tautomer scoring
* Add SubstructMatchCount regression test
* MolStandardize: reduce enumerate overhead
* MolStandardize: avoid per-tautomer ring recomputation
* Atom: cache PeriodicTable pointer in valence calcs
* Atom: reuse PeriodicTable in getEffectiveAtomicNum
* PeriodicTable: add atomic fast path for getTable
* GraphMol: reduce ROMol copy reallocations
* MolStandardize: use quickCopy for per-match product copies
Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.
* MolStandardize: pre-filter scoring patterns by element/connectivity
For tautomer scoring, pre-compute which SubstructTerms are relevant for
a given input molecule. Since tautomerization only moves H atoms and
changes bond orders (never creates/destroys heavy-atom bonds), patterns
requiring missing elements or connectivity can be skipped for all
tautomers of that molecule.
Two-stage filtering:
1. Element check: skip patterns requiring atoms not in the molecule
2. Connectivity check: skip patterns whose bond-order-agnostic structure
doesn't match the input molecule's connectivity
This reduces the number of VF2 substructure calls per tautomer from 12
to typically 3-5, depending on the molecule's composition.
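
The two-stage filter can be illustrated with a heavily simplified Python sketch. Everything here is an assumption made for illustration: the pattern table is hypothetical, patterns are reduced to two-atom element pairs, and the connectivity check is a set comparison rather than a real bond-order-agnostic substructure match.

```python
def pair(a, b):
    """Bond between two elements, ignoring bond order and direction."""
    return frozenset((a, b))


def relevant_patterns(mol_elements, mol_edges, patterns):
    """Return names of patterns that could match some tautomer of the molecule.

    mol_elements: set of element symbols present in the molecule
    mol_edges: set of bonded element pairs (see pair()), bond order ignored
    patterns: dict of name -> (required elements, required bonded pairs)
    """
    keep = []
    for name, (elements, edges) in patterns.items():
        if not elements <= mol_elements:   # stage 1: element check
            continue
        if not edges <= mol_edges:         # stage 2: connectivity check
            continue
        keep.append(name)
    return keep


# Hypothetical, simplified pattern table (two-atom patterns only).
PATTERNS = {
    "C=O": ({"C", "O"}, {pair("C", "O")}),
    "N=O": ({"N", "O"}, {pair("N", "O")}),
    "P=O": ({"P", "O"}, {pair("P", "O")}),
}

# Acetamide heavy-atom skeleton: C-C, C-O and C-N contacts only.
elements = {"C", "N", "O"}
edges = {pair("C", "C"), pair("C", "O"), pair("C", "N")}
selected = relevant_patterns(elements, edges, PATTERNS)
```

Here `P=O` is rejected by the element check and `N=O` by the connectivity check, so only `C=O` would be evaluated for each tautomer. The filter is computed once per input molecule because heavy-atom connectivity is the same for all of its tautomers.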
* MolStandardize: preserve molecule properties for canonical tautomer
Copy molecule properties from the original input to the canonical tautomer
result. Since quickCopy during enumeration skips d_props to avoid overhead,
extended SMILES data like link nodes (LN) was lost. This restores them
on the final result.
* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers
TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses
quickCopy for performance. This doesn't copy molecule properties like
_molLinkNodes. Without this fix, XQMol output would lose link node
extensions in the SMILES.
Copy properties from the original query molecule to all enumerated
tautomers before constructing the TautomerQuery. This preserves extended
SMILES data without impacting enumeration performance.
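
A toy Python sketch of the fix pattern, assuming a hypothetical `ToyMol` class whose `quick_copy()` skips properties in the way the commit says quickCopy does. The property value is a placeholder, not real link node data.

```python
class ToyMol:
    """Toy molecule: a list of atoms plus a property dictionary."""

    def __init__(self, atoms, props=None):
        self.atoms = list(atoms)
        self.props = dict(props or {})

    def quick_copy(self):
        # Copies the atoms only; properties are deliberately skipped.
        return ToyMol(self.atoms)


def enumerate_tautomers(mol):
    # Placeholder enumeration: two cheap copies of the input.
    return [mol.quick_copy(), mol.quick_copy()]


def enumerate_preserving_props(mol):
    # Restore the properties once, after the cheap enumeration is done.
    tautomers = enumerate_tautomers(mol)
    for taut in tautomers:
        taut.props.update(mol.props)
    return tautomers


query = ToyMol(["C", "N", "O"], {"_molLinkNodes": "placeholder"})
bare = enumerate_tautomers(query)
restored = enumerate_preserving_props(query)
```

The copy inside the enumeration loop stays cheap, and the cost of restoring properties is paid once per returned tautomer rather than once per candidate.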
* MolStandardize: use parallel iteration and cache bond lookups
Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration
over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond
lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.
* perf: add specialized matchers for simple tautomer scoring patterns
Replace VF2 graph matching with O(n) loops for 6 simple patterns:
- countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
- countMethyls: [CX4H3] methyl groups
- countCarbonDoubleHetero: aliphatic C=hetero
- countAromaticCarbonExocyclicN: aromatic C=exocyclic N
Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.
Combined with the pre-filtering optimization, this achieves ~3.7x speedup
(~2500ms vs ~9300ms original) for tautomer canonicalization.
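
As an illustration of replacing graph matching with a linear scan, the following Python sketch counts double or aromatic bonds between two elements in a single pass over a bond list. The `Bond` tuple, the use of 1.5 for aromatic bonds, and the function signature are assumptions for this sketch and are not taken from the RDKit code.

```python
from collections import namedtuple

# order: 1, 2 or 3 for single/double/triple, 1.5 for aromatic (assumed encoding)
Bond = namedtuple("Bond", "begin end order")


def count_double_or_aromatic_bonds(symbols, bonds, elem_a, elem_b):
    """Count double or aromatic bonds joining elem_a and elem_b, in O(n)."""
    wanted = {elem_a, elem_b}
    count = 0
    for bond in bonds:
        if bond.order not in (2, 1.5):
            continue
        if {symbols[bond.begin], symbols[bond.end]} == wanted:
            count += 1
    return count


# Acetic acid, CC(=O)O: atoms 0..3 and three bonds.
symbols = ["C", "C", "O", "O"]
bonds = [Bond(0, 1, 1), Bond(1, 2, 2), Bond(1, 3, 1)]
n_carbonyl = count_double_or_aromatic_bonds(symbols, bonds, "C", "O")
n_nitroso = count_double_or_aromatic_bonds(symbols, bonds, "N", "O")
```

A loop like this visits each bond once and needs no match state, which is why it can be cheaper than a general substructure search for two-atom patterns.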
* Fix tautomer canonicalize dropping conformers from quickCopy
quickCopy (RWMol(*mol, true)) skips conformers, so tautomer
enumeration products lose 2D/3D coordinates. This causes InChI
generation to omit the /b (double bond E/Z stereo) layer, since
E/Z is derived from atomic coordinates.
Fix: copy conformers from the original molecule onto the canonical
tautomer after pickCanonical in TautomerEnumerator::canonicalize().
Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based
conformer preservation check in catch_tests.cpp.
* add test on canonicalize losing stereo
* add regression test for exocyclic C=C tautomer canonicalization
The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely
deduplicate distinct tautomers when their atom-index-ordered state
patterns happen to match, leading canonicalize() to pick the wrong
canonical form for molecules with STEREOTRANS-pinned exocyclic C=C
bonds after RemoveHs.
Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the
exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form
O=C1C=C(C=C2CC=COC2)C(=O)N1.
Currently expected to FAIL until the state key dedup bug is fixed.
* MolStandardize: expand tautomer connectivity SMARTS
* MolStandardize: scope tautomer pattern enum
* MolStandardize: trim tautomer pattern enum
* MolStandardize: use symmetric ring scoring
* Expose tautomer scoring functions to python
* Add more tests/documentation
* Rename getDefaultTautomerSubstructs to getDefaultTautomerScoreSubstructs
* Remove ROMOL_SPTR
* Add full custom scoring function example
* Run clang format
* Use proper BOOST_PYTHON_FUNCTION_OVERLOADS
* Use default copy constructor
* change valence model to use isolobal analogy
Remove support for five-coordinate C+ and, by analogy, five-coordinate N+2
Remove support for charge states that take atoms past the end of the periodic table,
e.g. [Lv-4] is no longer supported
* update the tests for that
* remove valence state of 6 for Al
* fix representation of phosphate in the mol2 parser
this is a correction of what was done during #5973
* cleanup the exceptions for P, S, As, and Se
* drop valence states:
Si 6, P 7, As 7
* a couple of additional changes from #7397
* update java tests
* fix an inconsistency: Rb now supports valence -1
* documentation
* - replace operator[] with at() for bounds check
- extract some code into a function to avoid duplication
- use TAB as separator throughout in the periodic table data for consistency
* removing the .at() usage
We know that these vectors aren't empty, so there's no need for the bounds check.
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* Add a 'force' option to MolStandardize::Uncharger
* update comment
* add more test cases exercising MolStandardize::Uncharger
* fix the neutralization of surplus negative charges
* changes in response to review
* Add a test case for MolStandardize::Uncharger
* refactor the neutralization of negative charges in MolStandardize::Uncharger
* initial addition of MT support to MolStandardize
* do the other inplace functions
* add mt ops to python wrappers
including tests
* release the GIL
* remove exploratory code added during dev
* make normalizer thread safe
* refactor some repeated code
* reionizer and uncharger and normalizer can now operate in place
* add removeUnmatchedAtoms argument to in-place version of runReactant
When set to false atoms which are not explicitly removed by the reaction are preserved
* Fix a case where transforms were incorrectly updating atomic numbers
* add more inplace operations to MolStandardize
* support those in the Python layer
* support inplace for the rest of the python wrappers
* move a few more functions over to the inplace code
* Swap to using a data structure for default normalization parameters
* bring the default fragment data into the code too
* cleanup
* add reionizer parameters via data
change fragment parse failures to ValueErrorExceptions
* tautomer parameters in the code
* got a little over-enthusiastic in that last cleanup
* use boost::flyweight to cache normalization and charge data params
* a bit more cleanup
* support reading params from JSON
* fragments from JSON
single-call for fragment removal
* add a one-liner for the canonical tautomer
* quick refactor
* Fixes #4115
* complete the parents
* docs
* move the definitions to a namespace and make them const
* see if switching to c++14 fixes the CI compile problems with g++ 5.5
* somewhat uglier way of solving the initializer list problem
* Make MetalDisconnector more robust against organometallics
* - fixed misbehavior with radicals
- added tests
- code cleanup
* - fixed MetalDisconnector with dative bonds
- removed pointless test
* fixed issue #2965
* added test case for issue #2965
* fixed formatting and added comment.
* update
* General Reader files
* removed dependency on boost filesystems
* removed class
* clang-format
* added-comments
* further-cleanup
* added clang-formatting
* braces-for-if-else
* changed error messages, added option for windows file path
* fixed getFileName function
* cleanup
* option for filename without path
* further-cleanup
* added tests for determineFileFormat
* cleanup, const arguments for validate function
* init
* cleanup
* cleanup
* clang-format does not work for CMake
* added RDK_TEST_MULTITHREADED option
* add-flag
* cleanup
* Delete ConcurrentQueue.h
This PR deals with the Generalized File Reader.
* Delete testConcurrentQueue.cpp
This PR deals with the Generalized File Reader.
* no change
* concurrent queue
* print values
* Single Producer Multiple Consumer works
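
For readers unfamiliar with the pattern, a single-producer, multiple-consumer pipeline can be sketched with the Python standard library. This is a generic illustration and not the C++ ConcurrentQueue from this work; `run_pipeline` and its parameters are hypothetical.

```python
import queue
import threading


def run_pipeline(records, n_consumers=3):
    """One producer feeds a bounded queue; several consumers drain it."""
    work = queue.Queue(maxsize=4)      # bounded, so the producer can block
    done = object()                    # sentinel telling a consumer to stop
    results = []
    results_lock = threading.Lock()

    def producer():
        for record_id, text in enumerate(records):
            work.put((record_id, text))
        for _ in range(n_consumers):   # one sentinel per consumer
            work.put(done)

    def consumer():
        while True:
            item = work.get()
            if item is done:
                return
            record_id, text = item
            with results_lock:
                results.append((record_id, len(text)))

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)             # arrival order is not deterministic


processed = run_pipeline(["CCO", "c1ccccc1", "CC(=O)O"])
```

Carrying the record id alongside the text lets the caller restore input order even though consumers finish in an arbitrary order.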
* cleanup
* Producer Consumer Example
* update queue methods and tests
* cleanup
* test
* fixed tests
* cleanup, updated tests
* Delete ProducerConsumer.h
* Delete testProducerConsumer.cpp
* cleanup
* further cleanup
* changes based on feedback
* make queue non copyable
* pseudocode
* possible implementation
* untested implementation
* change class to typename
* basic-setup
* need to fix segfault
* need to fix blocking
* need to fix blocking
* need to fix blocking
* fix indentation
* one possibility
* without lambda function
* possible fix with some test cases
* performance tests
* added support for record id and item text
* cleanup
* cleanup
* fixed memory leak and added methods with tests for getting last id and item text
* cleanup
* added more test cases with different smi files
* cleanup
* SD mol supplier
* modified the parsing for SDMolSupplier
* cleanup
* cleanup
* new file for testing
* added support for reading molecule properties with tests
* thread-safe logging and exception handling
* cleanup
* without thread safe logging
* cleanup
* cleanup, modified MultithreadedSmilesMolSupplier
* cleanup, made reader and writer functions private
* move O2.sdf
* basic python wrapper with tests
* cleanup, added new methods for python wrappers
* made changes suggested by Andrew
* file and compression formats are case-insensitive
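
A hedged Python sketch of case-insensitive format detection that also accepts Windows-style paths. The function name echoes determineFileFormat from the commits above, but the signature, return value, and the sets of recognized extensions are assumptions made for illustration.

```python
import ntpath  # its basename() accepts both "/" and "\\" separators

KNOWN_FORMATS = {"sdf", "smi", "csv", "tsv"}   # illustrative subset
KNOWN_COMPRESSION = {"gz"}                     # illustrative subset


def determine_file_format(path):
    """Return (file_format, compression) from a file name, ignoring case."""
    name = ntpath.basename(path)
    parts = name.lower().split(".")
    compression = ""
    if len(parts) > 2 and parts[-1] in KNOWN_COMPRESSION:
        compression = parts.pop()
    if len(parts) < 2 or parts[-1] not in KNOWN_FORMATS:
        raise ValueError("unrecognized file format: " + path)
    return parts[-1], compression


windows_result = determine_file_format(r"C:\data\Mols.SDF.GZ")
posix_result = determine_file_format("/tmp/input.smi")
```

Lower-casing the name once, before any comparison, keeps the format and compression checks consistent with each other.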
* cannot open files with gzstream
* cleanup
* possible fix for opening compressed streams (SMILES)
* removed seekg() and tellg() methods from multithreaded suppliers
* cleanup
* test cases for python wrappers
* some wrapper cleanup
* cleanup, removed unused functions
* update the MT tests so that they actually do some work
also includes some cleanup here
* cleanup
* remove iterator_next header include
* added support for multithreaded readers
* use getNumThreadsToUse for multithreaded suppliers
* fixed documentation for multithreaded python wrappers
* commented performance test
* first draft of final evaluation report
* removed inline variables
* first draft getting started in python
* fixed typos in getting started in python
* fixed typos
* fix documentation tests
* fixed documentation tests
* added links to important files and PR
* added performance results
* first version of wrappers with compressed streams
* getting rid of streambuf stream method
* modified General File Reader
* make this work when building in non-threads mode
* rename a test
* rename a function in the python API
* rearrange the python test a bit
* disable the stream-based constructors in Python
* mark the multithreaded classes as experimental
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* This commit fixes the bug "segmentation fault/core dump when chargeParent is run with skip_standardize set to true" mentioned in #2970
* Fixed memory leaks in MolStandardize and deleted variables which aren't required
* modify the uncharger to use a canonical atom ordering
* add doCanonical cleanup parameter
make canonical ordering the default
document the change
* Add neutralization of additional negative groups (not just acids).
This may not be the right thing to do.
* expose the new parameter to python
* changes in response to review
* add SKIP_IF_ALL_MATCH argument to FragmentRemover
Refactor FragmentRemover::remove() to make it more efficient
* implement and test SKIP_IF_ALL_MATCH
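
One plausible reading of the option, sketched in Python over a list of fragment strings: when every fragment matches a removal pattern, the input is returned unchanged instead of being emptied. The function name, signature, and the salt list are hypothetical, and the actual FragmentRemover semantics should be checked against the RDKit documentation.

```python
def remove_fragments(fragments, is_removable, skip_if_all_match=False):
    """Drop removable fragments, optionally keeping everything if all match."""
    if skip_if_all_match and all(is_removable(f) for f in fragments):
        return list(fragments)   # nothing would survive, so leave input as-is
    return [f for f in fragments if not is_removable(f)]


SALTS = {"[Na+]", "[Cl-]"}       # illustrative removal list


def is_salt(fragment):
    return fragment in SALTS


mixed = remove_fragments(["CC(=O)[O-]", "[Na+]"], is_salt, skip_if_all_match=True)
all_salt_kept = remove_fragments(["[Na+]", "[Cl-]"], is_salt, skip_if_all_match=True)
all_salt_removed = remove_fragments(["[Na+]", "[Cl-]"], is_salt)
```

With the option off, an input made only of removable fragments ends up empty; with it on, the input passes through untouched.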
* expose the extra option to Python
* add info to logger