rdkit

mirror of https://github.com/rdkit/rdkit.git synced 2026-06-04 21:54:27 +08:00

Author	SHA1	Message	Date
Yakov Pechersky	c6cabf4153	Speed-up tautomer canonicalization, no API changes (#9134) * Speed up tautomer canonicalization by deferring on SSSR calc * Lazy kekulization for tautomer enumeration Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for: 1. The initial tautomer (until first iteration) 2. New tautomers that may never be processed (if enumeration ends early) The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method. Performance improvement: ~7% additional speedup (total ~22-24% from baseline) * Use count-only substructure matching in tautomer scoring * Add SubstructMatchCount regression test * MolStandardize: reduce enumerate overhead * MolStandardize: avoid per-tautomer ring recomputation * Atom: cache PeriodicTable pointer in valence calcs * Atom: reuse PeriodicTable in getEffectiveAtomicNum * PeriodicTable: add atomic fast path for getTable * GraphMol: reduce ROMol copy reallocations * MolStandardize: use quickCopy for per-match product copies Use RWMol(kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry. MolStandardize: pre-filter scoring patterns by element/connectivity For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule. Two-stage filtering: 1. Element check: skip patterns requiring atoms not in the molecule 2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition. * MolStandardize: preserve molecule properties for canonical tautomer Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result. * TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES. Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance. * MolStandardize: use parallel iteration and cache bond lookups Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches. * perf: add specialized matchers for simple tautomer scoring patterns Replace VF2 graph matching with O(n) loops for 6 simple patterns: - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns - countMethyls: [CX4H3] methyl groups - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2. Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization. * Fix tautomer canonicalize dropping conformers from quickCopy quickCopy (RWMol(mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates. Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize(). Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp. add test on canonicalize losing stereo * add regression test for exocyclic C=C tautomer canonicalization The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs. Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1. Currently expected to FAIL until the state key dedup bug is fixed. * MolStandardize: expand tautomer connectivity SMARTS * MolStandardize: scope tautomer pattern enum * MolStandardize: trim tautomer pattern enum * MolStandardize: use symmetric ring scoring	2026-03-31 06:42:40 +02:00
Greg Landrum	0147cd8201	Fixes #5210 (#5408) * revert duplicate chunk in release notes * replace deprecated ifdefs This one gets rid of USE_BUILTIN_POPCNT and RDK_THREADSAFE_SS use RDK_OPTIMIZE_POPCNT or RDK_BUILD_THREADSAFE_SSS instead * get rid of BUILD_COORDGEN_SUPPORT from ROMol.i * fix a stupid typo * update release notes	2022-07-11 11:20:03 +02:00
Ric	beedc4cf0d	make PeriodicTable thread safe (#5208)	2022-04-17 05:37:50 +02:00
Greg Landrum	a7c438f870	Fixes #3646 (#3653)	2020-12-17 08:34:15 -05:00
Greg Landrum	d41752d558	run clang-tidy with readability-braces-around-statements (#2899) * run clang-tidy with readability-braces-around-statements clang-format the results clean up all the parts that clang-tidy-8 broke * fix problem on windows	2020-01-25 14:19:32 +01:00
Greg Landrum	47d8813358	Fixes #2784 (#2787) * Fixes #2784 * changes in response to review. * clarifying comment	2019-11-15 07:53:42 -05:00
Ric	a6b26253ff	Fix (most of) mem problems (#2123) * do not use new on loggers * del pointers in testDistGeom * Update Dict hasNonPOD status on bulk update * delete new Dicts in memtest1.cpp * fixes in MolSuppliers and testFMCS * PeriodicTable singleton as unique_ptr * fix EEM_arrays leak * fix leaks in testPBF * fix ParamCollection leak in test UFF * fix leaks in MMFF * clear prop dict before read in in pickler * fix leaks in testFreeSASA * fix leaks in test3D * modernize Dict.h & SmilesParse.cpp * fix leaks in testQuery * fix leaks in testCrystalFF * fix leaks in cxsmilesTest * fix leaks in Catalog & mol cat test * fix leaks in ShapeUtils & tests * fix leaks in testSubgraphs1 * fix leaks testFingerprintGenerators * fix leaks in Catalog/FilterCatalog * fix leaks in graphmolqueryTest * these changes reduce bison parse leaks * fixed leaks in testChirality.cpp * fix leaks + 2 tests in testMolWriter * fix 4m leaks in substructLibraryTest * small improvements to molTautomerTest; still leaks * fix leaks in testRGroupDecomp * fix leaks in test; parser still leaks * fix leaks in itertest * fix 4m leaks in testDepictor * fixes in smatest; still leaking due to parser * fixes in testSLNParse; still leaking due to parser * flex/bison: always add atoms with ownership; smarts error cleanup * fix leaks in testReaction * fix leaks in testSubstructMatch * fix leaks in resMolSupplierTest * fix leaks in testChemTransforms + bug in ChemTransforms * fix leaks in testPickler * fix leaks in testMolTransform * fix leaks in testFragCatalog * fix leak in testSLNParse. Still leaks due to Smiles * fixed most leaks in testMolSupplier * pre bison fix * fix some atom & bond parse problems; others still fail * bison smiles & smarts, atoms & bonds more or less fixed * fix leaks in molopstest.cpp * fix leaks in testFingerprints, MACCS.cpp & AtomPairs.cpp * fix leaks in moldraw2Dtest1 * fix leaks in testDescriptors * fix leaks in testInchi * fix leaks in testUFFForceFieldHelpers * fix leaks in hanoiTest & new_canon.h * fix leaks in testMMFFForceField * fix leaks in graphmolTest1 * fix leaks in testMMFFForceFieldHelpers * fix leaks in testDistGeomHelpers * fix leaks in testMolAlign * initialize occupancy & temp facto with default values * fix leak in TautomerTransform * updated suppressions * fix testStructChecker * fix logging & py tests * fix TautomerTransform class/struct issue * remove misplaced delete in testSLNParse * deinit in testAvalonLib1 * fix Avalon-triggered(?) bug in StructChecker/Pattern.cpp * fix random testMolWriter/Supplier fails - diversify output file names to avoid clashing. - unify Writers close/destruct behavior. - flushing/closing in tests. * use reset in FFs Params.cpp * comments on testMMFFForceField * unrequired 'if's added to mol suppliers * correct cast in FilterCatalog.h * use unique_ptr in MACCS Patterns * remove unrequred if in new_canon * update & move suppressions	2018-10-29 14:33:26 +00:00
Greg Landrum	108d84ab1e	Switch from boost::thread to std::thread (#1745) * boost::thread mostly gone... still need to get rid of once everything compiles * replace boost::call_once * remove link-time dependency on boost::thread * first pass at using async * switch to using async everywhere	2018-02-22 03:43:07 +01:00
Greg Landrum	7c0bb0b743	clang-tidy output	2017-04-22 17:09:24 +02:00
Greg Landrum	31b3bc7da4	Fixes #381 (#924) * Fixes #381 * switch to using boost::call_once * initialize the once_flag properly	2016-05-30 19:58:39 -04:00
Greg Landrum	e08e0d16d8	first pass, using google style	2015-11-14 14:58:11 +01:00
Greg Landrum	3f6d82a4d1	start to take a swing at the locale problem; this still needs testing on non-linux	2012-12-12 03:24:54 +00:00
Greg Landrum	b327ab5b23	get this building with MSVC++	2012-04-25 03:11:07 +00:00
Greg Landrum	162662186d	not working yet	2012-04-19 06:15:40 +00:00
Greg Landrum	3b3d44db16	remove exe property from source files	2011-01-13 04:22:56 +00:00
Greg Landrum	f3fbef45c5	update copyright statements	2010-09-26 17:04:37 +00:00
Greg Landrum	5d03333c22	setup svn keywords (should have done this before import... grn)	2006-05-06 22:54:39 +00:00
Greg Landrum	75a79b6327	initial import	2006-05-06 22:20:08 +00:00

18 Commits