rdkit

mirror of https://github.com/rdkit/rdkit.git synced 2026-06-04 21:54:27 +08:00

Author	SHA1	Message	Date
Yakov Pechersky	c6cabf4153	Speed-up tautomer canonicalization, no API changes (#9134 ) * Speed up tautomer canonicalization by deferring on SSSR calc * Lazy kekulization for tautomer enumeration Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for: 1. The initial tautomer (until first iteration) 2. New tautomers that may never be processed (if enumeration ends early) The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method. Performance improvement: ~7% additional speedup (total ~22-24% from baseline) * Use count-only substructure matching in tautomer scoring * Add SubstructMatchCount regression test * MolStandardize: reduce enumerate overhead * MolStandardize: avoid per-tautomer ring recomputation * Atom: cache PeriodicTable pointer in valence calcs * Atom: reuse PeriodicTable in getEffectiveAtomicNum * PeriodicTable: add atomic fast path for getTable * GraphMol: reduce ROMol copy reallocations * MolStandardize: use quickCopy for per-match product copies Use RWMol(kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry. MolStandardize: pre-filter scoring patterns by element/connectivity For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule. Two-stage filtering: 1. Element check: skip patterns requiring atoms not in the molecule 2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition. 
* MolStandardize: preserve molecule properties for canonical tautomer Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result. * TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES. Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance. * MolStandardize: use parallel iteration and cache bond lookups Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches. * perf: add specialized matchers for simple tautomer scoring patterns Replace VF2 graph matching with O(n) loops for 6 simple patterns: - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns - countMethyls: [CX4H3] methyl groups - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2. Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization. * Fix tautomer canonicalize dropping conformers from quickCopy quickCopy (RWMol(mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates. 
Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize(). Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp. add test on canonicalize losing stereo * add regression test for exocyclic C=C tautomer canonicalization The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs. Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1. Currently expected to FAIL until the state key dedup bug is fixed. * MolStandardize: expand tautomer connectivity SMARTS * MolStandardize: scope tautomer pattern enum * MolStandardize: trim tautomer pattern enum * MolStandardize: use symmetric ring scoring	2026-03-31 06:42:40 +02:00
Paolo Tosco	adf060c881	- implement #9194 (#9197 ) - remove redundant #include - avoid unnecessary copy of match - expose SubstructMatchParams to JS MinimalLib - add JS SubstructMatchParams test Co-authored-by: ptosco <paolo.tosco@novartis.com>	2026-03-26 05:00:42 +01:00
Ricardo Rodriguez	d90a73aa6b	Leak fixes for 2026.03.1 (#9198 ) * fix mols leaked in tests * own invariant generators * clean up MorganFeatureAtomInvGenerator patterns * address review suggestions	2026-03-25 05:56:26 +01:00
Greg Landrum	cacba34a47	simple substructure optimization (#9201 ) Co-authored-by: = <=>	2026-03-24 15:38:13 +01:00
Eisuke Kawashima	e89c9f656a	style: apply readability-braces-around-statements (#8136 ) Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>	2026-02-09 12:10:50 +01:00
Greg Landrum	21225e63b3	Move some more tests over to catch2 (#9058 ) * move testSubstructMatch to catch2 * modernization * modernization * switch to catch2 * modernize * convert to catch2 * update * move to catch * please be quiet * move to catch2 * changes in response to review --------- Co-authored-by: = <=>	2026-01-24 07:03:04 +01:00
Greg Landrum	ef90a4bedf	Allow adding custom atom and bond matcher functions for substructure searching (#8994 ) * extra SSS match functions for atoms/bonds initial implementation and testing * add baseline to test * add a functor for matching atom coords * support the extra checks in python * refactor the way the python callbacks are handled * test tolerances * expose the AtomCoordsMatcher to python * allow the extra checks to override the default matching --------- Co-authored-by: = <=>	2025-12-12 20:03:31 +01:00
Ricardo Rodriguez	7b7a8a4e17	Refactor iostreams includes (#8846 ) * refactor iostreams includes * restore ostream to MonomerInfo.cpp	2025-10-08 16:08:01 +02:00
Greg Landrum	a9477d2694	Modernization of some substructure code (#8450 ) * use std::span for substruct match callbacks This removes a copy from every evaluation of potential matches * some cleanup/modernization * some modernization * deprecate chiralAtomCompat * small optimization * remove naked pointers * improve new_timings.py script * changes suggested in review * response to review * response to review	2025-05-12 06:33:25 +02:00
Greg Landrum	5976eead54	Fixes #8485 (#8490 )	2025-05-05 08:57:18 +02:00
Greg Landrum	86141183c1	Moving towards getting all tests to pass when using the new stereo code (#8409 ) * Fixes #8379 * check in some working tests * test passes * test passes * test passes * test passes * test passes * ensure that the invariants flush the streams on failure * tests pass * test passes * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * tests pass * Fixes #8391 * tests pass * fix a test with legacy not clear why this was not causing problems before * make a test work * Fixes #8396 * gcc builds work * fingerprint tests pass * mention backwards incompatible change * fix a problem with FindMolChiralCenters * more testing details * enable the test status output * Fixes #8432 fix a bug in double-bond stereo handling for template matching * all depictor tests pass * use the new-stereo chiral ranks in the depiction code * always assign new-stereo chiral ranks * make _ChiralAtomRank a computed property This is analogous to _CIPRank * tweak to the way the atom ordering is computed for 2D coordinate generation * update two expected results * backup * response to review * tests pass * tests pass --------- Co-authored-by: = <=>	2025-04-15 14:00:32 +02:00
Hussein Faara	44364fd982	remove no-op macros and dead code (pt 4) (#8037 ) * remove no-op macros and dead code (pt 4) * review comments	2025-01-26 07:49:50 +01:00
Brian Kelley	9dc5470d89	Simple fix to allow aromatic CIS/TRANS SMARTS to match molecules (#8192 )	2025-01-23 19:49:25 +01:00
Greg Landrum	3035b67067	Fixes #8162 (#8164 ) * remove extraneous printing * Fixes #8162	2025-01-14 15:58:34 +01:00
Greg Landrum	e77d4e3f6a	allow specified chiral features to SSS match unspecified features (#8115 )	2024-12-18 20:37:17 +01:00
Ricardo Rodriguez	db0df54347	Fix some minor issues reported by ubsan and the compiler (#8015 ) * initialize chiralityPossible * fix build warning * Fix integer overflow * fix downcasting MarvinMolBase to MarvinMol * Fix buildwarning * increase PairList container to 64 bit * fix testDict * Update Code/RDGeneral/testDict.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/CIPLabeler/rules/Pairlist.h Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/CIPLabeler/rules/Pairlist.h Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Fix catch_tests.cpp --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2024-11-20 09:09:22 +01:00
Brian Kelley	eacc365b27	GitHub 7865 haspropwithvaluequery leaks (#7872 ) * Properly cleanup Dict::Pair when serializing HasPropWithQueryValue * Make sure pickling doesn't change original molecule * Fix bad cut and paste * Add PairHolder utility class for memory management of non Dict Dict::Pairs, fix mem leak in pickler * Edit comment to force a rebuild * Ignore PairHolder from Java/Swig builds * Ignore PairHolder API from swig * Responses to review * Add backward incompatible change * Make release note a bullet point	2024-10-30 06:12:40 +01:00
Ricardo Rodriguez	a44cbe8699	Fixes #7685 (#7864 ) * replace lists with vectors * remove redundant assign loop	2024-09-29 05:23:24 +02:00
Greg Landrum	da6cd73168	Run clang-format across everything (#7849 ) * run clang-format-18 across Code/.cpp and Code/.h * run clang-format-18 across External	2024-09-26 13:39:02 +02:00
Brian Kelley	fa0463a591	Adds HasPropWithValue Pickler (#7692 ) * Adds HasPropWithValue Pickler * Revert changes * Resolve review comments * more comprehensive testing --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2024-08-26 16:27:58 +02:00
MarioAndWario	643a356e44	Fix Mol block parser by resetting atomic number to 0 for COMPOSITE_OR query atoms (#6349 ) (#6768 ) * Fix Mol block parser by resetting atomic number to 0 for COMPOSITE_OR query atoms (#6349) * Fix fileParsersTest1 * Fix test23MolFileParsing * Fix testRGroupDecomp * Fix testSubstructMatch * Use std::unique_ptr instead of raw pointer * Add isAtomDummy() to QueryOps.h * Add `!a->getQuery()->getNegation()`	2024-01-23 17:13:53 +01:00
tadhurst-cdd	d5d4d194ec	atropisomer handling added (#6903 ) * atropisomer handling added * fixed non-used variables, linking directives * BOOST LIB start/stop fixes, linking fix * Fixes for RDKIT CI errors * minimalLib fix * changed vector<enum> for java builds * check for extra chars in CIP labeling * removed wrong deprecated message * fix ostrstream output error? * restored _ChiralAtomRank to lowercase first letter * changes for merged master * Fixed catch label for new Catch package * update expected psql results * get swig wrappers building * restore MolFileStereochem to FileParsers * fix java wrapper for reapplyMolBlockWedging * some suggestions * move a couple functions out of Bond * Merge branch 'master' into pr/atropisomers2 * merged master * Renamed setStereoanyFromSquiggleBond * atropisomers in cdxml, rationalize atrop wedging, stereoGroups in drawMol * fix for CI build * attempt to fix java build in CI * attempt to fix java build in CI #2 * New routine to remove non-explicit 3D-generated chirality * changed to use pair for atrop atoms and related bonds * Changes as per PR reviews * PR review responses * PR review response - more * Fix merge from master * fixing java ci after merge * Updated the help doc for atropisomers * update the atropisomer docs * improve the images * add the source CXSMILES --------- Co-authored-by: greg landrum <greg.landrum@gmail.com>	2023-12-22 04:58:18 +01:00
Ric	9c1d1a84f4	Fixes #6983 (#6984 )	2023-12-16 08:06:11 +01:00
Greg Landrum	2957ab4576	switch to catch2 v3 (#6898 ) * switch to catch2 v3 Fixes #6894 * fix a couple of problems noticed in the CI builds * more warning cleanup * changes in response to review	2023-11-15 06:45:42 +01:00
Greg Landrum	4a69bc3493	Fixes #6017 (#6825 ) * Fixes #6017 * a bit of cleanup work * remove unused variable * change in response to review switch to using std::max(maxMatches,maxRecursiveMatches) * test the case where maxSubstructMatches<maxMatches	2023-10-25 04:57:29 +02:00
John Mayfield	dd475b3677	Fix chirality handling when the chiral atom is the first one in a SMARTS (#6730 ) * Set _SmilesStart when parsing SMARTS. * SmartsWriter should also invert first atoms, like SMILES. * Update test cases now these SMILES match themselves as SMARTS. * rerun bison * cleanup a possible repeated define * When an atom moves from the first to second position winding should flip in SMARTS (i.e. same as SMILES). --------- Co-authored-by: greg landrum <greg.landrum@gmail.com>	2023-10-05 06:02:49 +02:00
Paolo Tosco	a384878fbe	avoid leaking memory in case exceptions are thrown while generating FPs (#6630 ) Co-authored-by: ptosco <paolo.tosco@novartis.com>	2023-08-15 04:59:14 +02:00
Rachel Walker	70427aa9b4	Add atom and bond property parameters to substruct matching (#6453 ) * Add atom and bond property parameters to substruct matching * use getPropIfPresent in propertyCompat * fix typo Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/Substruct/SubstructUtils.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/Substruct/SubstructUtils.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * added python tests * Add PRECONDITIONs Co-authored-by: Greg Landrum <greg.landrum@gmail.com> --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2023-06-15 05:08:48 +02:00
Paolo Tosco	f43c96e442	- when allowRGroups and allowOptionalAttachments are true, bonds connecting R groups to the scaffold should match single or aromatic (#6306 ) - added relevant unit tests Co-authored-by: Tosco, Paolo <paolo.tosco@novartis.com>	2023-05-13 05:54:15 +02:00
Ric	880a8e5725	Reformat Python code for 2023.03 release (#6294 ) * run yapf * run isort --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2023-04-28 06:53:56 +02:00
Ric	58d135a874	Reformat C/C++ code ahead of 2023.03 release (#6295 ) * format files * format template files too	2023-04-28 04:42:35 +02:00
Paolo Tosco	2aa4fe743d	- allowRGroups now also includes terminal query atoms matching hydrogen in addition to terminal dummy atoms (#6280 ) - added relevant unit tests Co-authored-by: Tosco, Paolo <paolo.tosco@novartis.com>	2023-04-12 06:26:54 +02:00
Greg Landrum	71051cde10	Fixes #6211 (#6250 ) * backup * basic tests pass * add JSON out to substruct match parameters * serialize the substruct match parameters in reactions * add that to the python wrapper * more testing	2023-04-05 19:08:37 +02:00
Franz Waibl	c99115e4b1	Remove check for ring information from Atom::Match (#6063 ) In high-symmetry cases where the symmetric SSSR does not find all possible rings, substructure searches can fail because of this check. Removing it fixes those cases, but is likely to decrease the performance of substructure matching. Also, adds a unit test where the old code fails Co-authored-by: Franz Waibl <waiblfranz@gmail.com>	2023-02-08 04:30:05 +01:00
Greg Landrum	4e1a590b9f	Fixes #888 (#6018 ) * Fixes #888 * support older versions of boost support for hashing dynamic_bitset was not added until v1.71 * changes in response to review	2023-01-30 17:18:22 +01:00
Greg Landrum	0147cd8201	Fixes #5210 (#5408 ) * revert duplicate chunk in release notes * replace deprecated ifdefs This one gets rid of USE_BUILTIN_POPCNT and RDK_THREADSAFE_SS use RDK_OPTIMIZE_POPCNT or RDK_BUILD_THREADSAFE_SSS instead * get rid of BUILD_COORDGEN_SUPPORT from ROMol.i * fix a stupid typo * update release notes	2022-07-11 11:20:03 +02:00
Greg Landrum	594c58f86c	make the catch tests build faster (#5284 ) * reorg the catch tests the goal here is to make the builds faster * make that easier	2022-05-17 04:39:33 +02:00
Eisuke Kawashima	27f711a658	Run clang-tidy (readability-braces-around-statements) (#4977 ) https://github.com/rdkit/rdkit/pull/3024#discussion_r526549843	2022-03-10 08:00:10 +01:00
Greg Landrum	00f23b5047	[WIP] Clean up the warning landscape (#5048 ) * suppress a bunch of warnings from third-party code get rid of one warning in RDKit code * corrections * fix the maeparser flags * remove some more inchi warnings with clang	2022-03-01 05:00:25 +01:00
Brian Kelley	866e0f19f0	silence warnings in MSVC compilations (#4796 )	2021-12-15 04:54:11 +01:00
Greg Landrum	ff1ea80eca	fix typos (#4769 )	2021-12-06 09:34:33 +01:00
Greg Landrum	52f73e4be0	Add support for Beilstein generics when doing substructure queries (#4673 ) * backup commit This is maybe heading in the right direction and at least passes the basic tests which are there. * some progress * more tests and refactoring * additional aliases add carboaryl * add CYC and ACY * add ABC * add AHC * CBC and AOX * add CHC and HAR * add CXX * cleanup: remove a bunch of nullptrs * initial tagging support * remove atom labels/sgroups after using them * docs * start handling writing NOTE: this does not currently work: the generic code needs to move out of SubstructSearch * move the generic groups to their own library Signed-off-by: greg landrum <greg.landrum@gmail.com> * make sure the generic groups end up in ctabs * add forgotten CMakeLists.txt * fix includes * expose this stuff to Python * CYC needs to initialize rings * renaming * add docs * change in response to review	2021-12-01 06:01:53 +01:00
Brian Kelley	cc5a941269	Recursive Substructure Search Deadlock (#4656 ) * fixes #4651 * Remove unused import * Switch to RAII * clang-format * Response to review * Response to review - remove prints, use assertRaises	2021-11-08 06:30:14 +01:00
Eisuke Kawashima	11532089de	Run clang-format against cpp (#4358 )	2021-10-20 04:25:27 +02:00
Ric	878c4c7ec0	save one search (#4566 )	2021-09-28 04:37:24 +02:00
Ric	6db202aa0d	Improve performance of removing substruct/tautomer duplicates (#4560 ) * improve removeDuplicates performance * improve removeTautomerDuplicates performance * use std::set	2021-09-25 15:45:55 +02:00
Ric	2c7485fef5	Fixes #4558 (#4559 ) * fix * add test	2021-09-25 15:45:30 +02:00
Greg Landrum	3193b76d8c	cleanup some compiler warnings (#4521 ) * cleanup some clang warnings * get rid of some VC++ warnings	2021-09-16 04:34:40 +02:00
Paolo Tosco	3904b6958d	Fixes RDK_BUILD_THREADSAFE_SSS=OFF build (#4349 ) * - fix non-threaded nix builds that currently fail because boost flyweight introduces a dependency on pthreads - make sure that mutexes and futures are only used when RDK_BUILD_THREADSAFE_SSS is ON - fix SubstructMatch failing test when RDK_BUILD_THREADSAFE_SSS is OFF due to misplaced #ifdef's - rename RDK_TEST_MULTITHREADED to RDK_THREADSAFE_SSS in inchi.cpp (which is not a test) - the limitexternal Linux build is now single-threaded so we make sure single-threaded builds do not break in the future (suggestion from Greg) * reverted unnecessary change to Code/GraphMol/FileParsers/testMultithreadedMolSupplier.cpp Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>	2021-07-23 14:25:25 +02:00
Paolo Tosco	4451bcde67	Make sure that ResonanceMolSupplier substructure matches are uniquified consistently (#4274 ) * make sure that ResonanceMolSupplier substructure matches are uniquified consistently * Fixes github #4311 (#4312) * a bit of simple refactoring * Fixes #4311 - adds getValenceContrib() to QueryBond - adds hasBondTypeQuery() and hasComplexBondTypeQuery() to QueryOps namespace - atoms with complex bond type queries now have explicit and implicit valences of 0 - adds tests for the above * add a test * Support using SubstructMatchParameters in RGD (#4318) * support substructure search parameters in RGD. Still needs testing/verification of the enhanced stereo stuff * test enhanced stereo * add support to python wrapper unfortunately some python reformatting got mixed in there. * addressed comments in review Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com> Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2021-07-13 06:54:07 +02:00


175 Commits