rdkit

mirror of https://github.com/rdkit/rdkit.git synced 2026-06-04 21:54:27 +08:00

Author	SHA1	Message	Date
Yakov Pechersky	c6cabf4153	Speed-up tautomer canonicalization, no API changes (#9134 ) * Speed up tautomer canonicalization by deferring on SSSR calc * Lazy kekulization for tautomer enumeration Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for: 1. The initial tautomer (until first iteration) 2. New tautomers that may never be processed (if enumeration ends early) The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method. Performance improvement: ~7% additional speedup (total ~22-24% from baseline) * Use count-only substructure matching in tautomer scoring * Add SubstructMatchCount regression test * MolStandardize: reduce enumerate overhead * MolStandardize: avoid per-tautomer ring recomputation * Atom: cache PeriodicTable pointer in valence calcs * Atom: reuse PeriodicTable in getEffectiveAtomicNum * PeriodicTable: add atomic fast path for getTable * GraphMol: reduce ROMol copy reallocations * MolStandardize: use quickCopy for per-match product copies Use RWMol(kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry. MolStandardize: pre-filter scoring patterns by element/connectivity For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule. Two-stage filtering: 1. Element check: skip patterns requiring atoms not in the molecule 2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition. 
* MolStandardize: preserve molecule properties for canonical tautomer Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result. * TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES. Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance. * MolStandardize: use parallel iteration and cache bond lookups Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches. * perf: add specialized matchers for simple tautomer scoring patterns Replace VF2 graph matching with O(n) loops for 6 simple patterns: - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns - countMethyls: [CX4H3] methyl groups - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2. Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization. * Fix tautomer canonicalize dropping conformers from quickCopy quickCopy (RWMol(mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates. 
Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize(). Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp. add test on canonicalize losing stereo * add regression test for exocyclic C=C tautomer canonicalization The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs. Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1. Currently expected to FAIL until the state key dedup bug is fixed. * MolStandardize: expand tautomer connectivity SMARTS * MolStandardize: scope tautomer pattern enum * MolStandardize: trim tautomer pattern enum * MolStandardize: use symmetric ring scoring	2026-03-31 06:42:40 +02:00
Greg Landrum	ef90a4bedf	Allow adding custom atom and bond matcher functions for substructure searching (#8994 ) * extra SSS match functions for atoms/bonds initial implementation and testing * add baseline to test * add a functor for matching atom coords * support the extra checks in python * refactor the way the python callbacks are handled * test tolerances * expose the AtomCoordsMatcher to python * allow the extra checks to override the default matching --------- Co-authored-by: = <=>	2025-12-12 20:03:31 +01:00
Greg Landrum	a9477d2694	Modernization of some substructure code (#8450 ) * use std::span for substruct match callbacks This removes a copy from every evaluation of potential matches * some cleanup/modernization * some modernization * deprecate chiralAtomCompat * small optimization * remove naked pointers * improve new_timings.py script * changes suggested in review * response to review * response to review	2025-05-12 06:33:25 +02:00
Greg Landrum	5976eead54	Fixes #8485 (#8490 )	2025-05-05 08:57:18 +02:00
Greg Landrum	e77d4e3f6a	allow specified chiral features to SSS match unspecified features (#8115 )	2024-12-18 20:37:17 +01:00
Greg Landrum	4a69bc3493	Fixes #6017 (#6825 ) * Fixes #6017 * a bit of cleanup work * remove unused variable * change in response to review switch to using std::max(maxMatches,maxRecursiveMatches) * test the case where maxSubstructMatches<maxMatches	2023-10-25 04:57:29 +02:00
Rachel Walker	70427aa9b4	Add atom and bond property parameters to substruct matching (#6453 ) * Add atom and bond property parameters to substruct matching * use getPropIfPresent in propertyCompat * fix typo Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/Substruct/SubstructUtils.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/Substruct/SubstructUtils.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * added python tests * Add PRECONDITIONs Co-authored-by: Greg Landrum <greg.landrum@gmail.com> --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2023-06-15 05:08:48 +02:00
Greg Landrum	71051cde10	Fixes #6211 (#6250 ) * backup * basic tests pass * add JSON out to substruct match parameters * serialize the substruct match parameters in reactions * add that to the python wrapper * more testing	2023-04-05 19:08:37 +02:00
Greg Landrum	4e1a590b9f	Fixes #888 (#6018 ) * Fixes #888 * support older versions of boost support for hashing dynamic_bitset was not added until v1.71 * changes in response to review	2023-01-30 17:18:22 +01:00
Brian Kelley	866e0f19f0	silence warnings in MSVC compilations (#4796 )	2021-12-15 04:54:11 +01:00
Greg Landrum	52f73e4be0	Add support for Beilstein generics when doing substructure queries (#4673 ) * backup commit This is maybe heading in the right direction and at least passes the basic tests which are there. * some progress * more tests and refactoring * additional aliases add carboaryl * add CYC and ACY * add ABC * add AHC * CBC and AOX * add CHC and HAR * add CXX * cleanup: remove a bunch of nullptrs * initial tagging support * remove atom labels/sgroups after using them * docs * start handling writing NOTE: this does not currently work: the generic code needs to move out of SubstructSearch * move the generic groups to their own library Signed-off-by: greg landrum <greg.landrum@gmail.com> * make sure the generic groups end up in ctabs * add forgotten CMakeLists.txt * fix includes * expose this stuff to Python * CYC needs to initialize rings * renaming * add docs * change in response to review	2021-12-01 06:01:53 +01:00
Eisuke Kawashima	78aac3c1bc	Run clang-format against header files (#4143 )	2021-06-08 07:57:51 +02:00
Gareth Jones	c2fb57c19f	RGD - a fix for the cubane issue (single target atom matches 2 user R group attachments) (#4002 ) * Most tests working * All tests working * Fixed tests after merge with master * Create header and implementations for RCore * Updated comments * Removed old code * DLL export for MolMatchFinalCheckFunctor * Information line for failing Mac test * Log replace core behaviour * Ordering fix for OSX * Possible fuzzer fix * Removed debug output * Fix unmatched user R group bug * Code review changes * Bug fix and ChemTransforms test	2021-05-23 15:16:03 -04:00
Greg Landrum	f829c877d8	MinimalLib: add CFFI interface (#4018 ) * hello world works * more * more minimallib needs to be tested * parse substructure parameters from JSON * add substruct search and parameters * add descriptors * register more descriptors * fingerprints, first pass * stop outputting tiny coord vals * support generating 2d coords * coordgen testing * return nulls * initial 3d support; add/removeHs; cleanup * Embedding parameters from JSON * update * pattern fp, fps as bytes * use json to configure MFP * use json to configure rdkit and pattern fps * aligned 2d coords * parsing options * options for writers * rename remove_hs * get this working on windows (kind of) * silence some msvc warnings * cmake updates * update python tests * add the CFFI code to CI builds * cleanup line ending mess? * a couple small fixes * make this work with URF * support coordMap in the 3D coordinate generation * updates in response to review	2021-04-15 21:33:52 +02:00
Dan N	3dc1a220b7	Allow enhanced stereo to be used in substructure search (#3003 ) * Test only commit for using enhanced stereo in substructure search Adds some test cases to demonstrate what I'm planning. When the test cases fail, the messages look like this: ------------------------------------------------------------------------------- Enhanced stereochemistry AND and OR match their enantiomer ------------------------------------------------------------------------------- /Users/wandschn/Documents/src/rdkit/Code/GraphMol/Substruct/catch_tests.cpp:216 ............................................................................... /Users/wandschn/Documents/src/rdkit/Code/GraphMol/Substruct/catch_tests.cpp:218: FAILED: CHECK_THAT( opposite_mol, IsSubstructOf(mol_and, ps) ) with expansion: CC[C@@H](F)[C@@H](C)O is not a substructure of CC[C@H](F)[C@H](C)O |&1:2,4| /Users/wandschn/Documents/src/rdkit/Code/GraphMol/Substruct/catch_tests.cpp:219: FAILED: CHECK_THAT( opposite_mol, IsSubstructOf(mol_or, ps) ) with expansion: CC[C@@H](F)[C@@H](C)O is not a substructure of CC[C@H](F)[C@H](C)O |o1:2,4| * rename parameter to include q and m to reduce my confusion * Don't keep recreating a map This map is the same in every loop. And actually, the desired information is slightly different than what was formerly stored in the map. * Fix tests after our discussion. Also adds more exciting tests of diastereomers and structures with multiple stereo groups. * Use enhanced stereochemistry in substructure searching Allows use of enhanced stereochemistry in substructure searching if `SubstructMatchParameters.useEnhancedStereo` is set. The matching rules are pretty obnoxious, but a synopsis is: * An achiral query/substructure matches everything, because it means "ignore chirality". * An absolute query matches AND or OR, because they both include the molecule with an absolute center * A query with an OR matches either an OR or an AND, because AND is more molecules. 
* add info about matching to the documentation * expose extended stereo matching option to python * Some updates/tweaks to the documentation of enhanced stereochemistry especially about searching. * Code review comments. Co-authored-by: greg landrum <greg.landrum@gmail.com>	2020-03-21 05:12:40 +01:00
Greg Landrum	a2767d9f7d	Allow custom post-match filters for substructure matching (#2927 ) * backup, does not work * working on the C++ side * backup * fix the API * document the new functionality * improve that example * final bit of cleanup * switch to std::function	2020-02-04 11:22:38 -05:00
Greg Landrum	ec31bea97b	clang-tidy-7 pass (#2408 )	2019-04-16 12:05:47 -04:00
Greg Landrum	a102eaf932	Add options for substructure searching (#2254 ) * first pass at adding a SubstructMatchParameter struct * start moving the rest of the backend to use the parameters * backend at least mostly moved over * add aromaticMatchesConjugated add tests * switch over the MolBundle too Add templates to reduce duplicated code * support older compilers let's see if it works... * add SubstructMatchParameters to Python wrapper * remove some deprecations and warnings * damn compilers * parameter support for bundles in python wrapper * add the parameters to the java wrappers * response to review	2019-02-08 09:10:10 -05:00
Greg Landrum	2738c35178	Fixes #1903 (#1971 ) * Fixes #1903 * update SWIG bindings too	2018-07-25 09:14:17 +02:00
Paolo Tosco	c08ea49bda	- enable building DLLs on Windows (#1861 ) * - enable building DLLs on Windows * - export.h and test.h are now auto-generated by CMake	2018-05-16 08:42:41 +02:00
Greg Landrum	bbd615497a	Add a MolBundle class (#1537 ) * very basics * add the version to get all matches * better exceptions, including tests * documentation and actually add the test code * responses to review	2017-09-11 13:04:58 -04:00
Greg Landrum	769e6648e4	Fixes #1489 (#1556 ) * move the describeQuery functions to the RDKit namespace. They are generally useful * Fixes #1489	2017-09-11 08:34:25 -04:00
Greg Landrum	e08e0d16d8	first pass, using google style	2015-11-14 14:58:11 +01:00
Greg Landrum	5992c6fd23	- made the ResonanceMolSupplier really lazy, i.e. resonance structure enumeration is only carried out when the user asks for a structure or explicitly requests it by calling the enumerate() member function. This makes object creation fast and enables calling the getNumConjGrps(), getBondConjGrpIdx() and getAtomConjGrpIdx() member functions without necessarily incurring the cost of enumerating resonance structures - now bonds and atoms which do not belong to conjugated groups get a -1 index - added a few Python wrappers - added a few tests	2015-11-04 05:39:46 +01:00
Paolo Tosco	3d48ba72e1	- added threading support to ResonanceMolSupplier and relevant tests - added threading support to the ResonanceMolSupplier-enabled SubstructMatch() and relevant tests - modified/removed some code in O3AAlignMolecules.cpp which doesn't seem necessary anymore - modified Code/GraphMol/CMakeLists.txt to allow building on Windows	2015-11-01 23:01:34 +00:00
Paolo Tosco	f43677b978	- fixed a problem with thiocarboxylates/thiolates not being perceived as conjugated like their oxygen analogs - fixed an issue with large numbers of resonance structures exceeding the unsigned int allowance - implemented the uniquify feature properly - uniquify now defaults to false when using the ResonanceMolSupplier-enabled SubstructMatch() version - the concept of 'laziness' is now clearer - TODO: * remove some debugging info * move classes from .h to .cpp * SWIG wrappers * improve resonance structure sorting for degenerate resonance structures I will do all of the above ASAP	2015-10-21 20:06:53 +01:00
Paolo Tosco	eaa187b03d	- added ResonanceMolSupplier - added overloaded SubstructMatch() version supporting ResonanceMolSupplier - added relevant Python wrappers - added C++/Python tests	2015-10-04 23:21:28 +01:00
Greg Landrum	4b8caf2ceb	Fixes #409	2015-01-10 07:21:55 +01:00
Greg Landrum	34ab68ca2a	introduce QueryBond::QueryMatch, as with QueryAtoms; all tests passing; performance tests still needed	2014-05-07 05:29:25 +02:00
Greg Landrum	4a14a52674	Fixes #153	2013-11-15 06:47:18 +01:00
Greg Landrum	f3fbef45c5	update copyright statements	2010-09-26 17:04:37 +00:00
Greg Landrum	4db8233db6	sync with trunk	2010-09-10 05:12:41 +00:00
Greg Landrum	052ec66542	cleanups: remove x bit from headers and sources; remove a couple empty files from Code/GraphMol	2010-09-08 04:25:57 +00:00
Greg Landrum	0ce95829a5	cleanup deprecated args	2010-08-21 00:16:55 +00:00
Greg Landrum	f42f479d28	enabling infrastructure for making repeated recursive smarts queries run faster (like vector bindings). Though there is an addition to the smarts parser exposed here, I do not recommend using it in client code.	2010-06-03 10:02:15 +00:00
Greg Landrum	ec2c2042e8	remove the vflib usage code from Substruct area	2010-04-19 07:45:22 +00:00
Greg Landrum	30fe77b609	initial commit: passes all tests and seems to be faster than the original code	2009-02-09 16:19:31 +00:00
Greg Landrum	e450f5beeb	doc updates and some minor formatting changes	2007-10-24 16:37:37 +00:00
Greg Landrum	7cfa8cde0b	another substruct caching try	2007-09-23 06:56:13 +00:00
Greg Landrum	d5ffea669d	add support for chirality in substructure searches; this is only going to work in cases where CIP codes have been (i.e. can be) assigned to atoms.	2006-11-03 06:35:14 +00:00
Greg Landrum	88d596abca	get the AR_MOLGRAPH caching right with substructs; the current implementation introduces a core leak, so it is disabled by default	2006-10-19 05:24:05 +00:00
Greg Landrum	a5d7fc550a	try to get ChemTransforms checked in	2006-07-18 05:35:12 +00:00
Greg Landrum	75a79b6327	initial import	2006-05-06 22:20:08 +00:00

43 Commits