rdkit

mirror of https://github.com/rdkit/rdkit.git synced 2026-06-03 21:44:30 +08:00

Author	SHA1	Message	Date
Yakov Pechersky	c6cabf4153	Speed-up tautomer canonicalization, no API changes (#9134 ) * Speed up tautomer canonicalization by deferring on SSSR calc * Lazy kekulization for tautomer enumeration Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for: 1. The initial tautomer (until first iteration) 2. New tautomers that may never be processed (if enumeration ends early) The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method. Performance improvement: ~7% additional speedup (total ~22-24% from baseline) * Use count-only substructure matching in tautomer scoring * Add SubstructMatchCount regression test * MolStandardize: reduce enumerate overhead * MolStandardize: avoid per-tautomer ring recomputation * Atom: cache PeriodicTable pointer in valence calcs * Atom: reuse PeriodicTable in getEffectiveAtomicNum * PeriodicTable: add atomic fast path for getTable * GraphMol: reduce ROMol copy reallocations * MolStandardize: use quickCopy for per-match product copies Use RWMol(kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry. MolStandardize: pre-filter scoring patterns by element/connectivity For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule. Two-stage filtering: 1. Element check: skip patterns requiring atoms not in the molecule 2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition. 
* MolStandardize: preserve molecule properties for canonical tautomer Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result. * TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES. Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance. * MolStandardize: use parallel iteration and cache bond lookups Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches. * perf: add specialized matchers for simple tautomer scoring patterns Replace VF2 graph matching with O(n) loops for 6 simple patterns: - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns - countMethyls: [CX4H3] methyl groups - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2. Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization. * Fix tautomer canonicalize dropping conformers from quickCopy quickCopy (RWMol(mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates. 
Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize(). Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp. add test on canonicalize losing stereo * add regression test for exocyclic C=C tautomer canonicalization The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs. Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1. Currently expected to FAIL until the state key dedup bug is fixed. * MolStandardize: expand tautomer connectivity SMARTS * MolStandardize: scope tautomer pattern enum * MolStandardize: trim tautomer pattern enum * MolStandardize: use symmetric ring scoring	2026-03-31 06:42:40 +02:00
Ricardo Rodriguez	9eaa193186	Merge simple AND queries onto atoms. (#8830 ) * duplicate parser code * regenerate smarts.tab files * update test * add release note * restore pregenerated header * Suggested changes (#23) * add a generic flags interface to atoms and bonds * suggested changes --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2025-10-09 04:38:42 +02:00
Greg Landrum	9d4afd0e08	add clearPropertyCache() (#8533 )	2025-05-18 08:14:05 +02:00
Greg Landrum	fa048eacc5	Replace GetImplicitValence() and GetExplicitValence() with GetValence() (#7926 )	2025-01-28 21:09:03 +01:00
Brian Kelley	0909d753b8	Fixes #7873 (#7885 ) * Fixes #7873 * Resolve MonomerInfo class for deletion * Add regression test for setMonomerInfo --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2024-10-11 06:42:52 +02:00
Greg Landrum	7d2598267a	Fixes #7689 (#7851 )	2024-09-26 19:22:26 +02:00
Greg Landrum	724716b2c6	Switch to isoelectronic valence model (#7491 ) * change valence model to use isolobal analogy Remove support for five-coordinate C+ and, by analogy, five-coordinate N+2 Removes support for charge states that take atoms past the end of the periodic table i.e. [Lv-4] is no longer supported * update the tests for that * remove valence state of 6 for Al * fix representation of phosphate in the mol2 parser this is a correction of what was done during #5973 * cleanup the exceptions for P, S, As, and Se * drop valence states: Si 6, P 7, As 7 * a couple of additional changes from #7397 * update java tests * fix an inconsistency: Rb now supports valence -1 * documentation * - replace operator[] with at() for bounds check - extract some code into a function to avoid duplication - use TAB as separator throughout in the periodic table data for consistency * removing the .at() usage We know that these vectors aren't empty, so there's no need for the bounds check. --------- Co-authored-by: ptosco <paolo.tosco@novartis.com>	2024-06-25 15:38:49 +02:00
Ricardo Rodriguez	73e91a6344	Fixes #7318 (#7319 ) * fix hybridization for atoms with outgoing dative bonds * expose and wrap C++ numPiElectrons * deprecate AtomPairs.Utils.NumPiElectrons * add & update tests * fix draw2d test * update expected hash * add hybridization test * move numPiElectrons to Atom.h * take reference instead of ptr	2024-04-03 15:34:37 +02:00
Greg Landrum	6664dd7fe4	Some modernization of core GraphMol classes (#7228 ) * simple modernization * more * done with RWMol for this pass * the ROMol.cpp variant * Atom * minor change to bond * simplify Conformer * monomerinfo, queryatom, querybond queryatom and querybond cpp files still need to be done * typos * revert a dumb change * suggestion from review	2024-03-17 06:04:04 +01:00
Greg Landrum	8b8ae1561d	Improve output of debugMol (#7172 ) * better bond output with debugMol * update atom output * fix typo in the StereoGroup output	2024-02-25 17:46:21 +01:00
cdvonbargen	45c88e4e1e	Add Atom::hasValenceViolation (Take 2) (#7030 ) * Refactor Atom.cpp to create a hasValenceViolation method that uses existing valence checking code * work without exceptions * get rid of the snake_case * put free functions in an unnamed namespace --------- Co-authored-by: greg landrum <greg.landrum@gmail.com>	2024-01-18 05:06:57 +01:00
Greg Landrum	a7c781c107	Some small cleanups from the UGM Hackathon (#6744 ) * move definition of a couple global constants from a .h to a .cpp * careful removal of some redundant atom PRECONDITIONS * careful remove of some redundant ROMol PRECONDITIONS a bit of additional cleanup * optimization masquerading as modernization * some more tidying * a bit more atom cleanup * change in response to review	2023-10-05 06:13:18 +02:00
Franz Waibl	c99115e4b1	Remove check for ring information from Atom::Match (#6063 ) In high-symmetry cases where the symmetric SSSR does not find all possible rings, substructure searches can fail because of this check. Removing it fixes those cases, but is likely to decrease the performance of substructure matching. Also, adds a unit test where the old code fails Co-authored-by: Franz Waibl <waiblfranz@gmail.com>	2023-02-08 04:30:05 +01:00
Greg Landrum	fd44d72fb7	Fixes #5849 (#5861 ) * Fixes #5849 This may not be the best fix since it adds another step to canonicalization * more test cases * update docs	2022-12-28 20:10:13 +00:00
Greg Landrum	b817f29eb8	extend the allowed valences of the alkali earths (#5786 ) make it possible to have preferred and arbitrary valence states (I thought this already worked)	2022-11-25 04:50:36 +01:00
Greg Landrum	1f4584b2ca	run clang_format (#5676 )	2022-11-01 04:14:26 +01:00
Greg Landrum	cd74dc2207	Initial support for non-tetrahedral stereochemistry (#5084 ) * very basics: actually parsing the new atom stereochem features * add some input verification for the chiral permutations * fix a typo add quadruple bond SMILES/SMARTS extension * add forgotten files * patch from Roger * add Roger's parsing examples * typo * new tests * adjusted version of next PR from Roger: - add SP2D hybridization for square planar (this may change) - some modernization of Chirality.cpp - stop using < HybridizationType in Chirality.cpp (should probably do this elsewhere too) - improved handling of hybridization assignment for new stereochem - handle new stereo/hybridization in UFF - tests for the above * perception of non-tetrahedral stereo from 3D (from Roger S) Basic testing of SP and TB based on opensmiles docs * potential fixes for octahedral assignment more tests * docs update need way more! * map the TH tags directly to @ tags * very basics of SMILES writing this does not work with anything that changes the permutation order like canonicalization or writing things in rings. 
* start to support the getChiralAcross API * more testing * consistency * add hasNonTetrahedralStereo() and getIdealAngleBetweenLigands() * assignStereochemistry should only remove non-tetrahedral stereo * re-simplify those tests * cleanup matrix stream output * initial pass at supporting nontet stereo in distgeom * backup * start on the reference docs * TBP reference * first pass at Oh finished * update SP section * more doc updates * fix a typo * add param to not remove Hs connected to non-tetrahedral atoms * VERY basic coord generation for square planar * TBP basics * basic OH depiction * start testing missing ligands allow non-tet stereo in rings (ugly, but correct) * add new TBP functions from Roger * update depiction code for new API * backup, the new tests work so far * Finish the TB tests * OH tests pass too * cleanup * first pass at getting correct SMILES with reordering need way more testing than this * ensure permutation 0 is correctly preserved * some progress towards adding non-tetrahedral stereo to StereoInfo * doc update * add non-tet chiral classes to python wrappers * make sure removeAllHs also gets neighbors of non-tetrahedral centers more testing * a bit of depictor cleanup * make the assignment from 3D more tolerant more testing * improve the bulk testing * cleanup * remove a bit of redundant code * ensure we don't write bogus permutation values to SMILES * fix some rebase problems * allow assignStereochemistryFrom3D() to be called without sanitization * allow disabling the non-tetrahedral stereo when it's not explicit * get that working on windows too	2022-05-20 09:07:16 +02:00
Eisuke Kawashima	ba6d8e0d3b	clang-tidy: readability-simplify-boolean-expr (#4639 )	2022-03-17 13:50:50 +01:00
Brian Kelley	f326de01c0	Move isEarlyAtom to a table to reduce lock contention in getPeriodicTable (#4980 ) * Move isEarlyAtom to a table format to reduce lock contention in getPeriodicTable * Fix He early atom status Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2022-02-03 16:03:17 +01:00
Greg Landrum	8390dfd181	Fixes #4785 : aromatic bonds no longer set aromatic flags on atoms (#4806 ) * preliminary * all tests pass * cleanup * more testing! * we do still want to raise errors for aromatic atoms not in rings fix one missed change for mol blocks * update expected results for psql test	2021-12-17 10:26:59 +01:00
Greg Landrum	85608555fe	add ROMol::atomNeighbors() and ROMol::atomBonds() (#4573 ) * add ROMol::atomBonds() and ROMol::atomNeighbors() methods * remove some warnings * start using the new code * add default for those template params * some more applications * get the SWIG builds working * get rid of extraneous ref * remove extraneous comments	2021-10-02 07:28:24 +02:00
Greg Landrum	69b143edd0	Swap from RDUNUSED_PARAM to unnamed parameters (#4433 ) * cleanup * more cleanup	2021-08-24 17:19:46 -04:00
Greg Landrum	4c2a580ad1	Fixes github #4311 (#4312 ) * a bit of simple refactoring * Fixes #4311 - adds getValenceContrib() to QueryBond - adds hasBondTypeQuery() and hasComplexBondTypeQuery() to QueryOps namespace - atoms with complex bond type queries now have explicit and implicit valences of 0 - adds tests for the above * add a test	2021-07-09 15:06:54 +02:00
Greg Landrum	bba71631b8	fix problem with H+ caused by #3473 (#3503 ) * fix problem with H+ caused by #3473 * changes in response to review	2020-10-19 12:45:00 -04:00
Greg Landrum	acf318c188	Fixes #3470 (#3473 )	2020-10-11 08:44:52 -04:00
Eisuke Kawashima	75f03412ef	Modernize deprecated header inclusion (#3137 )	2020-05-04 10:40:57 +02:00
Greg Landrum	9991c5247a	cleanup of the SMILES/SMARTS parsing and writing code (#2912 ) * first cleanup * next round of changes. all tests pass * Fixes #2909 * Fixes #2910 * further cleanup * some cleanup/refactoring of the Dict class * remove now extraneous calls to hasProp() before clearProp() * minor refactoring of RDProps.h * Switch from using our own version of round() to std::round() * replace some boost::math stuff with the equivalents from std:: * cleanups in SmartsWrite * refactor out a bunch of duplicated code * fix an instance of undefined behavior * changes in response to review	2020-01-29 15:13:39 +01:00
Greg Landrum	d41752d558	run clang-tidy with readability-braces-around-statements (#2899 ) * run clang-tidy with readability-braces-around-statements clang-format the results clean up all the parts that clang-tidy-8 broke * fix problem on windows	2020-01-25 14:19:32 +01:00
Greg Landrum	853c24d11c	Fixes #2775 (#2776 )	2019-11-14 16:58:30 +01:00
Greg Landrum	253f172353	Allow identification of chemistry problems (#2587 ) * add AtomValenceException * refactor a bit and add KekulizeException * add copy ctor and copy() method * add detectChemistryProblems * add getType() method want to be able to get the type of the exception without requiring doing a bunch of dynamic casts * first pass at exception inheritance/translation needs some cleanup and expansion, but this does pass all tests. * cleanup and finish the python wrappers for the new exceptions * make sure things are truly polymorphic * wrap shared_ptrs of the new exception types * expose DetectChemistryProblems() * get the java wrappers building again * transfer those changes to the c# wrapper * add detectChemistryProblems() and deal with the fun fun exception inheritance things that ensue * response to review	2019-08-28 14:15:55 -07:00
Greg Landrum	7ffd863c9b	A collection of bug fixes (#2608 ) * Fixes #2602 * Fixes #2605 * Remove vestigial isEarlyAtom() definition in Kekulize.cpp * Fixes #2606 * Fixes #2607 adds allowed valence 2 for Sn and Pb * Fixes #2610 * update in response to review	2019-08-15 04:53:23 +02:00
Greg Landrum	d8c49e6dab	Code cleanups from PVS/Studio (#2531 ) * first round of cleanups based on PVS-studio suggestions * a couple more * a few more cleanups * another round of cleanups * undo one of those cleanups we want the integer rounding behavior here * add a comment to make that clear * Fix for filter catalog PRECONDITION redundancy	2019-07-13 07:25:37 +02:00
Greg Landrum	3ce2016039	Fixes #2452 (#2507 )	2019-06-24 23:07:19 -04:00
Greg Landrum	334b1558bc	Fixes #2258 (#2286 )	2019-02-22 07:30:31 -07:00
Greg Landrum	915cf08faa	run clang-format with c++-11 style over that	2017-04-22 17:19:10 +02:00
Greg Landrum	7c0bb0b743	clang-tidy output	2017-04-22 17:09:24 +02:00
Greg Landrum	d3ad7d2770	Fixes #1387 (#1392 ) * backup * Fixes #1387 this passes the bug tests, but needs the full tests run * all tests pass * remove some droppings left from an earlier attempt at a fix * remove some additional printing * cleanup	2017-04-07 11:57:43 -04:00
Brian Kelley	ddf7c73b50	Adds Atom atom map and rlabel apis (#1004 ) * Adds Atom atom map and rlabel apis * Moves RLabels to their own namespace, adds other properties. * Removes namespaces, liberally adds Atom to function names. * move detail::computedPropName to RDKit::detail::computedPropName	2016-08-11 04:46:41 +02:00
Brian Kelley	2debdfde0d	Adds RDAny (smaller generic holder) Updates all used dictionaries (#896 ) * Adds RDAny (smaller generic holder) Updates all used dictionaries This is an API compliant version of the current rdany system, but uses a lot less memory in practice. * Removes code duplication * Converts CHECK_INVARIANT to TEST_ASSERT * Fixes DoubleTag issue * Adds Bool to DoubleMagic implementation * Removes reference to property pickler	2016-05-29 17:04:21 +01:00
Greg Landrum	e08e0d16d8	first pass, using google style	2015-11-14 14:58:11 +01:00
Greg Landrum	5618819c64	merge #641	2015-11-14 05:03:24 +01:00
Paolo Tosco	deb01fa717	Merge remote-tracking branch 'upstream/master'	2015-10-18 22:03:19 +01:00
Brian Kelley	5f59333a56	Silences unused parameters	2015-10-18 14:02:29 -04:00
Paolo Tosco	eaa187b03d	- added ResonanceMolSupplier - added overloaded SubstructMatch() version supporting ResonanceMolSupplier - added relevant Python wrappers - added C++/Python tests	2015-10-04 23:21:28 +01:00
Brian Kelley	d50fd264f6	Change to const std::string & API	2015-09-25 15:15:21 -04:00
Paolo Tosco	5dfbecd6d0	- fixed missed kekulization of aromatic carbocations such as cyclopropenyl and tropylium	2015-07-17 01:00:34 +01:00
Greg Landrum	797db2fa82	remove Atom::setMass()	2015-03-22 17:57:04 +01:00
Greg Landrum	37673af15a	remove dativeFlag from atoms	2015-03-22 17:48:06 +01:00
Brian Kelley	95a92282d1	Dictionary access is sanitized and optimized. o rdkit gains a RDKit::common_properties namespace that contains common string value properties o Dict.h and below gain getPropIfPresent that attempts to retrieve a property and returns true/false on success or failure. This is used to optimize access. o rdkit learns how to pass property keys by reference, not value. A new namespace has been added to RDKit, common_properties that contains the std::string values for commonly used properties. This helps to avoid typos in string values but also avoids a creation of std::strings from character values. All accessors (has/get/clear and getPropIfPresent) now pass the key by reference. Additionally, getPropIfPresent removes the double lookup of hasProp/getProp which can be a significant speedup in the smiles and smarts parsers (10-20%)	2015-01-15 12:23:29 -05:00
Greg Landrum	5b7b3b3d3d	fix problem with Atom->needsUpdatePropertyCache() and noImplicit	2014-12-29 08:00:40 +00:00


94 Commits