rdkit

mirror of https://github.com/rdkit/rdkit.git synced 2026-06-03 21:44:30 +08:00

Author	SHA1	Message	Date
greg landrum	e74e7b0a5a	release prep Release_2026_03_3	2026-05-29 09:26:02 +02:00
Greg Landrum	2c3c7d4257	add ability to block atoms/bonds from participating in tautomer zones (#9297) * add ability to block atoms/bonds from participating in tautomer zones * be more structured with the atom flag * response to review --------- Co-authored-by: = <=>	2026-05-29 05:54:49 +02:00
Brian Kelley	aa0190d2db	Adds MolToCDXMLBlock to FileParsers (#9291) * Adds MolToCDXMLBlock to FileParsers * Simplified code, removed warning * Fix C# wrapper for MolToCDX * Add C# test, fix cscode in swig * Fix typo in tests * Set default format to CDXML for MolToCDXML Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Add CDXML writer smoke tests --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2026-05-29 05:54:28 +02:00
Dan Nealschneider	db53c39aed	synthon perf: replace sort+unique dedup with boost::unordered_flat_set (#9305 ) sortAndUniquifyToTry previously built a parallel vector of (index, string) pairs, sorted by string, erased duplicates, then rebuilt the original vector — O(N log N) with one heap allocation per candidate product. Replace with an erase-remove over a boost::unordered_flat_set<size_t> keyed on buildProductHash (boost::hash_combine over synthon IDs + reaction ID). Dedup is now O(N) average with no string allocations on the hot path. Also switch SearchResults::d_molNames from std::unordered_set<std::string> to boost::unordered_flat_set<std::string> for the same open-addressing cache locality benefit during mergeResults. Perf (42-rxn / 140B-product Freedom space, maxHits=3000, hitStart=1000, 9 queries; vanilla.log → 2unordered_flat_set.log): Benzene: 6.92s → 5.64s (−19%) Tolueneish: 6.19s → 5.07s (−18%) Acetaminophen: 4.50s → 3.63s (−19%) Allopurinol: 4.41s → 3.94s (−11%) Theophylline: 4.39s → 3.90s (−11%) Nicotine: 4.87s → 3.97s (−18%) Ciprofloxacin: 6.82s → 6.09s (−11%) Aspirin: 4.51s → 3.42s (−24%) Metoprolol: 5.11s → 4.07s (−20%) Total: 48.40s → 40.33s (−17%) Hit counts and MaxNumResults unchanged across all queries. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 05:54:13 +02:00
Marco Ballarotto	4e9e504079	Pandastools improvements (#9251) * Added automatic parsing functionality * Added documentation * Slightly changed check for gzip extension * Apply suggestions from code review Added small changes for readability Co-authored-by: Greg Landrum <greg.landrum@gmail.com> --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2026-05-29 05:53:56 +02:00
Greg Landrum	0c19d01f81	swap to specific versions of other actions we use (#9308)	2026-05-29 05:53:38 +02:00
David Cosgrove	8d1eb53d4c	Option to draw all bonds in symbolColour. (#9304)	2026-05-29 05:53:20 +02:00
Steven Kearnes	2f9adc092f	Build: tag rdkitpython install rules with COMPONENT python (#9288) #9287 tagged these install rules with COMPONENT dev, which routes rdkitpython-config.cmake / rdkitpython-targets.cmake into the dev component alongside the python-agnostic C++ cmake config. That's correct progress over the prior Unspecified default, but `dev` is the wrong group: these files hardcode a single python version via configure_file substitution (find_dependency(Python3 3.X ...) and Boost component python3XX/numpy3XX), so they belong with the python wrappers — themselves python-version-specific — rather than with python-agnostic dev artifacts. Change the two affected COMPONENT tags from dev to python.	2026-05-29 05:52:54 +02:00
Clay Moore	e51844e7b1	Fix BFGS gradient-convergence denominator for negative energies (#9298) In Code/Numerics/Optimizer/BFGSOpt.h, the gradient-convergence check computed double term = std::max(funcVal * gradScale, 1.0); ... test /= term; if (test < gradTol) return 0; When funcVal (the current energy) is negative, funcVal * gradScale is negative and std::max clamps the denominator to 1.0. The convergence test therefore divides the gradient norm by 1 instead of by the intended |E| * gradScale, which over-tightens the criterion by a factor of |funcVal * gradScale| whenever |funcVal * gradScale| > 1. Negative energies are a normal mid-minimization state for force fields that include stabilizing terms (MMFF94, UFF with charges, AMBER-style potentials), so this affects realistic workloads: extra BFGS iterations or, occasionally, hitting MAXITS and returning the "too many iterations" status when convergence would otherwise have been reached. The fix is to use |funcVal| in the denominator, matching the pattern used three lines below ('std::max(fabs(pos[i]), 1.0)') and matching the intended interpretation as a magnitude. A new test case 'testBFGSOptimizationNegativeEnergy' in testOptimizer.cpp minimizes a 2D quadratic whose value is always negative along the convergence path and verifies the optimizer reaches the analytic minimum. git blame attributes the original line to commit `e08e0d16d` (Nov 2015), when the optimizer was restructured; the surrounding code does use absolute values, so this reads as an oversight rather than an intentional choice.	2026-05-29 05:52:36 +02:00
David Cosgrove	41fa15f4a1	Fix layout of reaction components in reaction drawing. (#9302) * Fix layout of reaction components in reaction drawing. * <sigh> --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2026-05-29 05:52:18 +02:00
David Cosgrove	e6b549da36	Github9280 (#9300) * Fix setFontScale for maximum and minimum font sizes. * Fix other test. Add hashcodes.	2026-05-29 05:51:55 +02:00
Greg Landrum	54d6a1229c	add checked atom and bond iterators (#9290) * add checked iterators * support checked atom and bond iterators the idea here is to allow optional checking that the graph is not being modified while an iterator is active * ignore new member functions --------- Co-authored-by: Ric R <ricrogz@gmail.com>	2026-05-29 05:51:32 +02:00
Ricardo Rodriguez	86a317ba6e	Refactor to stop using iterator definitions in types.h (#9275) * clean up iterator defs in types.h * do not use auto for inline constexpr * restore undef max,min * restore types.h declarations	2026-05-29 05:51:18 +02:00
Steven Kearnes	19e5be1afb	Build: tag dev-only install rules with COMPONENT dev (#9287 ) Headers under Code/RDGeneral/hash and the generated cmake package config files (rdkit-config.cmake, rdkit-targets.cmake, rdkitpython-config.cmake, rdkitpython-targets.cmake) are currently installed without an explicit COMPONENT, so they default to "Unspecified" and cannot be cleanly separated from runtime artifacts by packagers that use cmake's per-component install (e.g. -DCMAKE_INSTALL_COMPONENT=dev). In conda-forge we split the package into librdkit (runtime) and librdkit-dev (headers + cmake config), with librdkit installing components "Unspecified base data runtime" and librdkit-dev installing the "dev" component. With the current upstream tagging, the hash headers and all cmake config files end up in librdkit instead of librdkit-dev, which both ships build-time artifacts in a runtime package and leaves librdkit-dev without the cmake config needed for find_package(RDKit) to work. This commit tags all five affected install() calls with COMPONENT dev so per-component installs work correctly. Default (non-component) installs are unaffected.	2026-05-29 05:51:01 +02:00
Ricardo Rodriguez	296c6ed88e	Fixes #9270 (#9272) * fix handling double bond stereo extraction * add tests * Update Code/GraphMol/Subset.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2026-05-29 05:50:32 +02:00
Steven Kearnes	e8e9dc2a13	PgSQL: preserve toolchain LDFLAGS on macOS (#9285 ) The previous `set(CMAKE_EXE_LINKER_FLAGS ...)` replaced the variable wholesale, which clobbers any toolchain-supplied linker flags. In particular, conda-forge's clang_osx-64 / clangxx_osx-64 packages set `-stdlib=libc++ -L${PREFIX}/lib -Wl,-rpath,${PREFIX}/lib` via `CMAKE_EXE_LINKER_FLAGS`. Losing those flags causes the postgres extension link to pick up the wrong libc++ and fail to resolve ABI-tagged symbols on libc++ 19+: [ 94%] Linking CXX executable rdkit.dylib Undefined symbols for architecture x86_64: "VTT for std::__1::basic_stringstream<...>" "vtable for std::__1::basic_stringbuf<...>" "vtable for std::__1::basic_stringstream<...>" "vtable for std::__1::basic_istringstream<...>" ld: symbol(s) not found for architecture x86_64 The missing symbols carry the `[abi:ne190107]` ABI tag introduced by libc++ 19+ — references that only resolve against the conda-forge libc++, not the system one the link was falling back to. Append to `CMAKE_EXE_LINKER_FLAGS` instead so the toolchain flags survive. The other rdkit `.dylib`s in the same build are linked via the standard cmake toolchain path and were never affected. Verified by building rdkit-postgresql on osx-64 + osx-arm64 via the conda-forge feedstock (https://github.com/conda-forge/rdkit-feedstock) with this fix applied as a downstream patch. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 05:47:16 +02:00
reza bagheri alashti	8120d418fb	refactor: improve readability and maintainability of AAP similarity code (#9277) * refactor: clean up AAP similarity logic and add type hints * Refactor AAP similarity implementation for clarity and maintainability * Refactor AAP similarity implementation for clarity and maintainability * ran yapf over the code	2026-05-29 05:47:03 +02:00
Gareth Jones	3c81879dc9	Adds some features to the C# SWIG wrappers (#9274)	2026-05-29 05:46:47 +02:00
github-actions[bot]	0fb388e8fa	[bot] Update molecular templates header (#9269) Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>	2026-05-29 05:46:35 +02:00
reza bagheri alashti	29773dda93	Docs: fix CosineSimilarity formula and clarify similarity metric names in BitOps.h (#9264) * Fix CosineSimilarity doc formula and clarify similarity metric names in BitOps.h * Fix CosineSimilarity formula in BitOps.h and adjust similarity docs	2026-05-29 05:46:20 +02:00
Dan Nealschneider	dae2061d85	CIPLabeler performance: Store vector of bonds (#9250) * CIPLabeler performance: Store vector of bonds CIPLabelling refers to bonds by index over and over again. This causes a measurable hit in performance in findConfigs() because we iterate over a bitset of "allowed" bonds. For very large molecules with many bonds, this can be a rate-limiting step! This affects many PDB-sized structures. 2J3N goes from 0.7s to 0.25s with this change. I had another example for which the findBondWithIdx() call was taking 500ms of a 700ms call (after the performance update in #9171 was implemented) * yikes, XXL reserve thanks, greg Co-authored-by: Greg Landrum <greg.landrum@gmail.com> --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2026-05-29 05:45:58 +02:00
Dan Nealschneider	1a57191a8c	CIP labeller performance: Don't calculate auxiliary descriptors unnecessarily (#9171 ) * CIP labeller: Don't calculate auxiliary descriptors unnecessarily The first 3 rules (the constitutional rules) are pretty easy to understand. After rule 3, we need to calculate auxiliary stereo descriptors to break ties. However, we _were actually_ calculating auxiliary stereodescriptors for all centers! We should only need to calculate auxiliary stereocenters for sites that are needed to break ties. This cost time - it also caused errors if the auxiliary descriptors needed a graph expansion, because bonds in the digraph might be pointed in the wrong direction. Example case PDB ID 4AXM Before this commit, errored with "Could not calculate parity! Carrier mismatch" after 14s. After this commit, completes successfully in 0.036s. Labelled centers all match (for the centers that had labels in the failure case). Includes a test that I can imagine breaking with this optimization. The reference labels are from before this change * Ensure all "arms" of stereo bonds and atropisomer bonds are expanded For tetrahedral centers, ranking using the constitutional rules always expands as far as is needed (but no further). For SP2bond and atropisomers, if the first side is not resolvable, the second side is never visited. If the constitutional rules don't resolve a side, we need to label the auxiliary centers. It's important to label all auxiliary centers that _will_ be visited, so we need to know what centers will be visited. This commit updates the label() call in SP2 and Atropisomer bonds to always attempt to label both sides if using the constitutional rule set. The constitutional rules are cheap, and if they fail, we always go on to the full rule set. It is not a savings to skip the search on the second side if we're going to keep going anyway! Includes a test that reproduces Ricardo's example. 
This has no measurable effect on performance relative to the original solution * If any parts of the center have been seen, label it. I couldn't make an example hit this, but Ric is totally theoretically right * Greg's ranges suggestion #2 Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * any_of for container search Co-authored-by: Greg Landrum <greg.landrum@gmail.com> --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2026-05-29 05:45:28 +02:00
Raul Sofia	bc9e1145d1	Extended fix for #9101 (#9255) * fix extended boundary issue (3 mols) * clang pass * no change. retrigger CI for failed java test there's a failing java test that seems to be failing by chance rather than by changes, as it depends on rng. this is just to retrigger the CI pipeline to confirm this * no change. retrigger the CI (yet again) * raw strings and removed garbage collector	2026-05-29 05:45:06 +02:00
Emily Rhodes	709def06a7	Add optional default_val parameter to GetProp() (#9242) * SHARED-12256: Add test and change function. * SHARED-12256: Update to only wrapping changes. * SHARED-12256: Parameterize tests. * SHARED-12256: GetPropIfPresent changes. * Revert "SHARED-12256: GetPropIfPresent changes." This reverts commit `f598f8c161`. * SHARED-12256: Make default the keyword in the boost wrappings. * SHARED-12256: Overload function instead of using a sentinel. * SHARED-12256: Extend GetProp changes. * SHARED-12256: Add entry point for tests and fix tests.	2026-05-29 05:44:52 +02:00
Greg Landrum	4b714e973b	Support using iterators with MolSuppliers (#9230) * iterators for random-access MolSuppliers add optional caching to SDMolSupplier * add support to SmilesMolSupplier too There is a lot of duplicate code between the random-access suppliers that would be worth trying to remove but at the moment it looks like it would require multiple inheritance, and I think we want to avoid that * add input iterators for ForwardSDMolSupplier() * throw when calling begin() on a used supplier * switch to use the spaceship operator * init() should reset the mol cache * Make SDMolSupplier and SmilesMolSupplier safe for multi-threaded reads * add benchmarking * add TDTMolSupplier support improved testing add benchmarks for parallel iteration optional TBB support * better const handling, add reverse iterators doesn't look like const_iterator is possible since getting data from the underlying supplier object is non-const * improve docs more usings add reverse iterator to TDTMolSupplier * tests only try execution::par when it is there * fix typo * more testing/demo * remove accidentally added files * review changes * add default ctors * disable a false-positive compiler warning it is stupid to have to do this --------- Co-authored-by: = <=>	2026-05-29 05:44:19 +02:00
Kevin Boyd	fab9b0fe4e	Add Getter functions to MMFF property python interface (#9254)	2026-05-29 05:43:44 +02:00
greg landrum	42fe8c9525	prep for release Release_2026_03_2	2026-04-30 14:31:54 +02:00
Eloy Félix	61981921d1	Tautomer insensitive hash v2, E/Z and stereocenter-preservation (#9128) * Tautomer insensitive hash v2, E/Z and stereocenter-preservation * Preserve E/Z stereochemistry and stereocenters in TautomerHashv2 Simplify extension logic to better protect stereocenters connected via single bonds to aromatic systems. Preserve E/Z stereo on exocyclic double bonds to distinguish geometric isomers (e.g., E/Z hydrazones). * add helper function to remove duplicated code * Fix ring info and bond aromaticity handling in MolHash - Add fastFindRings check in TautomerHashv2 before ring queries - Set isAromatic consistent with bond type (true for AROMATIC bonds) - Fix inverted condition in RegioisomerHash * more consistent hashes regardless of stereo annotation	2026-04-30 14:22:19 +02:00
Greg Landrum	a36b53ec55	Ensure that StereoGroups don't have duplicate atoms or bonds (#9258) * check for duplicate atoms/bonds in StereoGroups * explicit handling of duplicate stereogroup atoms in CTAB and CXSMILES parsers --------- Co-authored-by: = <=>	2026-04-30 14:22:01 +02:00
Nic Zonta	dbd8cf35b4	If templates match, skip ring number check (#9217) * remove ring matching for templates * remove extra code * remove empty lines * fix build error	2026-04-30 14:21:23 +02:00
Kevin Boyd	0166c4aefc	Fix bug in inversion term for UFF, add finite difference checker. (#9228) * Fix copyright * Address review comments Removed finite diff from RDKit headers Used explicit coordinates	2026-04-30 14:20:51 +02:00
Greg Landrum	63a431ce72	Cleanup/get atoms and bonds (#9243)	2026-04-30 14:20:28 +02:00
Ricardo Rodriguez	71d787c73d	make sorting more consistent (#9239)	2026-04-30 14:20:13 +02:00
Chris Von Bargen	d2f44719ae	Add getSGroupDataLabels() to MolDraw2D_detail namespace (#9189) Adds a new function MolDraw2D_detail::getSGroupDataLabels() that returns the text and molecule-coordinate positions of DAT SGroup labels, using the same placement logic as the drawing code. This allows external renderers to display SGroup labels consistently with RDKit's placement. Refactors DrawMol::extractSGroupData() to call getSGroupDataLabels() internally, eliminating the duplicate FIELDDISP parsing and position computation logic. Closes #7829 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 14:19:56 +02:00
Greg Landrum	9ba91cc97f	Add some std::ranges support (#9218) * initial ranges support for Atom/Bond iterators. needs more testing * support random access test sort more testing please * compiles on windows * fix size() more testing add some benchmarking * disable benchmarking code by default * do not allow modifying the graph through the iterators --------- Co-authored-by: = <=>	2026-04-30 14:19:28 +02:00
github-actions[bot]	9706585fd8	[bot] Update molecular templates header (#9234) Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>	2026-04-30 14:19:01 +02:00
Yakov Pechersky	695c6a5e0c	Use index-order kekulization in MolToInchi (#9226 ) MolToInchi has called MolOps::Kekulize(*m, false) for years, but PR #9125 changed the default traversal to canonical=true. That pulls rankFragmentAtoms() and the canonicalization path into the InChI conversion even though the tested InChI outputs stay the same. Validation: - rdkit.Chem.UnitTestInchi passed before and after this change on upstream/master (18 tests, OK in both runs). - No InChI output drift was observed between stock and patched builds on Regress/Data/mols.1000.sdf.gz, rdkit/Chem/test_data/pubchem-hard-set.sdf.gz, or the atom-order regression molecules added in Code/GraphMol/catch_graphmol.cpp. Performance: - Release_2026_03_1 Python MolToInchi on Regress/Data/mols.1000.sdf.gz improved from 0.40712s to 0.38871s median (-4.52%). - Release_2026_03_1 rdinchi MolToInchi on the same dataset improved from 0.39755s to 0.37814s median (-4.88%). - Release_2026_03_1 standalone C++ MolToInchi on /tmp/mols.1000.sdf improved from 7.66775s to 7.03474s wall time (-8.26%), from 20.57B to 19.04B cycles (-7.46%), and from 121.78M to 114.05M cache misses (-6.35%).	2026-04-30 14:18:49 +02:00
Greg Landrum	0beb4bb900	mention AI tools in the contrib guidelines (#9224) * mention AI tools in the contrib guidelines * response to review --------- Co-authored-by: = <=>	2026-04-30 14:18:35 +02:00
David Cosgrove	20c08c93e2	Misplaced parentheses in shape code (#9222) * Move stray bracket. * Lots of consts. * Another bad bracket. * Response to review.	2026-04-30 14:18:22 +02:00
Brandon Novy	0552d94d5e	MolDraw2D: configurable legend position and vertical side legends (Issue #9023 ) (#9183 ) * Configurable legend position (Top/Left/Right/Bottom) and vertical text (GitHub #9023) - Add LegendPosition enum and legendPosition, legendVerticalText to MolDrawOptions - Support legend at Top, Left, Right, Bottom; vertical text for Left/Right - Python: MolDrawOptions.legendPosition, .legendVerticalText; LegendPosition enum - Python: MolToSVG() wrapper with legend/drawOptions; doc updates for MolToImage - JSON: legendPosition (string), legendVerticalText (bool) in draw options - C++ and Python tests; release note and Cartridge.md docs * MolDraw2D: legend gutter for horizontal side legends; vertical side height fit - Reserve horizontal gap between molecule and left/right horizontal legends (scale mol to molWidth-gutter, align toward legend strip). - Position horizontal side legend by measured text width from partition edge. - Vertical side legends: iterative scale so nmax_h+(n-1)gap fits panel. - Catch: long vertical side legend section. * Update legend-position tests and review-driven cleanup Use enum/default wording for legendPosition docs, move the lightweight Python test to Wrap, add regex-based placement checks (including horizontal side and vertical stacking), and refactor extractLegend helpers per style guidance. * Fix MolDraw2D legend edge cases * MolDraw2D: review follow-up (legend tests, bounds, DRY Top/Bottom) * Update no-FT legend test coords * Address PR review: document constants, remove release-note text, and simplify extra-padding logic	2026-04-30 14:18:05 +02:00
Yakov Pechersky	0cd2e5cbf2	Add more pyi patches, 2026-03 (#9214)	2026-04-30 14:17:31 +02:00
Yakov Pechersky	0d886b9d08	Speed-up tautomer canonicalization, no API changes (#9134 ) * Speed up tautomer canonicalization by deferring on SSSR calc * Lazy kekulization for tautomer enumeration Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for: 1. The initial tautomer (until first iteration) 2. New tautomers that may never be processed (if enumeration ends early) The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method. Performance improvement: ~7% additional speedup (total ~22-24% from baseline) * Use count-only substructure matching in tautomer scoring * Add SubstructMatchCount regression test * MolStandardize: reduce enumerate overhead * MolStandardize: avoid per-tautomer ring recomputation * Atom: cache PeriodicTable pointer in valence calcs * Atom: reuse PeriodicTable in getEffectiveAtomicNum * PeriodicTable: add atomic fast path for getTable * GraphMol: reduce ROMol copy reallocations * MolStandardize: use quickCopy for per-match product copies Use RWMol(kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry. MolStandardize: pre-filter scoring patterns by element/connectivity For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule. Two-stage filtering: 1. Element check: skip patterns requiring atoms not in the molecule 2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition. 
* MolStandardize: preserve molecule properties for canonical tautomer Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result. * TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES. Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance. * MolStandardize: use parallel iteration and cache bond lookups Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches. * perf: add specialized matchers for simple tautomer scoring patterns Replace VF2 graph matching with O(n) loops for 6 simple patterns: - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns - countMethyls: [CX4H3] methyl groups - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2. Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization. * Fix tautomer canonicalize dropping conformers from quickCopy quickCopy (RWMol(mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates. 
Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize(). Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp. add test on canonicalize losing stereo * add regression test for exocyclic C=C tautomer canonicalization The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs. Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1. Currently expected to FAIL until the state key dedup bug is fixed. * MolStandardize: expand tautomer connectivity SMARTS * MolStandardize: scope tautomer pattern enum * MolStandardize: trim tautomer pattern enum * MolStandardize: use symmetric ring scoring	2026-04-30 14:17:18 +02:00
Greg Landrum	351f8f378f	release prep (#9206) Release_2026_03_1	2026-03-27 10:37:45 +01:00
Nic Zonta	2096c7fe33	Enable templating for macrocycles (#9203) * parse templates as smarts * accept ring templates in SMARTS format * undo CLAUDE mistake * rename files * enable templating for macrocycles * enable macrocycle templating * Add test for macrocycle templating Tests that ring system templates are used only for macrocycles (rings with size > 8). The test verifies the exact threshold by generating coordinates with and without templates for rings of size 4-14. Addresses review feedback on PR #9203. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-27 06:25:20 +01:00
github-actions[bot]	c9dfd5a40e	[bot] Update molecular templates header (#9205) Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>	2026-03-27 06:21:02 +01:00
David Cosgrove	5235f53910	Gaussian shape overlays (#9095)	2026-03-26 21:53:54 +01:00
Paolo Tosco	adf060c881	- implement #9194 (#9197) - remove redundant #include - avoid unnecessary copy of match - expose SubstructMatchParams to JS MinimalLib - add JS SubstructMatchParams test Co-authored-by: ptosco <paolo.tosco@novartis.com>	2026-03-26 05:00:42 +01:00
EvaSnow	b56f3dc68a	Fix typo in _calculateBeta: check nb1 instead of nb2 twice (#9202) The non-terminal bond filter checked len(nb2) > 1 for both atoms, ignoring nb1 entirely. This could include bonds with a terminal begin-atom when computing dmax for torsion weights.	2026-03-26 04:57:04 +01:00
Ricardo Rodriguez	d90a73aa6b	Leak fixes for 2026.03.1 (#9198) * fix mols leaked in tests * own invariant generators * clean up MorganFeatureAtomInvGenerator patterns * address review suggestions	2026-03-25 05:56:26 +01:00
Greg Landrum	cacba34a47	simple substructure optimization (#9201) Co-authored-by: = <=>	2026-03-24 15:38:13 +01:00
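
Commit db53c39aed (#9305) in the log above describes replacing a sort-then-unique deduplication with a single pass over a hash set. The following is a minimal Python sketch of that idea, not the RDKit implementation: the function names and the `(reaction_id, synthon_ids)` candidate layout are assumptions made for illustration. The commit stores a `size_t` hash per product in a `boost::unordered_flat_set`; this sketch keeps the full tuple as the key so that a hash collision cannot drop a distinct candidate.

```python
def product_key(candidate):
    """Build a hashable key from a (reaction_id, synthon_ids) pair.

    Hypothetical stand-in for the buildProductHash function named in the
    commit message; the real function combines the IDs into one size_t.
    """
    reaction_id, synthon_ids = candidate
    return (reaction_id, tuple(synthon_ids))


def dedup_in_place(candidates):
    """Remove duplicate candidates, keeping the first occurrence of each.

    One pass with a set gives O(N) average time and preserves the
    original order, so no sort and no order-restoring step are needed.
    """
    seen = set()
    write = 0
    for cand in candidates:
        key = product_key(cand)
        if key in seen:
            continue  # duplicate product: skip it
        seen.add(key)
        candidates[write] = cand  # compact survivors toward the front
        write += 1
    del candidates[write:]  # drop the leftover tail, like erase-remove


to_try = [(0, [1, 2]), (1, [1, 2]), (0, [1, 2]), (0, [2, 1]), (1, [1, 2])]
dedup_in_place(to_try)
print(to_try)
```

The write-index loop mirrors the erase-remove idiom mentioned in the commit message: survivors are moved forward in place and the tail is truncated once at the end.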

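Commit e51844e7b1 (#9298) in the log above fixes the denominator of the BFGS gradient-convergence test for negative energies. The sketch below reproduces the arithmetic in Python so the effect is easy to see. Only the `term` expression, the `test /= term` step, and the `max(fabs(pos[i]), 1.0)` pattern come from the commit message; the loop that accumulates `test`, the function name, and the sample numbers are assumptions for illustration.

```python
def scaled_gradient_test(grad, pos, func_val, grad_scale=1.0, use_abs=True):
    """Return the scaled gradient measure that is compared against gradTol.

    use_abs=False models the old denominator max(funcVal * gradScale, 1.0);
    use_abs=True models the fix, max(|funcVal| * gradScale, 1.0).
    """
    energy = abs(func_val) if use_abs else func_val
    term = max(energy * grad_scale, 1.0)
    test = 0.0
    for g, x in zip(grad, pos):
        # Assumed accumulation: largest |gradient| component, scaled by position
        test = max(test, abs(g) * max(abs(x), 1.0))
    return test / term


grad = [0.5, -0.2]
pos = [1.0, 2.0]
func_val = -50.0   # a negative energy, as in a mid-minimization force field
grad_tol = 0.05    # hypothetical tolerance for this example

old = scaled_gradient_test(grad, pos, func_val, use_abs=False)
new = scaled_gradient_test(grad, pos, func_val, use_abs=True)
print(old < grad_tol, new < grad_tol)
```

With these numbers the old denominator is clamped to 1.0, so the measure stays at 0.5 and the test does not pass, while the fixed denominator is 50.0 and the measure drops to 0.01. For positive energies the two variants agree.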

8465 Commits