8476 Commits

Author SHA1 Message Date
David Cosgrove
9f551aedbe Multi conf gaussian shape (#9265)
* First import of GaussianShape.

* Tidying.

* Custom features.

* Optimise.

* Optimise.

* Return 3 scores rather than 2 including combo score.

* Rename useFeatures to useColors.

* Python wrappers.

* Python tests.

* Take out big test.

* Add new start mode, as PubChem does it.

* Doh!

* Fix MolTransforms eigenvalue return.

* Two cycle optimisation, mostly working.

* Take out bestSoFar score from SCA.

* Take out DTYPE.

* Tidy out redundant variables.

* Optimisation in 2 parts.

* More fiddling in pursuit of speed.

* Update Python wrapper.

* Tweak.

* Atom subsets and different radii.

* Fix test.

* Revert pubchem_shape's test.cpp.

* Serialize ShapeInput.

* Trigger build

* Remove pointers to std::arrays in ShapeInput.

* ShapeInput virtual d'tor.

* Precondition - ShapeInput needs a molecule with at least 1 conformer.

* Rename ShapeInput::d_centroid to ShapeInput::d_canonTrans.

* Fix normalization bugs.

* Select start mode using moments of inertia rather than eigenvalues of canonical transformation.

* Include color features in moments of inertia.

* Smidge faster.

* Tversky similarity.

* Tidy tests.

* Tests working on Linux.

* Revert force of right handed axes in MolTransforms::computePrincipalAxesAndMomentsFromGyrationMatrix replacing with a comment in the code.

* Response to review.

* Sneaky allCarbon bug.

* add multithreaded test

* Response to review.

* Doh! Don't recalculate normalization after every transformation.

* Re-instate d_normalizationOK.

* Re-name functions for fetching canonical transformations.

* Separate alpha from coords.

* MultiConf works with single conf extraction.

* Extract all conformations.
Max and best similarities.

* Renames d_currConformer to d_activeShape.

* Update shapeToMol.

* Update shapeToMol.

* Changes from synthon shape searching.

* Fix normalization of multiple confs.

* Update Python wrappers.

* Fix shape merge.

* Improve bestSimilarity.

* Fix python wrapper.

* Pull in changes from SynthonShapeSearch:
make pruneShapes public.
function to negate Alpha values.

* clang-tidy suggestions.

* clang-tidy suggestions.

* Bug in quaternion gradients - we now have only 3 coordinates.

* Tidy tests.

* Mac result slightly different.

* Multi conformer molecule alignment.

* Optionally return raw overlap volumes in score functions.

* Python wrappers for raw overlap volumes.

* Update Python wrapper ShapeInputOptions.

* Tidy for PR.

* Extra include file.

* Extra library

* Tidy forward declarations.

* Don't prune if threshold < 0.0.

* Windows exporty thing.

* Check SMILES on merge of ShapeInputs.

* PRECONDITION of SMILES on merge of ShapeInputs.

* Response to review - rename some functions.

* change how overlapVols is passed
add a test for it

* API suggestions

* Response to review.

* Remove debugging writes.

* Fix Python wrappers.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
Co-authored-by: greg landrum <greg.landrum@gmail.com>
2026-06-03 06:09:09 +02:00
Nic Zonta
b854399558 Spiro flipping (#9204)
* add flipping of spiro rings as a way to solve clashes

* remove extra function

* add test file

* update coordgen parameters to allow for bond flipping

* fix failing tests

* Update Code/GraphMol/Depictor/EmbeddedFrag.h

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* [bot] Update molecular templates header (#9234)

Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>

* Add some std::ranges support (#9218)

* initial ranges support for Atom/Bond iterators.
needs more testing

* support random access
test sort

more testing please

* compiles on windows

* fix size()
more testing
add some benchmarking

* disable benchmarking code by default

* do not allow modifying the graph through the iterators

---------

Co-authored-by: = <=>

* mention AI tools in the contrib guidelines (#9224)

* mention AI tools in the contrib guidelines

* response to review

---------

Co-authored-by: = <=>

* Add getSGroupDataLabels() to MolDraw2D_detail namespace (#9189)

Adds a new function MolDraw2D_detail::getSGroupDataLabels() that returns
the text and molecule-coordinate positions of DAT SGroup labels, using
the same placement logic as the drawing code. This allows external
renderers to display SGroup labels consistently with RDKit's placement.

Refactors DrawMol::extractSGroupData() to call getSGroupDataLabels()
internally, eliminating the duplicate FIELDDISP parsing and position
computation logic.

Closes #7829

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* MolDraw2D: configurable legend position and vertical side legends (Issue #9023) (#9183)

* Configurable legend position (Top/Left/Right/Bottom) and vertical text (GitHub #9023)

- Add LegendPosition enum and legendPosition, legendVerticalText to MolDrawOptions
- Support legend at Top, Left, Right, Bottom; vertical text for Left/Right
- Python: MolDrawOptions.legendPosition, .legendVerticalText; LegendPosition enum
- Python: MolToSVG() wrapper with legend/drawOptions; doc updates for MolToImage
- JSON: legendPosition (string), legendVerticalText (bool) in draw options
- C++ and Python tests; release note and Cartridge.md docs

* MolDraw2D: legend gutter for horizontal side legends; vertical side height fit

- Reserve horizontal gap between molecule and left/right horizontal legends
  (scale mol to molWidth-gutter, align toward legend strip).
- Position horizontal side legend by measured text width from partition edge.
- Vertical side legends: iterative scale so n*max_h+(n-1)*gap fits panel.
- Catch: long vertical side legend section.

* Update legend-position tests and review-driven cleanup

Use enum/default wording for legendPosition docs, move the lightweight Python test to Wrap, add regex-based placement checks (including horizontal side and vertical stacking), and refactor extractLegend helpers per style guidance.

* Fix MolDraw2D legend edge cases

* MolDraw2D: review follow-up (legend tests, bounds, DRY Top/Bottom)

* Update no-FT legend test coords

* Address PR review: document constants, remove release-note text, and simplify extra-padding logic

* make sorting more consistent (#9239)

* Cleanup/get atoms and bonds (#9243)

* Fix bug in inversion term for UFF, add finite difference checker. (#9228)

* Fix copyright

* Address review comments

Removed finite diff from RDKit headers

Used explicit coordinates

* If templates match, skip ring number check (#9217)

* remove ring matching for templates

* remove extra code

* remove empty lines

* fix build error

* Tautomer insensitive hash v2, E/Z and stereocenter-preservation (#9128)

* Tautomer insensitive hash v2, E/Z and stereocenter-preservation

* Preserve E/Z stereochemistry and stereocenters in TautomerHashv2

Simplify extension logic to better protect stereocenters connected via
single bonds to aromatic systems. Preserve E/Z stereo on exocyclic
double bonds to distinguish geometric isomers (e.g., E/Z hydrazones).

* add helper function to remove duplicated code

* Fix ring info and bond aromaticity handling in MolHash

- Add fastFindRings check in TautomerHashv2 before ring queries
- Set isAromatic consistent with bond type (true for AROMATIC bonds)
- Fix inverted condition in RegioisomerHash

* more consistent hashes regardless of stereo annotation

* Ensure that StereoGroups don't have duplicate atoms or bonds (#9258)

* check for duplicate atoms/bonds in StereoGroups

* explicit handling of duplicate stereogroup atoms in CTAB and CXSMILES parsers

---------

Co-authored-by: = <=>

* Add Getter functions to MMFF property python interface (#9254)

* Support using iterators with MolSuppliers (#9230)

* iterators for random-access MolSuppliers
add optional caching to SDMolSupplier

* add support to SmilesMolSupplier too
There is a lot of duplicate code between the random-access suppliers that would be worth trying to remove
but at the moment it looks like it would require multiple inheritance, and I think we want to avoid that

* add input iterators for ForwardSDMolSupplier()

* throw when calling begin() on a used supplier

* switch to use the spaceship operator

* init() should reset the mol cache

* Make SDMolSupplier and SmilesMolSupplier safe for multi-threaded reads

* add benchmarking

* add TDTMolSupplier support
improved testing
add benchmarks for parallel iteration
optional TBB support

* better const handling, add reverse iterators

doesn't look like const_iterator is possible since getting data from the underlying supplier object is non-const

* improve docs
more usings
add reverse iterator to TDTMolSupplier

* tests only try execution::par when it is there

* fix typo

* more testing/demo

* remove accidentally added files

* review changes

* add default ctors

* disable a false-positive compiler warning
it is stupid to have to do this

---------

Co-authored-by: = <=>

* Pandastools improvements (#9251)

* Added automatic parsing functionality

* Added documentation

* Slightly changed check for gzip extension

* Apply suggestions from code review

Added small changes for readability

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Add optional default_val parameter to GetProp()  (#9242)

* SHARED-12256: Add test and change function.

* SHARED-12256: Update to only wrapping changes.

* SHARED-12256: Parameterize tests.

* SHARED-12256: GetPropIfPresent changes.

* Revert "SHARED-12256: GetPropIfPresent changes."

This reverts commit f598f8c161.

* SHARED-12256: Make default the keyword in the boost wrappings.

* SHARED-12256: Overload function instead of using a sentinel.

* SHARED-12256: Extend GetProp changes.

* SHARED-12256: Add entry point for tests and fix tests.

* Extended fix for #9101 (#9255)

* fix extended boundary issue (3 mols)

* clang pass

* no change. retrigger CI for failed java test

A Java test appears to fail by chance rather than because of these changes, since it depends on the RNG. This commit only retriggers the CI pipeline to confirm that.

* no change. retrigger the CI (yet again)

* raw strings and removed garbage collector

* CIP labeller performance: Don't calculate auxiliary descriptors unnecessarily (#9171)

* CIP labeller: Don't calculate auxiliary descriptors unnecessarily

The first 3 rules (the constitutional rules) are pretty easy
to understand. After rule 3, we need to calculate auxiliary
stereo descriptors to break ties.

However, we _were actually_ calculating auxiliary stereo descriptors
for all centers! We should only need to calculate auxiliary
stereo descriptors for the sites that are needed to break ties.

This cost time - it also caused errors if the auxiliary descriptors
needed a graph expansion, because bonds in the digraph might be
pointed in the wrong direction.

Example case PDB ID 4AXM
Before this commit, errored with "Could not calculate parity! Carrier mismatch"
after 14s. After this commit, completes successfully in 0.036s.
Labelled centers all match (for the centers that had labels in
the failure case).

Includes a test that I can imagine breaking with this optimization.
The reference labels are from before this change

* Ensure all "arms" of stereo bonds and atropisomer bonds are expanded

For tetrahedral centers, ranking using the constitutional rules
always expands as far as is needed (but no further). For SP2bond
and atropisomers, if the first side is not resolvable, the
second side is never visited.

If the constitutional rules don't resolve a side, we need to
label the auxiliary centers. It's important to label all
auxiliary centers that _will_ be visited, so we need to know
what centers will be visited.

This commit updates the label() call in SP2 and Atropisomer
bonds to always attempt to label both sides if using the
constitutional rule set.

The constitutional rules are cheap, and if they fail, we
always go on to the full rule set. It is not a savings to skip
the search on the second side if we're going to keep going
anyway!

Includes a test that reproduces Ricardo's example.

This has no measurable effect on performance relative to the
original solution

* If any parts of the center have been seen, label it.

I couldn't make an example hit this, but Ric is totally
theoretically right

* Greg's ranges suggestion #2

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* any_of for container search

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* CIPLabeler performance: Store vector of bonds (#9250)

* CIPLabeler performance: Store vector of bonds

CIPLabelling refers to bonds by index over and over again. This
causes a measurable hit in performance in findConfigs() because
we iterate over a bitset of "allowed" bonds. For very large
molecules with many bonds, this can be a rate-limiting step!

This affects many PDB-sized structures.

2J3N goes from 0.7s to 0.25s with this change.

I had another example for which the findBondWithIdx() call was
taking 500ms of a 700ms call (after the performance update
in #9171 was implemented)

* yikes, XXL reserve

thanks, greg

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Address PR #9204 review feedback

Implemented performance improvements suggested by @greglandrum:

1. Move cheap degree check to start of isSpiroCenter()
   - Early bailout eliminates ~95% of candidates immediately

2. Replace std::set with boost::dynamic_bitset<>
   - Faster set operations for ring membership tests
   - More efficient intersection using bitwise AND

3. Remove expensive PRECONDITION in flipAboutSpiroCenter()
   - Caller already validates spiro center, no need to check again

All tests pass (testDepictor: 7.85s).

* Use boost::dynamic_bitset in removeCollisionsBondAndSpiroFlip

Replaced std::set<unsigned int> with boost::dynamic_bitset<> for
spiro center caching in collision resolution:

- Changed spiroCenters from std::set to boost::dynamic_bitset
- Updated tryResolvingCollisionWithSpiroFlip() signature
- Replaced set.find() with bitset.test() for membership checks
- Replaced set.insert() with bitset.set() for marking spiro centers

Benefits:
- Faster membership tests (O(1) bit test vs O(log n) tree lookup)
- Better cache locality (contiguous bit array vs scattered nodes)
- Simpler code (no iterator comparisons)
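
The set-to-bitset swap listed above can be sketched in isolation. This is a minimal illustration, not the RDKit code: std::vector<bool> stands in for boost::dynamic_bitset<> so the example needs only the standard library, and the names markCenters, isMarkedSet and isMarkedBits are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Stand-in for boost::dynamic_bitset<>: one bit per atom index (assumption).
using AtomBitset = std::vector<bool>;

// Tree-based version: an O(log n) lookup plus an iterator comparison.
inline bool isMarkedSet(const std::set<unsigned int> &centers,
                        unsigned int atomIdx) {
  return centers.find(atomIdx) != centers.end();
}

// Bitset version: a single O(1) bit read, analogous to bitset.test().
inline bool isMarkedBits(const AtomBitset &centers, unsigned int atomIdx) {
  assert(atomIdx < centers.size());
  return centers[atomIdx];
}

// Build the bitset once from a list of (hypothetical) spiro-center indices.
inline AtomBitset markCenters(std::size_t numAtoms,
                              const std::vector<unsigned int> &spiroIdxs) {
  AtomBitset bits(numAtoms, false);
  for (auto idx : spiroIdxs) {
    assert(idx < numAtoms);
    bits[idx] = true;  // analogous to bitset.set(idx)
  }
  return bits;
}
```

Both lookups answer the same question; the bitset trades memory proportional to the atom count for constant-time access over contiguous storage.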

All tests pass (testDepictor: 2.64s).

* remove unnecessary reformatting

* more unneeded formatting

* even more unnecessary formatting

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>
Co-authored-by: Chris Von Bargen <christopher.vonbargen@schrodinger.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Brandon Novy <142041993+Brandon-Cole@users.noreply.github.com>
Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>
Co-authored-by: Kevin Boyd <kboyd@nvidia.com>
Co-authored-by: Eloy Félix <eloyfelix@gmail.com>
Co-authored-by: Marco Ballarotto <marco.ballarotto@icr.ac.uk>
Co-authored-by: Emily Rhodes <70823163+emilyrrhodes@users.noreply.github.com>
Co-authored-by: Raul Sofia <67133355+RaulSofia@users.noreply.github.com>
Co-authored-by: Dan Nealschneider <dan.nealschneider@schrodinger.com>
2026-06-03 05:56:04 +02:00
Rody Arantes
5d9892575c Fix STEREOANY (wavy bond) loss during InChI roundtrip (#9315)
When converting molecules with wavy bonds (Bond::STEREOANY on double
bonds) through InChI and back, the stereo information was silently
dropped. This affected any workflow using InChI roundtrips for
canonicalization (e.g. with -SUU flag).

Two bugs in External/INCHI-API/inchi.cpp:

Reverse path (InchiToMol): The stereo0D processing loop skipped
INCHI_PARITY_UNDEFINED entries before they could reach the double bond
handler. The handler already had an else clause that correctly sets
Bond::STEREOANY, but it was never reached. Fix: only skip
INCHI_PARITY_NONE at the top level, and add a guard in the Tetrahedral
case to prevent UNDEFINED/UNKNOWN from incorrectly setting chirality.
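
The reverse-path control flow described above can be pictured with a toy model. The enums and the function processStereo0D below are hypothetical stand-ins written for illustration; they are not the InChI API types or the code in inchi.cpp.

```cpp
#include <cassert>

// Local stand-ins for the parity values named in the message (assumption).
enum class Parity { None, Odd, Even, Unknown, Undefined };
enum class StereoKind { DoubleBond, Tetrahedral };
enum class Outcome {
  Skipped,             // entry ignored entirely
  BondStereoDefined,   // a definite double-bond stereo is assigned
  BondStereoAny,       // the bond is marked as "stereo any"
  ChiralitySet,        // a chiral tag is assigned
  ChiralityUntouched   // guard: no chiral tag is assigned
};

// Toy version of the fixed loop body: only a "none" parity is skipped up
// front, so undefined/unknown entries still reach the double-bond branch.
inline Outcome processStereo0D(StereoKind kind, Parity parity) {
  if (parity == Parity::None) {
    return Outcome::Skipped;
  }
  const bool defined = parity == Parity::Odd || parity == Parity::Even;
  if (kind == StereoKind::DoubleBond) {
    // The pre-existing else clause handles everything that is not odd/even.
    return defined ? Outcome::BondStereoDefined : Outcome::BondStereoAny;
  }
  assert(kind == StereoKind::Tetrahedral);
  // New guard: undefined/unknown must not set chirality on an atom.
  return defined ? Outcome::ChiralitySet : Outcome::ChiralityUntouched;
}
```

In this model the old behaviour corresponds to returning Skipped for Parity::Undefined as well, which is why the BondStereoAny branch was never reached.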

Forward path (MolToInchi): STEREOANY double bonds were only handled by
collapsing the coordinates — InChI then produced no stereo annotation
under -SUU. Fix: also emit a stereo0D entry with INCHI_PARITY_UNKNOWN
parity so InChI's -SUU output correctly carries the "stereo unknown"
designation. StereoAtoms may be cleared for STEREOANY, so we locate
the two outer neighbors by iterating bonds.

New test testStereoAnyRoundtrip in External/INCHI-API/test.cpp covers
9 representative cases (Schiff base, oxime, cinnamic acid, chalcone,
crotonaldehyde, tamoxifen-like, retinal-like, plus two molecules with
a chiral center adjacent to the wavy bond).

Counts in rdkit/Chem/UnitTestInchi.py shift by 1 (689 same, 492
reasonable) because the new STEREOANY emission produces a more
accurate roundtrip for one entry in the test inventory.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 14:41:00 +02:00
David Cosgrove
b04a861ae7 Replace combineMols with RWMol::insertMol. (#9319)
Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
2026-06-02 14:38:14 +02:00
Dan Nealschneider
226427e0bc Synthon substructure search 2x performance (#9307)
* synthon perf: replace O(N) haveEnoughHits scan with O(1) atomic counter

processPartHitsFromDetails called haveEnoughHits after each verified hit,
which scanned every slot of the pre-sized results vector (up to toTryChunkSize
= 2.5M entries) to count non-null entries via std::accumulate. With ~3000
verified hits per search that is ~7.5B pointer reads per query.

Replace with a std::atomic<int64_t> numHitsFound counter in makeHitsFromToTry,
incremented via fetch_add on each verified hit. The early-exit condition becomes
a single atomic read, O(1) per hit regardless of vector size. The atomic is
local to makeHitsFromToTry so it resets correctly per chunk and is safe for
the multi-threaded path without added synchronization.
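
A minimal sketch of the two counting strategies, assuming a results vector of owning pointers. The element type and the helper names countHitsByScan and recordHit are illustrative only and are not taken from SynthonSpaceSearcher.cpp.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <numeric>
#include <vector>

using Results = std::vector<std::unique_ptr<int>>;  // hypothetical hit type

// Old approach: count non-null slots by scanning the whole pre-sized vector.
// Every call costs O(N) in the vector size, however few hits there are.
inline std::int64_t countHitsByScan(const Results &results) {
  return std::accumulate(
      results.begin(), results.end(), std::int64_t{0},
      [](std::int64_t n, const std::unique_ptr<int> &p) {
        return n + (p ? 1 : 0);
      });
}

// New approach: each verified hit fills its slot and bumps a shared counter.
inline void recordHit(Results &results, std::size_t slot,
                      std::atomic<std::int64_t> &numHitsFound) {
  assert(slot < results.size());
  results[slot] = std::make_unique<int>(static_cast<int>(slot));
  numHitsFound.fetch_add(1);
}

// The early-exit check becomes a single atomic read, independent of N.
inline bool haveEnoughHits(const std::atomic<std::int64_t> &numHitsFound,
                           std::int64_t maxHits) {
  return numHitsFound.load() >= maxHits;
}
```

Because the counter is created alongside the results vector for each chunk, it starts from zero every time, which matches the per-chunk reset described above.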

Measured on synthon_perf branch (42-rxn / 140B-product Freedom space,
maxHits=3000, hitStart=1000, before boost::unordered_flat_set change):
  search-several (9 queries): ~30s → ~16.5s (~1.8x)
  search-one (benzene):       ~3.5s → ~1.8s  (~1.9x)

All 4 synthon ctest cases pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style ++

* Update Code/GraphMol/SynthonSpaceSearch/SynthonSpaceSearcher.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2026-06-02 14:23:49 +02:00
Steven Kearnes
bf711414a3 Build: forward-slash Python3_EXECUTABLE when generating rdkit-stubs (#9318)
The rdkit-stubs CMakeLists builds a cmake script (RUN_GEN_RDKIT_STUBS_PY)
that is later run via cmake -P. Every path embedded in that generated
script is converted to forward slashes first (CMAKE_SOURCE_DIR,
CMAKE_CURRENT_BINARY_DIR, PYTHON_INSTDIR) so the backslashes in native
Windows paths don't get treated as escape sequences when the script is
re-parsed.

Python3_EXECUTABLE was the one path that wasn't converted, so on Windows
the generated COMMAND line contained backslashes and failed to parse.
Apply the same forward-slash conversion to it for consistency.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 08:23:27 +02:00
Greg Landrum
b50343f7f7 Do deprecations for 2026.09 release (#9213)
* Do deprecations

* update release notes
2026-05-29 10:19:17 +02:00
Greg Landrum
28112aaef9 add ability to block atoms/bonds from participating in tautomer zones (#9297)
* add ability to block atoms/bonds from participating in tautomer zones

* be more structured with the atom flag

* response to review

---------

Co-authored-by: = <=>
2026-05-29 05:38:02 +02:00
Brian Kelley
b417465e93 Adds MolToCDXMLBlock to FileParsers (#9291)
* Adds MolToCDXMLBlock to FileParsers

* Simplified code, removed warning

* Fix C# wrapper for MolToCDX

* Add C# test, fix cscode in swig

* Fix typo in tests

* Set default format to CDXML for MolToCDXML

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Add CDXML writer smoke tests

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2026-05-29 05:36:35 +02:00
Dan Nealschneider
76a32ef1ee synthon perf: replace sort+unique dedup with boost::unordered_flat_set (#9305)
sortAndUniquifyToTry previously built a parallel vector of (index, string)
pairs, sorted by string, erased duplicates, then rebuilt the original vector
— O(N log N) with one heap allocation per candidate product.

Replace with an erase-remove over a boost::unordered_flat_set<size_t> keyed
on buildProductHash (boost::hash_combine over synthon IDs + reaction ID).
Dedup is now O(N) average with no string allocations on the hot path.
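
The hash-keyed erase-remove can be sketched as follows. This is an illustration under stated assumptions: std::unordered_set stands in for boost::unordered_flat_set, the Candidate struct and the function names are hypothetical, and hashCombine follows the classic boost::hash_combine mixing formula rather than calling Boost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical candidate product: one synthon ID per position plus a reaction ID.
struct Candidate {
  std::vector<std::size_t> synthonIds;
  std::size_t reactionId;
};

// Mixing step in the style of the classic boost::hash_combine (assumption).
inline void hashCombine(std::size_t &seed, std::size_t value) {
  seed ^= std::hash<std::size_t>{}(value) + 0x9e3779b9 + (seed << 6) +
          (seed >> 2);
}

// Hash over the synthon IDs and the reaction ID; no strings are built.
inline std::size_t productHash(const Candidate &c) {
  std::size_t seed = 0;
  for (auto id : c.synthonIds) {
    hashCombine(seed, id);
  }
  hashCombine(seed, c.reactionId);
  return seed;
}

// Keep the first candidate seen for each hash, preserving input order.
// Note: two distinct candidates with colliding hashes would be merged.
inline void dedupCandidates(std::vector<Candidate> &toTry) {
  std::unordered_set<std::size_t> seen;
  seen.reserve(toTry.size());
  toTry.erase(std::remove_if(toTry.begin(), toTry.end(),
                             [&seen](const Candidate &c) {
                               return !seen.insert(productHash(c)).second;
                             }),
              toTry.end());
  assert(seen.size() == toTry.size());
}
```

Each candidate costs one hash computation and one set insertion, which is where the O(N) average bound comes from.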

Also switch SearchResults::d_molNames from std::unordered_set<std::string>
to boost::unordered_flat_set<std::string> for the same open-addressing cache
locality benefit during mergeResults.

Perf (42-rxn / 140B-product Freedom space, maxHits=3000, hitStart=1000,
9 queries; vanilla.log → 2unordered_flat_set.log):
  Benzene:       6.92s → 5.64s  (−19%)
  Tolueneish:    6.19s → 5.07s  (−18%)
  Acetaminophen: 4.50s → 3.63s  (−19%)
  Allopurinol:   4.41s → 3.94s  (−11%)
  Theophylline:  4.39s → 3.90s  (−11%)
  Nicotine:      4.87s → 3.97s  (−18%)
  Ciprofloxacin: 6.82s → 6.09s  (−11%)
  Aspirin:       4.51s → 3.42s  (−24%)
  Metoprolol:    5.11s → 4.07s  (−20%)
  Total:        48.40s → 40.33s (−17%)

Hit counts and MaxNumResults unchanged across all queries.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 17:13:03 +02:00
Greg Landrum
7a25f047c1 swap to specific versions of other actions we use (#9308)
2026-05-28 16:47:22 +02:00
David Cosgrove
b30b5d586c Option to draw all bonds in symbolColour. (#9304)
2026-05-28 16:46:13 +02:00
Steven Kearnes
30ddbdd44e Build: tag rdkitpython install rules with COMPONENT python (#9288)
#9287 tagged these install rules with COMPONENT dev, which routes
rdkitpython-config.cmake / rdkitpython-targets.cmake into the dev
component alongside the python-agnostic C++ cmake config. That's
correct progress over the prior Unspecified default, but `dev` is the
wrong group: these files hardcode a single python version via
configure_file substitution (find_dependency(Python3 3.X ...)
and Boost component python3XX/numpy3XX), so they belong with the
python wrappers — themselves python-version-specific — rather than
with python-agnostic dev artifacts.

Change the two affected COMPONENT tags from dev to python.
2026-05-28 13:24:54 +02:00
Clay Moore
141ba3bd73 Fix BFGS gradient-convergence denominator for negative energies (#9298)
In Code/Numerics/Optimizer/BFGSOpt.h, the gradient-convergence check
computed

    double term = std::max(funcVal * gradScale, 1.0);
    ...
    test /= term;
    if (test < gradTol) return 0;

When funcVal (the current energy) is negative, funcVal * gradScale is
negative and std::max clamps the denominator to 1.0. The convergence
test therefore divides the gradient norm by 1 instead of by the
intended |E| * gradScale, which over-tightens the criterion by a factor
of |funcVal * gradScale| whenever |funcVal * gradScale| > 1.

Negative energies are a normal mid-minimization state for force fields
that include stabilizing terms (MMFF94, UFF with charges, AMBER-style
potentials), so this affects realistic workloads: extra BFGS iterations
or, occasionally, hitting MAXITS and returning the "too many iterations"
status when convergence would otherwise have been reached.

The fix is to use |funcVal| in the denominator, matching the pattern
used three lines below ('std::max(fabs(pos[i]), 1.0)') and matching the
intended interpretation as a magnitude.
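
The effect of the clamp can be checked with a small standalone sketch. The helper names are hypothetical, gradScale is assumed to be positive, and the exact expression used in BFGSOpt.h may differ from the one shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Original form: for negative funcVal the product is negative, so std::max
// always returns 1.0 and the scaling by the energy is lost.
inline double gradTermOriginal(double funcVal, double gradScale) {
  return std::max(funcVal * gradScale, 1.0);
}

// Sketch of the fixed form: scale by the magnitude of the energy instead.
inline double gradTermFixed(double funcVal, double gradScale) {
  assert(gradScale > 0.0);  // assumption made for this illustration
  return std::max(std::fabs(funcVal) * gradScale, 1.0);
}

// The convergence decision: the raw test value divided by the denominator.
inline bool gradientConverged(double rawTest, double term, double gradTol) {
  return rawTest / term < gradTol;
}
```

With funcVal = -50 and gradScale = 0.5, the original denominator is 1.0 while the fixed one is 25.0, so a raw test value of 0.01 passes a tolerance of 0.001 only with the fixed form. For positive energies the two forms agree.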

A new test case 'testBFGSOptimizationNegativeEnergy' in
testOptimizer.cpp minimizes a 2D quadratic whose value is always
negative along the convergence path and verifies the optimizer reaches
the analytic minimum.

git blame attributes the original line to commit e08e0d16d (Nov 2015),
when the optimizer was restructured; the surrounding code does use
absolute values, so this reads as an oversight rather than an
intentional choice.
2026-05-28 13:14:59 +02:00
David Cosgrove
71e7775d35 Fix layout of reaction components in reaction drawing. (#9302)
* Fix layout of reaction components in reaction drawing.

* <sigh>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2026-05-27 18:23:17 +02:00
David Cosgrove
ce08c344e8 Github9280 (#9300)
* Fix setFontScale for maximum and minimum font sizes.

* Fix other test.  Add hashcodes.
2026-05-27 06:39:28 +02:00
Greg Landrum
85f33083cd add checked atom and bond iterators (#9290)
* add checked iterators

* support checked atom and bond iterators

the idea here is to allow optional checking that the graph is not being
modified while an iterator is active
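
One common way to implement this kind of check is a modification counter that each iterator snapshots when it is created. The sketch below shows that general technique on a toy container; CheckedGraph and CheckedIterator are hypothetical names, and this is not necessarily how the RDKit iterators are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy "graph" holding atomic numbers, with a counter bumped on every change.
class CheckedGraph {
 public:
  void addAtom(int atomicNum) {
    d_atoms.push_back(atomicNum);
    ++d_version;  // any structural change invalidates live checked iterators
  }

  class CheckedIterator {
   public:
    CheckedIterator(const CheckedGraph *graph, std::size_t pos)
        : d_graph(graph), d_pos(pos), d_version(graph->d_version) {
      assert(graph != nullptr);
    }
    int operator*() const {
      check();
      return d_graph->d_atoms[d_pos];
    }
    CheckedIterator &operator++() {
      check();
      ++d_pos;
      return *this;
    }
    bool operator!=(const CheckedIterator &other) const {
      return d_pos != other.d_pos;
    }

   private:
    // Throw if the graph changed after this iterator was created.
    void check() const {
      if (d_graph->d_version != d_version) {
        throw std::logic_error("graph modified during iteration");
      }
    }
    const CheckedGraph *d_graph;
    std::size_t d_pos;
    std::size_t d_version;  // snapshot taken at construction
  };

  CheckedIterator begin() const { return CheckedIterator(this, 0); }
  CheckedIterator end() const { return CheckedIterator(this, d_atoms.size()); }

 private:
  std::vector<int> d_atoms;
  std::size_t d_version = 0;
};
```

A plain range-based for loop over such a container works unchanged, while a loop body that calls addAtom() triggers the exception on the next increment instead of silently reading through an invalidated iterator.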

* ignore new member functions

---------

Co-authored-by: Ric R <ricrogz@gmail.com>
2026-05-26 15:25:34 +02:00
Greg Landrum
4ad9f33bf6 Revert "Fix WedgeMolBonds stealing wiggly bonds from adjacent chiral atoms (#…" (#9293)
This reverts commit 020b755ad4.
2026-05-22 04:39:08 +02:00
Chris Von Bargen
020b755ad4 Fix WedgeMolBonds stealing wiggly bonds from adjacent chiral atoms (#9267)
The standard re-wedging pattern is:

    clearSingleBondDirFlags(mol);  // saves _UnknownStereo=1, clears BondDir to NONE
    WedgeMolBonds(mol, &conf);     // re-derives wedges from chiral tags
    // ... caller restores BondDir::UNKNOWN on bonds with _UnknownStereo=1 ...

Wiggly bonds at chiral centers should survive this round-trip but did
not: countChiralNbrs only checked BondDir to decide whether a chiral
atom's stereo was already expressed, missing the _UnknownStereo=1
marker that clearSingleBondDirFlags saved when it cleared BondDir to
NONE. With the marker invisible to countChiralNbrs, pickBondToWedge
would pick the wiggly bond itself (terminal neighbors are scored
lower), and the caller's subsequent restore of BondDir::UNKNOWN would
erase the wedge, leaving the chiral atom with no visible stereo.

Fix: extend countChiralNbrs to recognize bonds with _UnknownStereo=1
as equivalent to BondDir::UNKNOWN. The chiral atom is then pre-skipped
in pickBondsToWedge, the same way it already is for bonds that still
have BondDir::UNKNOWN set.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 19:26:58 +02:00
Ricardo Rodriguez
92d5d2c657 Refactor to stop using iterator definitions in types.h (#9275)
* clean up iterator defs in types.h

* do not use auto for inline constexpr

* restore undef max,min

* restore types.h declarations
2026-05-21 19:19:38 +02:00
Steven Kearnes
169a1b2b67 Build: tag dev-only install rules with COMPONENT dev (#9287)
Headers under Code/RDGeneral/hash and the generated cmake package config
files (rdkit-config*.cmake, rdkit-targets*.cmake, rdkitpython-config*.cmake,
rdkitpython-targets*.cmake) are currently installed without an explicit
COMPONENT, so they default to "Unspecified" and cannot be cleanly separated
from runtime artifacts by packagers that use cmake's per-component install
(e.g. -DCMAKE_INSTALL_COMPONENT=dev).

In conda-forge we split the package into librdkit (runtime) and librdkit-dev
(headers + cmake config), with librdkit installing components
"Unspecified base data runtime" and librdkit-dev installing the "dev"
component. With the current upstream tagging, the hash headers and all
cmake config files end up in librdkit instead of librdkit-dev, which both
ships build-time artifacts in a runtime package and leaves librdkit-dev
without the cmake config needed for find_package(RDKit) to work.

This commit tags all five affected install() calls with COMPONENT dev so
per-component installs work correctly. Default (non-component) installs
are unaffected.
2026-05-21 19:16:05 +02:00
Greg Landrum
1dfc9b7a1b Fixes #9231: improve escaping values in CXSMILES (#9273)
* this is a fix, but it breaks other tests

* all tests pass.

This undoes the changes made as part of the "fix" for #5466

* update js tests

* response to review

---------

Co-authored-by: = <=>
2026-05-17 11:51:14 -04:00
Ricardo Rodriguez
434714e7d4 Fixes #9270 (#9272)
* fix handling double bond stereo extraction

* add tests

* Update Code/GraphMol/Subset.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2026-05-16 07:01:25 +02:00
Steven Kearnes
d1b8874851 PgSQL: preserve toolchain LDFLAGS on macOS (#9285)
The previous `set(CMAKE_EXE_LINKER_FLAGS ...)` replaced the variable
wholesale, which clobbers any toolchain-supplied linker flags. In
particular, conda-forge's clang_osx-64 / clangxx_osx-64 packages set
`-stdlib=libc++ -L${PREFIX}/lib -Wl,-rpath,${PREFIX}/lib` via
`CMAKE_EXE_LINKER_FLAGS`. Losing those flags causes the postgres
extension link to pick up the wrong libc++ and fail to resolve
ABI-tagged symbols on libc++ 19+:

    [ 94%] Linking CXX executable rdkit.dylib
    Undefined symbols for architecture x86_64:
      "VTT for std::__1::basic_stringstream<...>"
      "vtable for std::__1::basic_stringbuf<...>"
      "vtable for std::__1::basic_stringstream<...>"
      "vtable for std::__1::basic_istringstream<...>"
    ld: symbol(s) not found for architecture x86_64

The missing symbols carry the `[abi:ne190107]` ABI tag introduced by
libc++ 19+ — references that only resolve against the conda-forge
libc++, not the system one the link was falling back to.

Append to `CMAKE_EXE_LINKER_FLAGS` instead so the toolchain flags
survive. The other rdkit `.dylib`s in the same build are linked via
the standard cmake toolchain path and were never affected.

Verified by building rdkit-postgresql on osx-64 + osx-arm64 via the
conda-forge feedstock (https://github.com/conda-forge/rdkit-feedstock)
with this fix applied as a downstream patch.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 07:00:01 +02:00
Gareth Jones
8e9adcd467 Adds some features to the C# SWIG wrappers (#9274)
2026-05-14 09:07:56 +02:00
reza bagheri alashti
24f0007757 refactor: improve readability and maintainability of AAP similarity code (#9277)
* refactor: clean up AAP similarity logic and add type hints

* Refactor AAP similarity implementation for clarity and maintainability

* Refactor AAP similarity implementation for clarity and maintainability

* ran yapf over the code
2026-05-14 06:52:12 +02:00
github-actions[bot]
2f6bbe03b0 [bot] Update molecular templates header (#9269)
Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>
2026-05-07 05:16:48 +02:00
reza bagheri alashti
bd73a574ad Docs: fix CosineSimilarity formula and clarify similarity metric names in BitOps.h (#9264)
* Fix CosineSimilarity doc formula and clarify similarity metric names in BitOps.h

* Fix CosineSimilarity formula in BitOps.h and adjust similarity docs
2026-05-06 17:12:53 +02:00
Dan Nealschneider
67fc0708e5 CIPLabeler performance: Store vector of bonds (#9250)
* CIPLabeler performance: Store vector of bonds

CIPLabelling refers to bonds by index over and over again. This
causes a measurable hit in performance in findConfigs() because
we iterate over a bitset of "allowed" bonds. For very large
molecules with many bonds, this can be a rate-limiting step!

This affects many PDB-sized structures.

2J3N goes from 0.7s to 0.25s with this change.

I had another example for which the findBondWithIdx() call was
taking 500ms of a 700ms call (after the performance update
in #9171 was implemented).

* yikes, XXL reserve

thanks, greg

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2026-05-06 11:57:28 +02:00
Dan Nealschneider
1663989053 CIP labeller performance: Don't calculate auxiliary descriptors unnecessarily (#9171)
* CIP labeller: Don't calculate auxiliary descriptors unnecessarily

The first 3 rules (the constitutional rules) are pretty easy
to understand. After rule 3, we need to calculate auxiliary
stereo descriptors to break ties.

However, we _were actually_ calculating auxiliary stereodescriptors
for all centers! We should only need to calculate auxiliary
stereodescriptors for sites that are needed to break ties.

This cost time - it also caused errors if the auxiliary descriptors
needed a graph expansion, because bonds in the digraph might be
pointed in the wrong direction.

Example case PDB ID 4AXM
Before this commit, errored with "Could not calculate parity! Carrier mismatch"
after 14s. After this commit, completes successfully in 0.036s.
Labelled centers all match (for the centers that had labels in
the failure case).

Includes a test that I can imagine breaking with this optimization.
The reference labels are from before this change

* Ensure all "arms" of stereo bonds and atropisomer bonds are expanded

For tetrahedral centers, ranking using the constitutional rules
always expands as far as is needed (but no further). For SP2bond
and atropisomers, if the first side is not resolvable, the
second side is never visited.

If the constitutional rules don't resolve a side, we need to
label the auxiliary centers. It's important to label all
auxiliary centers that _will_ be visited, so we need to know
what centers will be visited.

This commit updates the label() call in SP2 and Atropisomer
bonds to always attempt to label both sides if using the
constitutional rule set.

The constitutional rules are cheap, and if they fail, we
always go on to the full rule set. It is not a savings to skip
the search on the second side if we're going to keep going
anyway!

Includes a test that reproduces Ricardo's example.

This has no measurable effect on performance relative to the
original solution

* If any parts of the center have been seen, label it.

I couldn't make an example hit this, but Ric is totally
theoretically right

* Greg's ranges suggestion #2

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* any_of for container search

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2026-05-06 06:12:50 +02:00
Raul Sofia
372fbad131 Extended fix for #9101 (#9255)
* fix extended boundary issue (3 mols)

* clang pass

* no change. retrigger CI for failed java test

There's a Java test that seems to be failing by chance rather than because of these changes, as it depends on the RNG. This is just to retrigger the CI pipeline to confirm that.

* no change. retrigger the CI (yet again)

* raw strings and removed garbage collector
2026-05-06 06:10:37 +02:00
Emily Rhodes
3836049ab2 Add optional default_val parameter to GetProp() (#9242)
* SHARED-12256: Add test and change function.

* SHARED-12256: Update to only wrapping changes.

* SHARED-12256: Parameterize tests.

* SHARED-12256: GetPropIfPresent changes.

* Revert "SHARED-12256: GetPropIfPresent changes."

This reverts commit f598f8c161.

* SHARED-12256: Make default the keyword in the boost wrappings.

* SHARED-12256: Overload function instead of using a sentinel.

* SHARED-12256: Extend GetProp changes.

* SHARED-12256: Add entry point for tests and fix tests.
2026-05-06 06:09:11 +02:00
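
Note on 3836049ab2 above: the "overload instead of using a sentinel" step can be illustrated with a pure-Python stand-in. `PropHolder` and its methods are hypothetical and are not the RDKit wrappers. The sketch passes the default positionally, whereas the commit mentions a keyword in the Boost wrappings. It only shows why an explicitly supplied default (including `None`) has to be distinguishable from no default at all.

```python
class PropHolder:
    """Hypothetical stand-in for an object carrying string-keyed properties."""

    def __init__(self):
        self._props = {}

    def SetProp(self, key, value):
        self._props[key] = value

    def GetProp(self, key, *default):
        # Emulates two overloads: GetProp(key) and GetProp(key, default).
        if len(default) > 1:
            raise TypeError("GetProp takes at most one default value")
        if key in self._props:
            return self._props[key]
        if default:
            return default[0]  # any value, including None, is a valid default
        raise KeyError(key)


holder = PropHolder()
holder.SetProp("name", "aspirin")
found = holder.GetProp("name")
fallback = holder.GetProp("missing", None)
```

Without a default, a missing key still raises, so existing callers would see no change in behaviour under this reading.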
Marco Ballarotto
b54cbac151 Pandastools improvements (#9251)
* Added automatic parsing functionality

* Added documentation

* Slightly changed check for gzip extension

* Apply suggestions from code review

Added small changes for readability

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2026-05-05 18:02:49 +02:00
Greg Landrum
6d75052459 Support using iterators with MolSuppliers (#9230)
* iterators for random-access MolSuppliers
add optional caching to SDMolSupplier

* add support to SmilesMolSupplier too
There is a lot of duplicate code between the random-access suppliers that would be worth trying to remove
but at the moment it looks like it would require multiple inheritance, and I think we want to avoid that

* add input iterators for ForwardSDMolSupplier()

* throw when calling begin() on a used supplier

* switch to use the spaceship operator

* init() should reset the mol cache

* Make SDMolSupplier and SmilesMolSupplier safe for multi-threaded reads

* add benchmarking

* add TDTMolSupplier support
improved testing
add benchmarks for parallel iteration
optional TBB support

* better const handling, add reverse iterators

doesn't look like const_iterator is possible since getting data from the underlying supplier object is non-const

* improve docs
more usings
add reverse iterator to TDTMolSupplier

* tests only try execution::par when it is there

* fix typo

* more testing/demo

* remove accidentally added files

* review changes

* add default ctors

* disable a false-positive compiler warning
it is stupid to have to do this

---------

Co-authored-by: = <=>
2026-05-05 13:36:15 +02:00
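
Note on 6d75052459 above: one plausible reading of "throw when calling begin() on a used supplier" is that a single-pass (forward) supplier cannot be rewound, so a second traversal should fail loudly instead of silently yielding nothing. The Python stand-in below sketches that behaviour. `ForwardSupplier` is a hypothetical class, not the RDKit `ForwardSDMolSupplier`, and the exception type is an assumption.

```python
class ForwardSupplier:
    """Hypothetical single-pass supplier over an underlying stream of records."""

    def __init__(self, records):
        self._source = iter(records)
        self._used = False

    def __iter__(self):
        # A forward-only stream cannot be restarted, so refuse a second pass.
        if self._used:
            raise RuntimeError("supplier has already been iterated")
        self._used = True
        return self._source


supplier = ForwardSupplier(["CCO", "c1ccccc1", "CC(=O)O"])
first_pass = list(supplier)

second_pass_failed = False
try:
    list(supplier)
except RuntimeError:
    second_pass_failed = True
```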
Kevin Boyd
232e4ffc84 Add Getter functions to MMFF property python interface (#9254) 2026-04-30 17:06:31 +02:00
Greg Landrum
251353a217 Ensure that StereoGroups don't have duplicate atoms or bonds (#9258)
* check for duplicate atoms/bonds in StereoGroups

* explicit handling of duplicate stereogroup atoms in CTAB and CXSMILES parsers

---------

Co-authored-by: = <=>
2026-04-29 16:54:00 +02:00
Eloy Félix
cb251343b9 Tautomer insensitive hash v2, E/Z and stereocenter-preservation (#9128)
* Tautomer insensitive hash v2, E/Z and stereocenter-preservation

* Preserve E/Z stereochemistry and stereocenters in TautomerHashv2

Simplify extension logic to better protect stereocenters connected via
single bonds to aromatic systems. Preserve E/Z stereo on exocyclic
double bonds to distinguish geometric isomers (e.g., E/Z hydrazones).

* add helper function to remove duplicated code

* Fix ring info and bond aromaticity handling in MolHash

- Add fastFindRings check in TautomerHashv2 before ring queries
- Set isAromatic consistent with bond type (true for AROMATIC bonds)
- Fix inverted condition in RegioisomerHash

* more consistent hashes regardless of stereo annotation
2026-04-24 14:19:47 +02:00
Nic Zonta
6cac6afcb3 If templates match, skip ring number check (#9217)
* remove ring matching for templates

* remove extra code

* remove empty lines

* fix build error
2026-04-23 19:21:29 +02:00
Kevin Boyd
bbee5fedb0 Fix bug in inversion term for UFF, add finite difference checker. (#9228)
* Fix copyright

* Address review comments

Removed finite diff from RDKit headers

Used explicit coordinates
2026-04-23 06:21:42 +02:00
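
Note on bbee5fedb0 above: a finite-difference check of this kind typically compares an analytic gradient against central differences of the energy. The sketch below is generic Python, not the checker added to RDKit. The toy energy function, the step size, and all names are assumptions chosen for illustration.

```python
def finite_difference_gradient(energy, coords, step=1e-5):
    """Central-difference estimate of d(energy)/d(coords[i]) for each i."""
    grad = []
    for i in range(len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[i] += step
        minus[i] -= step
        grad.append((energy(plus) - energy(minus)) / (2.0 * step))
    return grad


def toy_energy(x):
    # Toy quadratic "energy"; stands in for a force-field term.
    return 0.5 * (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


def toy_gradient(x):
    # Analytic gradient of toy_energy.
    return [x[0] - 1.0, 4.0 * (x[1] + 0.5)]


point = [0.3, 1.2]
numeric = finite_difference_gradient(toy_energy, point)
analytic = toy_gradient(point)
max_error = max(abs(n - a) for n, a in zip(numeric, analytic))
```

A sign or factor error in the analytic gradient shows up as a large `max_error`, which is what makes this a useful regression test for individual force-field terms.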
Greg Landrum
e35f7db009 Cleanup/get atoms and bonds (#9243) 2026-04-18 05:22:09 +02:00
Ricardo Rodriguez
db025bd6b0 make sorting more consistent (#9239) 2026-04-16 05:05:14 +02:00
Brandon Novy
efa7a32c3c MolDraw2D: configurable legend position and vertical side legends (Issue #9023) (#9183)
* Configurable legend position (Top/Left/Right/Bottom) and vertical text (GitHub #9023)

- Add LegendPosition enum and legendPosition, legendVerticalText to MolDrawOptions
- Support legend at Top, Left, Right, Bottom; vertical text for Left/Right
- Python: MolDrawOptions.legendPosition, .legendVerticalText; LegendPosition enum
- Python: MolToSVG() wrapper with legend/drawOptions; doc updates for MolToImage
- JSON: legendPosition (string), legendVerticalText (bool) in draw options
- C++ and Python tests; release note and Cartridge.md docs

* MolDraw2D: legend gutter for horizontal side legends; vertical side height fit

- Reserve horizontal gap between molecule and left/right horizontal legends
  (scale mol to molWidth-gutter, align toward legend strip).
- Position horizontal side legend by measured text width from partition edge.
- Vertical side legends: iterative scale so n*max_h+(n-1)*gap fits panel.
- Catch: long vertical side legend section.

* Update legend-position tests and review-driven cleanup

Use enum/default wording for legendPosition docs, move the lightweight Python test to Wrap, add regex-based placement checks (including horizontal side and vertical stacking), and refactor extractLegend helpers per style guidance.

* Fix MolDraw2D legend edge cases

* MolDraw2D: review follow-up (legend tests, bounds, DRY Top/Bottom)

* Update no-FT legend test coords

* Address PR review: document constants, remove release-note text, and simplify extra-padding logic
2026-04-16 04:59:00 +02:00
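
Note on efa7a32c3c above: the vertical side legend condition `n*max_h+(n-1)*gap` fits the panel can be sketched as a simple shrink loop. This is an illustrative Python version, not the MolDraw2D code. The function name, the shrink factor, the minimum scale, and the assumption that glyph height and gap both scale with the font are all hypothetical.

```python
def fit_vertical_legend(n_chars, max_char_height, gap, panel_height,
                        shrink=0.9, min_scale=0.05):
    """Shrink a font scale until n stacked glyphs plus gaps fit the panel."""
    scale = 1.0
    while scale > min_scale:
        # Total height of n glyphs and the (n - 1) gaps between them.
        needed = (n_chars * max_char_height + (n_chars - 1) * gap) * scale
        if needed <= panel_height:
            break
        scale *= shrink
    return scale


long_scale = fit_vertical_legend(n_chars=20, max_char_height=12.0, gap=2.0,
                                 panel_height=200.0)
short_scale = fit_vertical_legend(n_chars=5, max_char_height=12.0, gap=2.0,
                                  panel_height=200.0)
```

A legend that already fits keeps a scale of 1.0, while a long one is reduced step by step until its stacked height no longer exceeds the panel.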
Chris Von Bargen
d8f4afb558 Add getSGroupDataLabels() to MolDraw2D_detail namespace (#9189)
Adds a new function MolDraw2D_detail::getSGroupDataLabels() that returns
the text and molecule-coordinate positions of DAT SGroup labels, using
the same placement logic as the drawing code. This allows external
renderers to display SGroup labels consistently with RDKit's placement.

Refactors DrawMol::extractSGroupData() to call getSGroupDataLabels()
internally, eliminating the duplicate FIELDDISP parsing and position
computation logic.

Closes #7829

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 04:56:00 +02:00
Greg Landrum
d4e8aa9fed mention AI tools in the contrib guidelines (#9224)
* mention AI tools in the contrib guidelines

* response to review

---------

Co-authored-by: = <=>
2026-04-16 04:47:15 +02:00
Greg Landrum
2c6efb4a65 Add some std::ranges support (#9218)
* initial ranges support for Atom/Bond iterators.
needs more testing

* support random access
test sort

more testing please

* compiles on windows

* fix size()
more testing
add some benchmarking

* disable benchmarking code by default

* do not allow modifying the graph through the iterators

---------

Co-authored-by: = <=>
2026-04-13 17:13:04 +02:00
github-actions[bot]
f150381c13 [bot] Update molecular templates header (#9234)
Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>
2026-04-10 15:02:46 +02:00
David Cosgrove
eaf546b037 Misplaced parentheses in shape code (#9222)
* Move stray bracket.

* Lots of consts.

* Another bad bracket.

* Response to review.
2026-04-07 16:59:56 +02:00
Yakov Pechersky
87a6c7163d Use index-order kekulization in MolToInchi (#9226)
MolToInchi has called MolOps::Kekulize(*m, false) for years, but PR #9125 changed the default traversal to canonical=true. That pulls rankFragmentAtoms() and the canonicalization path into the InChI conversion even though the tested InChI outputs stay the same.

Validation:
- rdkit.Chem.UnitTestInchi passed before and after this change on upstream/master (18 tests, OK in both runs).
- No InChI output drift was observed between stock and patched builds on Regress/Data/mols.1000.sdf.gz, rdkit/Chem/test_data/pubchem-hard-set.sdf.gz, or the atom-order regression molecules added in Code/GraphMol/catch_graphmol.cpp.

Performance:
- Release_2026_03_1 Python MolToInchi on Regress/Data/mols.1000.sdf.gz improved from 0.40712s to 0.38871s median (-4.52%).
- Release_2026_03_1 rdinchi MolToInchi on the same dataset improved from 0.39755s to 0.37814s median (-4.88%).
- Release_2026_03_1 standalone C++ MolToInchi on /tmp/mols.1000.sdf improved from 7.66775s to 7.03474s wall time (-8.26%), from 20.57B to 19.04B cycles (-7.46%), and from 121.78M to 114.05M cache misses (-6.35%).
2026-04-07 06:17:40 +02:00
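
Note on 87a6c7163d above: the quoted percentages can be rechecked from the timings as (after - before) / before. A small sketch using only numbers given in the commit message; the dictionary keys are illustrative labels.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the commit message, in seconds.
timings = {
    "python_median_s": (0.40712, 0.38871),
    "rdinchi_median_s": (0.39755, 0.37814),
    "cpp_wall_s": (7.66775, 7.03474),
}
changes = {name: percent_change(before, after)
           for name, (before, after) in timings.items()}
```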
Yakov Pechersky
6c4411b1d1 Add more pyi patches, 2026-03 (#9214) 2026-04-06 11:51:49 +02:00
Ricardo Rodriguez
9e301c15d6 Normalize rings (#9208)
* normalize rings

* update tests

* update doctests

* update release notes
2026-04-01 05:37:02 +02:00