* First import of GaussianShape.
* Tidying.
* Custom features.
* Optimise.
* Optimise.
* Return 3 scores rather than 2 including combo score.
* Rename useFeatures to useColors.
* Python wrappers.
* Python tests.
* Take out big test.
* Add new start mode, as PubChem does it.
* Doh!
* Fix MolTransforms eigenvalue return.
* Two cycle optimisation, mostly working.
* Take out bestSoFar score from SCA.
* Take out DTYPE.
* Tidy out redundant variables.
* Optimisation in 2 parts.
* More fiddling in pursuit of speed.
* Update Python wrapper.
* Tweak.
* Atom subsets and different radii.
* Fix test.
* Revert pubchem_shape's test.cpp.
* Serialize ShapeInput.
* Trigger build
* Remove pointers to std::arrays in ShapeInput.
* ShapeInput virtual d'tor.
* Precondition - ShapeInput needs a molecule with at least 1 conformer.
* Rename ShapeInput::d_centroid to ShapeInput::d_canonTrans.
* Fix normalization bugs.
* Select start mode using moments of inertia rather than eigenvalues of canonical transformation.
* Include color features in moments of inertia.
* Smidge faster.
* Tversky similarity.
* Tidy tests.
* Tests working on Linux.
* Revert force of right handed axes in MolTransforms::computePrincipalAxesAndMomentsFromGyrationMatrix replacing with a comment in the code.
* Response to review.
* Sneaky allCarbon bug.
* add multithreaded test
* Response to review.
* Doh! Don't recalculate normalization after every transformation.
* Re-instate d_normalizationOK.
* Re-name functions for fetching canonical transformations.
* Separate alpha from coords.
* MultiConf works with single conf extraction.
* Extract all conformations.
Max and best similarities.
* Renames d_currConformer to d_activeShape.
* Update shapeToMol.
* Update shapeToMol.
* Changes from synthon shape searching.
* Fix normalization of multiple confs.
* Update Python wrappers.
* Fix shape merge.
* Improve bestSimilarity.
* Fix python wrapper.
* Pull in changes from SynthonShapeSearch:
make pruneShapes public.
function to negate Alpha values.
* clang-tidy suggestions.
* clang-tidy suggestions.
* Bug in quaternion gradients - we now have only 3 coordinates.
* Tidy tests.
* Mac result slightly different.
* Multi conformer molecule alignment.
* Optionally return raw overlap volumes in score functions.
* Python wrappers for raw overlap volumes.
* Update Python wrapper ShapeInputOptions.
* Tidy for PR.
* Extra include file.
* Extra library
* Tidy forward declarations.
* Don't prune if threshold < 0.0.
* Windows exporty thing.
* Check SMILES on merge of ShapeInputs.
* PRECONDITION of SMILES on merge of ShapeInputs.
* Response to review - rename some functions.
* change how overlapVols is passed
add a test for it
* API suggestions
* Response to review.
* Remove debugging writes.
* Fix Python wrappers.
---------
Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* add flipping of spiro rings as a way to solve clashes
* remove extra function
* add test file
* update coordgen parameters to allow for bond flipping
* fix failing tests
* Update Code/GraphMol/Depictor/EmbeddedFrag.h
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* [bot] Update molecular templates header (#9234)
Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>
* Add some std::ranges support (#9218)
* initial ranges support for Atom/Bond iterators.
needs more testing
* support random access
test sort
more testing please
* compiles on windows
* fix size()
more testing
add some benchmarking
* disable benchmarking code by default
* do not allow modifying the graph through the iterators
---------
Co-authored-by: = <=>
* mention AI tools in the contrib guidelines (#9224)
* mention AI tools in the contrib guidelines
* response to review
---------
Co-authored-by: = <=>
* Add getSGroupDataLabels() to MolDraw2D_detail namespace (#9189)
Adds a new function MolDraw2D_detail::getSGroupDataLabels() that returns
the text and molecule-coordinate positions of DAT SGroup labels, using
the same placement logic as the drawing code. This allows external
renderers to display SGroup labels consistently with RDKit's placement.
Refactors DrawMol::extractSGroupData() to call getSGroupDataLabels()
internally, eliminating the duplicate FIELDDISP parsing and position
computation logic.
Closes#7829
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* MolDraw2D: configurable legend position and vertical side legends (Issue #9023) (#9183)
* Configurable legend position (Top/Left/Right/Bottom) and vertical text (GitHub #9023)
- Add LegendPosition enum and legendPosition, legendVerticalText to MolDrawOptions
- Support legend at Top, Left, Right, Bottom; vertical text for Left/Right
- Python: MolDrawOptions.legendPosition, .legendVerticalText; LegendPosition enum
- Python: MolToSVG() wrapper with legend/drawOptions; doc updates for MolToImage
- JSON: legendPosition (string), legendVerticalText (bool) in draw options
- C++ and Python tests; release note and Cartridge.md docs
* MolDraw2D: legend gutter for horizontal side legends; vertical side height fit
- Reserve horizontal gap between molecule and left/right horizontal legends
(scale mol to molWidth-gutter, align toward legend strip).
- Position horizontal side legend by measured text width from partition edge.
- Vertical side legends: iterative scale so n*max_h+(n-1)*gap fits panel.
- Catch: long vertical side legend section.
* Update legend-position tests and review-driven cleanup
Use enum/default wording for legendPosition docs, move the lightweight Python test to Wrap, add regex-based placement checks (including horizontal side and vertical stacking), and refactor extractLegend helpers per style guidance.
* Fix MolDraw2D legend edge cases
* MolDraw2D: review follow-up (legend tests, bounds, DRY Top/Bottom)
* Update no-FT legend test coords
* Address PR review: document constants, remove release-note text, and simplify extra-padding logic
* make sorting more consistent (#9239)
* Cleanup/get atoms and bonds (#9243)
* Fix bug in inversion term for UFF, add finite difference checker. (#9228)
* Fix copyright
* Address review comments
Removed finite diff from RDKit headers
Used explicit coordinates
* If templates match, skip ring number check (#9217)
* remove ring mathcing for templates
* remove extra code
* remove empty lines
* fix build error
* Tautomer insensitive hash v2, E/Z and stereocenter-preservation (#9128)
* Tautomer insensitive hash v2, E/Z and stereocenter-preservation
* Preserve E/Z stereochemistry and stereocenters in TautomerHashv2
Simplify extension logic to better protect stereocenters connected via
single bonds to aromatic systems. Preserve E/Z stereo on exocyclic
double bonds to distinguish geometric isomers (e.g., E/Z hydrazones).
* add helper function to remove duplicated code
* Fix ring info and bond aromaticity handling in MolHash
- Add fastFindRings check in TautomerHashv2 before ring queries
- Set isAromatic consistent with bond type (true for AROMATIC bonds)
- Fix inverted condition in RegioisomerHash
* more consistent hashes regardless of stereo annotation
* Ensure that StereoGroups don't have duplicate atoms or bonds (#9258)
* check for duplicate atoms/bonds in StereoGroups
* explicit handling of duplicate stereogroup atoms in CTAB and CXSMILES parsers
---------
Co-authored-by: = <=>
* Add Getter functions to MMFF property python interface (#9254)
* Support using iterators with MolSuppliers (#9230)
* iterators for random-access MolSuppliers
add optional caching to SDMolSupplier
* add support to SmilesMolSupplier too
There is a lot of duplicate code between the random-access suppliers that would be worth trying to remove
but at the moment it looks like it would require multiple inheritance, and I think we want to avoid that
* add input iterators for ForwardSDMolSupplier()
* throw when calling begin() on a used supplier
* switch to use the spaceship operator
* init() should reset the mol cache
* Make SDMolSupplier and SmilesMolSupplier safe for multi-threaded reads
* add benchmarking
* add TDTMolSupplier support
improved testing
add benchmarks for parallel iteration
optional TBB support
* better const handling, add reverse iterators
doesn't look like const_iterator is possible since getting data from the underlyng supplier object is non-const
* improve docs
more usings
add reverse iterator to TDTMolSupplier
* tests only try execution::par when it is there
* fix typo
* more testing/demo
* remove accidentally added files
* review changes
* add default ctors
* disable a false-positive compiler warning
it is stupid to have to do this
---------
Co-authored-by: = <=>
* Pandastools improvements (#9251)
* Added automatic parsing functionality
* Added documentation
* Slightly changed check for gzip extension
* Apply suggestions from code review
Added small changes for readability
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Add optional default_val parameter to GetProp() (#9242)
* SHARED-12256: Add test and change function.
* SHARED-12256: Update to only wrapping changes.
* SHARED-12256: Parameterize tests.
* SHARED-12256: GetPropIfPresent changes.
* Revert "SHARED-12256: GetPropIfPresent changes."
This reverts commit f598f8c161.
* SHARED-12256: Make default the keyword in the boost wrappings.
* SHARED-12256: Overload function instead of using a sentinel.
* SHARED-12256: Extend GetProp changes.
* SHARED-12256: Add entry point for tests and fix tests.
* Extended fix for #9101 (#9255)
* fix extended boundary issue (3 mols)
* clang pass
* no change. retrigger CI for failed java test
there's a failing java test that seems to be failing by chance rather than by changes, as it depends on rng. this is just to retrigger the CI pipeline to confirm this
* no change. retrigger the CI (yet again)
* raw strings and removed garbage collector
* CIP labeller performance: Don't calculate auxiliary descriptors unnecessarily (#9171)
* CIP labeller: Don't calculate auxiliary descriptors unnecessarily
The first 3 rules (the constitutional rules) are pretty easy
to understand. After rule 3, we need to calculate auxiliary
stereo descriptors to break ties.
However, we _were actually_ calculating auxiliary stereodescriptors
for all centers! We should only need to calculate auxiliary
stereocenters for sites that are needed to break ties.
This cost time - it also caused errors if the auxiliary descriptors
needed a graph expansion, because bonds in the digraph might be
pointed in the wrong direction.
Example case PDB ID 4AXM
Before this commit, errored with "Could not calculate parity! Carrier mismatch"
after 14s. After this commit, completes successfully in 0.036s.
Labelled centers all match (for the centers that had labels in
the failure case).
Includes a test that I can imagine breaking with this optimization.
The reference labels are from before this change
* Ensure all "arms" of stereo bonds and atropisomer bonds are expanded
For tetrahedral centers, ranking using the constitutional rules
always expands as far as is needed (but no further). For SP2bond
and atropisomers, if the first side is not resolvable, the
second side is never visited.
If the constitutional rules don't resolve a side, we need to
label the auxiliary centers. It's important to label all
auxiliary centers that _will_ be visited, so we need to know
what centers will be visited.
This commit updates the label() call in SP2 and Atropisomer
bonds to always attempt to label both sides if using the
constitutional rule set.
The constitutional rules are cheap, and if they fail, we
always go on to the full rule set. It is not a savings to skip
the search on the second side if we're going to keep going
anyway!
Includes a test that reproduces Ricardo's example.
This has no measurable effect on performance relative to the
original solution
* If any parts of the center have been seen, label it.
I couldn't make an example hit this, but Ric is totally
theoretically right
* Greg's ranges suggestion #2
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* any_of for container search
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* CIPLabeler performance: Store vector of bonds (#9250)
* CIPLabeler performance: Store vector of bonds
CIPLabelling refers to bonds by index over and over again. This
causes a measurable hit in performance in findConfigs() because
we iterate over a bitset of "allowed" bonds. For very large
molecules with many bonds, this can be a rate-limiting step!
This affects many PDB-sized structures.
2J3N goes from 0.7s to 0.25s with this change.
I had another example for which the findBondWithIdx() call was
taking 500ms of a 700ms call (after the performance update
in #9171 was implemented)
* yikes, XXL reserve
thanks, greg
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Address PR #9204 review feedback
Implemented performance improvements suggested by @greglandrum:
1. Move cheap degree check to start of isSpiroCenter()
- Early bailout eliminates ~95% of candidates immediately
2. Replace std::set with boost::dynamic_bitset<>
- Faster set operations for ring membership tests
- More efficient intersection using bitwise AND
3. Remove expensive PRECONDITION in flipAboutSpiroCenter()
- Caller already validates spiro center, no need to check again
All tests pass (testDepictor: 7.85s).
* Use boost::dynamic_bitset in removeCollisionsBondAndSpiroFlip
Replaced std::set<unsigned int> with boost::dynamic_bitset<> for
spiro center caching in collision resolution:
- Changed spiroCenters from std::set to boost::dynamic_bitset
- Updated tryResolvingCollisionWithSpiroFlip() signature
- Replaced set.find() with bitset.test() for membership checks
- Replaced set.insert() with bitset.set() for marking spiro centers
Benefits:
- Faster membership tests (O(1) bit test vs O(log n) tree lookup)
- Better cache locality (contiguous bit array vs scattered nodes)
- Simpler code (no iterator comparisons)
All tests pass (testDepictor: 2.64s).
* remove unnecessary reformatting
* more unneeded formatting
* even more unecessary formatting
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>
Co-authored-by: Chris Von Bargen <christopher.vonbargen@schrodinger.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Brandon Novy <142041993+Brandon-Cole@users.noreply.github.com>
Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>
Co-authored-by: Kevin Boyd <kboyd@nvidia.com>
Co-authored-by: Eloy Félix <eloyfelix@gmail.com>
Co-authored-by: Marco Ballarotto <marco.ballarotto@icr.ac.uk>
Co-authored-by: Emily Rhodes <70823163+emilyrrhodes@users.noreply.github.com>
Co-authored-by: Raul Sofia <67133355+RaulSofia@users.noreply.github.com>
Co-authored-by: Dan Nealschneider <dan.nealschneider@schrodinger.com>
* synthon perf: replace O(N) haveEnoughHits scan with O(1) atomic counter
processPartHitsFromDetails called haveEnoughHits after each verified hit,
which scanned every slot of the pre-sized results vector (up to toTryChunkSize
= 2.5M entries) to count non-null entries via std::accumulate. With ~3000
verified hits per search that is ~7.5B pointer reads per query.
Replace with a std::atomic<int64_t> numHitsFound counter in makeHitsFromToTry,
incremented via fetch_add on each verified hit. The early-exit condition becomes
a single atomic read, O(1) per hit regardless of vector size. The atomic is
local to makeHitsFromToTry so it resets correctly per chunk and is safe for
the multi-threaded path without added synchronization.
Measured on synthon_perf branch (42-rxn / 140B-product Freedom space,
maxHits=3000, hitStart=1000, before boost::unordered_flat_set change):
search-several (9 queries): ~30s → ~16.5s (~1.8x)
search-one (benzene): ~3.5s → ~1.8s (~1.9x)
All 4 synthon ctest cases pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style ++
* Update Code/GraphMol/SynthonSpaceSearch/SynthonSpaceSearcher.cpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* add ability to block atoms/bonds from participating in tautomer zones
* be more structured with the atom flag
* response to review
---------
Co-authored-by: = <=>
sortAndUniquifyToTry previously built a parallel vector of (index, string)
pairs, sorted by string, erased duplicates, then rebuilt the original vector
— O(N log N) with one heap allocation per candidate product.
Replace with an erase-remove over a boost::unordered_flat_set<size_t> keyed
on buildProductHash (boost::hash_combine over synthon IDs + reaction ID).
Dedup is now O(N) average with no string allocations on the hot path.
Also switch SearchResults::d_molNames from std::unordered_set<std::string>
to boost::unordered_flat_set<std::string> for the same open-addressing cache
locality benefit during mergeResults.
Perf (42-rxn / 140B-product Freedom space, maxHits=3000, hitStart=1000,
9 queries; vanilla.log → 2unordered_flat_set.log):
Benzene: 6.92s → 5.64s (−19%)
Tolueneish: 6.19s → 5.07s (−18%)
Acetaminophen: 4.50s → 3.63s (−19%)
Allopurinol: 4.41s → 3.94s (−11%)
Theophylline: 4.39s → 3.90s (−11%)
Nicotine: 4.87s → 3.97s (−18%)
Ciprofloxacin: 6.82s → 6.09s (−11%)
Aspirin: 4.51s → 3.42s (−24%)
Metoprolol: 5.11s → 4.07s (−20%)
Total: 48.40s → 40.33s (−17%)
Hit counts and MaxNumResults unchanged across all queries.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* add checked iterators
* support checked atom and bond iterators
the idea here is to allow optional checking that the graph is not being
modified while an iterator is active
* ignore new member functions
---------
Co-authored-by: Ric R <ricrogz@gmail.com>
The standard re-wedging pattern is:
clearSingleBondDirFlags(mol); // saves _UnknownStereo=1, clears BondDir to NONE
WedgeMolBonds(mol, &conf); // re-derives wedges from chiral tags
// ... caller restores BondDir::UNKNOWN on bonds with _UnknownStereo=1 ...
Wiggly bonds at chiral centers should survive this round-trip but did
not: countChiralNbrs only checked BondDir to decide whether a chiral
atom's stereo was already expressed, missing the _UnknownStereo=1
marker that clearSingleBondDirFlags saved when it cleared BondDir to
NONE. With the marker invisible to countChiralNbrs, pickBondToWedge
would pick the wiggly bond itself (terminal neighbors are scored
lower), and the caller's subsequent restore of BondDir::UNKNOWN would
erase the wedge, leaving the chiral atom with no visible stereo.
Fix: extend countChiralNbrs to recognize bonds with _UnknownStereo=1
as equivalent to BondDir::UNKNOWN. The chiral atom is then pre-skipped
in pickBondsToWedge, the same way it already is for bonds that still
have BondDir::UNKNOWN set.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* this is a fix, but it breaks other tests
* all tests pass.
This undoes the changes made as part of the "fix" for #5466
* update js tests
* response to review
---------
Co-authored-by: = <=>
* CIPLabeler performance: Store vector of bonds
CIPLabelling refers to bonds by index over and over again. This
causes a measurable hit in performance in findConfigs() because
we iterate over a bitset of "allowed" bonds. For very large
molecules with many bonds, this can be a rate-limiting step!
This affects many PDB-sized structures.
2J3N goes from 0.7s to 0.25s with this change.
I had another example for which the findBondWithIdx() call was
taking 500ms of a 700ms call (after the performance update
in #9171 was implemented)
* yikes, XXL reserve
thanks, greg
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* CIP labeller: Don't calculate auxiliary descriptors unnecessarily
The first 3 rules (the constitutional rules) are pretty easy
to understand. After rule 3, we need to calculate auxiliary
stereo descriptors to break ties.
However, we _were actually_ calculating auxiliary stereodescriptors
for all centers! We should only need to calculate auxiliary
stereocenters for sites that are needed to break ties.
This cost time - it also caused errors if the auxiliary descriptors
needed a graph expansion, because bonds in the digraph might be
pointed in the wrong direction.
Example case PDB ID 4AXM
Before this commit, errored with "Could not calculate parity! Carrier mismatch"
after 14s. After this commit, completes successfully in 0.036s.
Labelled centers all match (for the centers that had labels in
the failure case).
Includes a test that I can imagine breaking with this optimization.
The reference labels are from before this change
* Ensure all "arms" of stereo bonds and atropisomer bonds are expanded
For tetrahedral centers, ranking using the constitutional rules
always expands as far as is needed (but no further). For SP2bond
and atropisomers, if the first side is not resolvable, the
second side is never visited.
If the constitutional rules don't resolve a side, we need to
label the auxiliary centers. It's important to label all
auxiliary centers that _will_ be visited, so we need to know
what centers will be visited.
This commit updates the label() call in SP2 and Atropisomer
bonds to always attempt to label both sides if using the
constitutional rule set.
The constitutional rules are cheap, and if they fail, we
always go on to the full rule set. It is not a savings to skip
the search on the second side if we're going to keep going
anyway!
Includes a test that reproduces Ricardo's example.
This has no measurable effect on performance relative to the
original solution
* If any parts of the center have been seen, label it.
I couldn't make an example hit this, but Ric is totally
theoretically right
* Greg's ranges suggestion #2
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* any_of for container search
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* fix extended boundary issue (3 mols)
* clang pass
* no change. retrigger CI for failed java test
there's a failing java test that seems to be failing by chance rather than by changes, as it depends on rng. this is just to retrigger the CI pipeline to confirm this
* no change. retrigger the CI (yet again)
* raw strings and removed garbage collector
* SHARED-12256: Add test and change function.
* SHARED-12256: Update to only wrapping changes.
* SHARED-12256: Parameterize tests.
* SHARED-12256: GetPropIfPresent changes.
* Revert "SHARED-12256: GetPropIfPresent changes."
This reverts commit f598f8c161.
* SHARED-12256: Make default the keyword in the boost wrappings.
* SHARED-12256: Overload function instead of using a sentinel.
* SHARED-12256: Extend GetProp changes.
* SHARED-12256: Add entry point for tests and fix tests.
* iterators for random-access MolSuppliers
add optional caching to SDMolSupplier
* add support to SmilesMolSupplier too
There is a lot of duplicate code between the random-access suppliers that would be worth trying to remove
but at the moment it looks like it would require multiple inheritance, and I think we want to avoid that
* add input iterators for ForwardSDMolSupplier()
* throw when calling begin() on a used supplier
* switch to use the spaceship operator
* init() should reset the mol cache
* Make SDMolSupplier and SmilesMolSupplier safe for multi-threaded reads
* add benchmarking
* add TDTMolSupplier support
improved testing
add benchmarks for parallel iteration
optional TBB support
* better const handling, add reverse iterators
doesn't look like const_iterator is possible since getting data from the underlyng supplier object is non-const
* improve docs
more usings
add reverse iterator to TDTMolSupplier
* tests only try execution::par when it is there
* fix typo
* more testing/demo
* remove accidentally added files
* review changes
* add default ctors
* disable a false-positive compiler warning
it is stupid to have to do this
---------
Co-authored-by: = <=>
* check for duplicate atoms/bonds in StereoGroups
* explicit handling of duplicate stereogroup atoms in CTAB and CXSMILES parsers
---------
Co-authored-by: = <=>
* Tautomer insensitive hash v2, E/Z and stereocenter-preservation
* Preserve E/Z stereochemistry and stereocenters in TautomerHashv2
Simplify extension logic to better protect stereocenters connected via
single bonds to aromatic systems. Preserve E/Z stereo on exocyclic
double bonds to distinguish geometric isomers (e.g., E/Z hydrazones).
* add helper function to remove duplicated code
* Fix ring info and bond aromaticity handling in MolHash
- Add fastFindRings check in TautomerHashv2 before ring queries
- Set isAromatic consistent with bond type (true for AROMATIC bonds)
- Fix inverted condition in RegioisomerHash
* more consistent hashes regardless of stereo annotation
* Configurable legend position (Top/Left/Right/Bottom) and vertical text (GitHub #9023)
- Add LegendPosition enum and legendPosition, legendVerticalText to MolDrawOptions
- Support legend at Top, Left, Right, Bottom; vertical text for Left/Right
- Python: MolDrawOptions.legendPosition, .legendVerticalText; LegendPosition enum
- Python: MolToSVG() wrapper with legend/drawOptions; doc updates for MolToImage
- JSON: legendPosition (string), legendVerticalText (bool) in draw options
- C++ and Python tests; release note and Cartridge.md docs
* MolDraw2D: legend gutter for horizontal side legends; vertical side height fit
- Reserve horizontal gap between molecule and left/right horizontal legends
(scale mol to molWidth-gutter, align toward legend strip).
- Position horizontal side legend by measured text width from partition edge.
- Vertical side legends: iterative scale so n*max_h+(n-1)*gap fits panel.
- Catch: long vertical side legend section.
* Update legend-position tests and review-driven cleanup
Use enum/default wording for legendPosition docs, move the lightweight Python test to Wrap, add regex-based placement checks (including horizontal side and vertical stacking), and refactor extractLegend helpers per style guidance.
* Fix MolDraw2D legend edge cases
* MolDraw2D: review follow-up (legend tests, bounds, DRY Top/Bottom)
* Update no-FT legend test coords
* Address PR review: document constants, remove release-note text, and simplify extra-padding logic
Adds a new function MolDraw2D_detail::getSGroupDataLabels() that returns
the text and molecule-coordinate positions of DAT SGroup labels, using
the same placement logic as the drawing code. This allows external
renderers to display SGroup labels consistently with RDKit's placement.
Refactors DrawMol::extractSGroupData() to call getSGroupDataLabels()
internally, eliminating the duplicate FIELDDISP parsing and position
computation logic.
Closes#7829
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* initial ranges support for Atom/Bond iterators.
needs more testing
* support random access
test sort
more testing please
* compiles on windows
* fix size()
more testing
add some benchmarking
* disable benchmarking code by default
* do not allow modifying the graph through the iterators
---------
Co-authored-by: = <=>
* Speed up tautomer canonicalization by deferring on SSSR calc
* Lazy kekulization for tautomer enumeration
Defer kekulization of tautomers until they are actually needed for
transform matching. This avoids creating kekulized copies for:
1. The initial tautomer (until first iteration)
2. New tautomers that may never be processed (if enumeration ends early)
The Tautomer class now supports lazy initialization of the kekulized
form via getKekulized() method.
Performance improvement: ~7% additional speedup (total ~22-24% from baseline)
* Use count-only substructure matching in tautomer scoring
* Add SubstructMatchCount regression test
* MolStandardize: reduce enumerate overhead
* MolStandardize: avoid per-tautomer ring recomputation
* Atom: cache PeriodicTable pointer in valence calcs
* Atom: reuse PeriodicTable in getEffectiveAtomicNum
* PeriodicTable: add atomic fast path for getTable
* GraphMol: reduce ROMol copy reallocations
* MolStandardize: use quickCopy for per-match product copies
Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.
* MolStandardize: pre-filter scoring patterns by element/connectivity
For tautomer scoring, pre-compute which SubstructTerms are relevant for
a given input molecule. Since tautomerization only moves H atoms and
changes bond orders (never creates/destroys heavy-atom bonds), patterns
requiring missing elements or connectivity can be skipped for all
tautomers of that molecule.
Two-stage filtering:
1. Element check: skip patterns requiring atoms not in the molecule
2. Connectivity check: skip patterns whose bond-order-agnostic structure
doesn't match the input molecule's connectivity
This reduces the number of VF2 substructure calls per tautomer from 12
to typically 3-5, depending on the molecule's composition.
* MolStandardize: preserve molecule properties for canonical tautomer
Copy molecule properties from the original input to the canonical tautomer
result. Since quickCopy during enumeration skips d_props to avoid overhead,
extended SMILES data like link nodes (LN) was lost. This restores them
on the final result.
* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers
TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses
quickCopy for performance. This doesn't copy molecule properties like
_molLinkNodes. Without this fix, XQMol output would lose link node
extensions in the SMILES.
Copy properties from the original query molecule to all enumerated
tautomers before constructing the TautomerQuery. This preserves extended
SMILES data without impacting enumeration performance.
* MolStandardize: use parallel iteration and cache bond lookups
Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration
over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond
lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.
* perf: add specialized matchers for simple tautomer scoring patterns
Replace VF2 graph matching with O(n) loops for 6 simple patterns:
- countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
- countMethyls: [CX4H3] methyl groups
- countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero
- countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N
Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.
Combined with the pre-filtering optimization, this achieves ~3.7x speedup
(~2500ms vs ~9300ms original) for tautomer canonicalization.
* Fix tautomer canonicalize dropping conformers from quickCopy
quickCopy (RWMol(*mol, true)) skips conformers, so tautomer
enumeration products lose 2D/3D coordinates. This causes InChI
generation to omit the /b (double bond E/Z stereo) layer, since
E/Z is derived from atomic coordinates.
Fix: copy conformers from the original molecule onto the canonical
tautomer after pickCanonical in TautomerEnumerator::canonicalize().
Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based
conformer preservation check in catch_tests.cpp.
* add test on canonicalize losing stereo
* add regression test for exocyclic C=C tautomer canonicalization
The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely
deduplicate distinct tautomers when their atom-index-ordered state
patterns happen to match, leading canonicalize() to pick the wrong
canonical form for molecules with STEREOTRANS-pinned exocyclic C=C
bonds after RemoveHs.
Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the
exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form
O=C1C=C(C=C2CC=COC2)C(=O)N1.
Currently expected to FAIL until the state key dedup bug is fixed.
* MolStandardize: expand tautomer connectivity SMARTS
* MolStandardize: scope tautomer pattern enum
* MolStandardize: trim tautomer pattern enum
* MolStandardize: use symmetric ring scoring
* parse templates as smarts
* accept ring templates in SMARTS format
* undo CLAUDE mistake
* rename files
* enable templating for macrocycles
* enable macrocycle templating
* Add test for macrocycle templating
Tests that ring system templates are used only for macrocycles (rings
with size > 8). The test verifies the exact threshold by generating
coordinates with and without templates for rings of size 4-14.
Addresses review feedback on PR #9203.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Fixes#9143
There is still some weirdness in the matching of ET bonds in macrocycles, but it is not connected to this change.
* adjust test to work on win (and mac?)
* tweak expected results for win ci
* Remove Dict::getData() for a strict abstraction boundary
Replace direct access to Dict's internal std::vector<Pair> with
encapsulated methods: size(), empty(), const iteration via
begin()/end(), appendPair(), markNonPOD(), and getRawVal().
This enables future changes to Dict's internal representation
without breaking callers.
Ref: rdkit/rdkit#9112
* Harden Dict::appendPair to take a populated Pair by move
appendPair(Pair&&) now auto-detects non-POD status via
RDValue::needsCleanup(), eliminating markNonPOD() and the
risk of dangling references or uninitialized entries.
needsCleanup() is placed next to destroy() on RDValue to
keep the POD/non-POD distinction in one place.
* Remove vestigial dictHasNonPOD param from streamReadProp
Both callers ignored the output. Non-POD detection is now handled
by Dict::appendPair via RDValue::needsCleanup().
* unbork java build
* Address PR review: bulk append, rename getRawVal, add custom data test
- Add Dict::append(vector<Pair>&&) for bulk insertion with reserve
- Use bulk append in streamReadProps to restore pre-allocation
- Rename getRawVal -> getRDValue per reviewer preference
- Add test verifying custom AnyTag data is destroyed through Dict lifecycle
* heed self-review
* don't manually implement vec.insert
* Add test: ExplicitBitVect round-trip through Dict serialization
Exercises the full streamWriteProps/streamReadProps path with an
ExplicitBitVect in an RDProps Dict, confirming the custom handler
is invoked and no memory is leaked (verified under valgrind).
* in anyTag test, assert destructors ran a specific number of times.
---------
Co-authored-by: bddap (Coding Agent) <andrew+bot@dirksen.com>
* implement consistency check
* add more consistency checks
* check direction consistency accross double bond
* clean up directions for non-stereo bonds
* fix counts for second from atom dirs; add check
* handle inconconsistent bond dirs
* add more tests, pubchem cases, and update existing
* drop statics
* fix typo
* make sourceBond arg const
* fix consistency check