mirror of https://github.com/rdkit/rdkit.git synced 2026-06-03 21:44:30 +08:00


Nic Zonta b854399558 Spiro flipping (#9204)

* add flipping of spiro rings as a way to solve clashes

* remove extra function

* add test file

* update coordgen parameters to allow for bond flipping

* fix failing tests

* Update Code/GraphMol/Depictor/EmbeddedFrag.h

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Update Code/GraphMol/Depictor/EmbeddedFrag.cpp

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* [bot] Update molecular templates header (#9234)

Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>

* Add some std::ranges support (#9218)

* initial ranges support for Atom/Bond iterators.
needs more testing

* support random access
test sort

more testing please

* compiles on windows

* fix size()
more testing
add some benchmarking

* disable benchmarking code by default

* do not allow modifying the graph through the iterators

---------

Co-authored-by: = <=>

* mention AI tools in the contrib guidelines (#9224)

* mention AI tools in the contrib guidelines

* response to review

---------

Co-authored-by: = <=>

* Add getSGroupDataLabels() to MolDraw2D_detail namespace (#9189)

Adds a new function MolDraw2D_detail::getSGroupDataLabels() that returns
the text and molecule-coordinate positions of DAT SGroup labels, using
the same placement logic as the drawing code. This allows external
renderers to display SGroup labels consistently with RDKit's placement.

Refactors DrawMol::extractSGroupData() to call getSGroupDataLabels()
internally, eliminating the duplicate FIELDDISP parsing and position
computation logic.

Closes #7829

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* MolDraw2D: configurable legend position and vertical side legends (Issue #9023) (#9183)

* Configurable legend position (Top/Left/Right/Bottom) and vertical text (GitHub #9023)

- Add LegendPosition enum and legendPosition, legendVerticalText to MolDrawOptions
- Support legend at Top, Left, Right, Bottom; vertical text for Left/Right
- Python: MolDrawOptions.legendPosition, .legendVerticalText; LegendPosition enum
- Python: MolToSVG() wrapper with legend/drawOptions; doc updates for MolToImage
- JSON: legendPosition (string), legendVerticalText (bool) in draw options
- C++ and Python tests; release note and Cartridge.md docs

* MolDraw2D: legend gutter for horizontal side legends; vertical side height fit

- Reserve horizontal gap between molecule and left/right horizontal legends
  (scale mol to molWidth-gutter, align toward legend strip).
- Position horizontal side legend by measured text width from partition edge.
- Vertical side legends: iterative scale so n*max_h+(n-1)*gap fits panel.
- Catch: long vertical side legend section.
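
The vertical-fit rule in the list above (shrink until `n*max_h+(n-1)*gap` fits the panel) can be sketched as follows. This is only an illustration of the arithmetic, not the MolDraw2D code: the function names, the 0.9 shrink factor, the lower bound on the scale, and the assumption that the gap shrinks together with the glyphs are all hypothetical.

```python
def stacked_height(n_chars, max_h, gap, scale=1.0):
    """Height of n vertically stacked glyphs: n*max_h + (n-1)*gap, scaled."""
    if n_chars <= 0:
        return 0.0
    return (n_chars * max_h + (n_chars - 1) * gap) * scale


def fit_vertical_legend_scale(n_chars, max_h, gap, panel_height,
                              shrink=0.9, min_scale=0.05):
    """Shrink the scale step by step until the stacked legend fits.

    Assumption: glyph height and gap both scale linearly with the font
    scale. Returns min_scale if the legend never fits.
    """
    scale = 1.0
    while scale > min_scale:
        if stacked_height(n_chars, max_h, gap, scale) <= panel_height:
            return scale
        scale *= shrink  # try a slightly smaller font
    return min_scale


if __name__ == "__main__":
    # Ten glyphs of height 10 with a gap of 2 need 118 units at scale 1.0.
    print(fit_vertical_legend_scale(10, 10.0, 2.0, 59.0))
```

Because the height is linear in the scale under these assumptions, a closed-form `panel_height / stacked_height(...)` would also work; the loop is kept only to mirror the "iterative" wording of the commit.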

* Update legend-position tests and review-driven cleanup

Use enum/default wording for legendPosition docs, move the lightweight Python test to Wrap, add regex-based placement checks (including horizontal side and vertical stacking), and refactor extractLegend helpers per style guidance.

* Fix MolDraw2D legend edge cases

* MolDraw2D: review follow-up (legend tests, bounds, DRY Top/Bottom)

* Update no-FT legend test coords

* Address PR review: document constants, remove release-note text, and simplify extra-padding logic

* make sorting more consistent (#9239)

* Cleanup/get atoms and bonds (#9243)

* Fix bug in inversion term for UFF, add finite difference checker. (#9228)

* Fix copyright

* Address review comments

Removed finite diff from RDKit headers

Used explicit coordinates

* If templates match, skip ring number check (#9217)

* remove ring matching for templates

* remove extra code

* remove empty lines

* fix build error

* Tautomer insensitive hash v2, E/Z and stereocenter-preservation (#9128)

* Tautomer insensitive hash v2, E/Z and stereocenter-preservation

* Preserve E/Z stereochemistry and stereocenters in TautomerHashv2

Simplify extension logic to better protect stereocenters connected via
single bonds to aromatic systems. Preserve E/Z stereo on exocyclic
double bonds to distinguish geometric isomers (e.g., E/Z hydrazones).

* add helper function to remove duplicated code

* Fix ring info and bond aromaticity handling in MolHash

- Add fastFindRings check in TautomerHashv2 before ring queries
- Set isAromatic consistent with bond type (true for AROMATIC bonds)
- Fix inverted condition in RegioisomerHash

* more consistent hashes regardless of stereo annotation

* Ensure that StereoGroups don't have duplicate atoms or bonds (#9258)

* check for duplicate atoms/bonds in StereoGroups

* explicit handling of duplicate stereogroup atoms in CTAB and CXSMILES parsers

---------

Co-authored-by: = <=>

* Add Getter functions to MMFF property python interface (#9254)

* Support using iterators with MolSuppliers (#9230)

* iterators for random-access MolSuppliers
add optional caching to SDMolSupplier

* add support to SmilesMolSupplier too
There is a lot of duplicate code between the random-access suppliers that would be worth trying to remove
but at the moment it looks like it would require multiple inheritance, and I think we want to avoid that

* add input iterators for ForwardSDMolSupplier()

* throw when calling begin() on a used supplier

* switch to use the spaceship operator

* init() should reset the mol cache

* Make SDMolSupplier and SmilesMolSupplier safe for multi-threaded reads

* add benchmarking

* add TDTMolSupplier support
improved testing
add benchmarks for parallel iteration
optional TBB support

* better const handling, add reverse iterators

doesn't look like const_iterator is possible since getting data from the underlying supplier object is non-const

* improve docs
more usings
add reverse iterator to TDTMolSupplier

* tests only try execution::par when it is there

* fix typo

* more testing/demo

* remove accidentally added files

* review changes

* add default ctors

* disable a false-positive compiler warning
it is stupid to have to do this
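
The "throw when calling begin() on a used supplier" item above describes single-pass (input-iterator) behaviour: once a forward-only source has been read, starting a second pass would silently skip records, so it is refused instead. A minimal Python sketch of that idea follows. The class `ForwardSupplier` and its error type are hypothetical and are not part of the RDKit API.

```python
class ForwardSupplier:
    """Hypothetical single-pass record source, for illustration only."""

    def __init__(self, records):
        self._source = iter(records)
        self._started = False

    def __iter__(self):
        # A forward-only stream cannot be rewound, so a second pass is
        # rejected rather than yielding a partial (or empty) sequence.
        if self._started:
            raise RuntimeError("supplier has already been iterated")
        self._started = True
        return self._source


if __name__ == "__main__":
    supplier = ForwardSupplier(["CCO", "c1ccccc1"])
    print(list(supplier))
    try:
        list(supplier)
    except RuntimeError as exc:
        print("second pass refused:", exc)
```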

---------

Co-authored-by: = <=>

* Pandastools improvements (#9251)

* Added automatic parsing functionality

* Added documentation

* Slightly changed check for gzip extension

* Apply suggestions from code review

Added small changes for readability

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Add optional default_val parameter to GetProp()  (#9242)

* SHARED-12256: Add test and change function.

* SHARED-12256: Update to only wrapping changes.

* SHARED-12256: Parameterize tests.

* SHARED-12256: GetPropIfPresent changes.

* Revert "SHARED-12256: GetPropIfPresent changes."

This reverts commit f598f8c161.

* SHARED-12256: Make default the keyword in the boost wrappings.

* SHARED-12256: Overload function instead of using a sentinel.

* SHARED-12256: Extend GetProp changes.

* SHARED-12256: Add entry point for tests and fix tests.
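
The "overload function instead of using a sentinel" step above can be illustrated in plain Python. With a sentinel such as `None`, a caller cannot ask for `None` itself as the fallback; treating "no default given" and "default given" as two separate call shapes avoids that. The sketch below is an assumption-laden illustration: `PropertyHolder`, `set_prop` and `get_prop` are hypothetical names and do not reproduce the RDKit wrappers or their exact signature.

```python
class PropertyHolder:
    """Hypothetical property container, for illustration only."""

    def __init__(self):
        self._props = {}

    def set_prop(self, key, value):
        self._props[key] = value

    def get_prop(self, key, *default):
        # Two call shapes, emulating two overloads:
        #   get_prop(key)          -> raises KeyError when the key is missing
        #   get_prop(key, default) -> returns default when the key is missing
        if len(default) > 1:
            raise TypeError("get_prop() takes at most one default value")
        if key in self._props:
            return self._props[key]
        if default:
            return default[0]
        raise KeyError(key)


if __name__ == "__main__":
    holder = PropertyHolder()
    holder.set_prop("name", "ethanol")
    print(holder.get_prop("name"))
    print(holder.get_prop("missing", None))  # None is a legal default here
```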

* Extended fix for #9101 (#9255)

* fix extended boundary issue (3 mols)

* clang pass

* no change. retrigger CI for failed java test

there's a failing java test that seems to be failing by chance rather than by changes, as it depends on rng. this is just to retrigger the CI pipeline to confirm this

* no change. retrigger the CI (yet again)

* raw strings and removed garbage collector

* CIP labeller performance: Don't calculate auxiliary descriptors unnecessarily (#9171)

* CIP labeller: Don't calculate auxiliary descriptors unnecessarily

The first 3 rules (the constitutional rules) are pretty easy
to understand. After rule 3, we need to calculate auxiliary
stereo descriptors to break ties.

However, we _were actually_ calculating auxiliary stereodescriptors
for all centers! We should only need to calculate auxiliary
stereocenters for sites that are needed to break ties.

This cost time - it also caused errors if the auxiliary descriptors
needed a graph expansion, because bonds in the digraph might be
pointed in the wrong direction.

Example case PDB ID 4AXM
Before this commit, errored with "Could not calculate parity! Carrier mismatch"
after 14s. After this commit, completes successfully in 0.036s.
Labelled centers all match (for the centers that had labels in
the failure case).

Includes a test that I can imagine breaking with this optimization.
The reference labels are from before this change

* Ensure all "arms" of stereo bonds and atropisomer bonds are expanded

For tetrahedral centers, ranking using the constitutional rules
always expands as far as is needed (but no further). For SP2bond
and atropisomers, if the first side is not resolvable, the
second side is never visited.

If the constitutional rules don't resolve a side, we need to
label the auxiliary centers. It's important to label all
auxiliary centers that _will_ be visited, so we need to know
what centers will be visited.

This commit updates the label() call in SP2 and Atropisomer
bonds to always attempt to label both sides if using the
constitutional rule set.

The constitutional rules are cheap, and if they fail, we
always go on to the full rule set. It is not a savings to skip
the search on the second side if we're going to keep going
anyway!

Includes a test that reproduces Ricardo's example.

This has no measurable effect on performance relative to the
original solution

* If any parts of the center have been seen, label it.

I couldn't make an example hit this, but Ric is totally
theoretically right

* Greg's ranges suggestion #2

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* any_of for container search

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* CIPLabeler performance: Store vector of bonds (#9250)

* CIPLabeler performance: Store vector of bonds

CIPLabelling refers to bonds by index over and over again. This
causes a measurable hit in performance in findConfigs() because
we iterate over a bitset of "allowed" bonds. For very large
molecules with many bonds, this can be a rate-limiting step!

This affects many PDB-sized structures.

2J3N goes from 0.7s to 0.25s with this change.

I had another example for which the findBondWithIdx() call was
taking 500ms of a 700ms call (after the performance update
in #9171 was implemented)

* yikes, XXL reserve

thanks, greg

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* Address PR #9204 review feedback

Implemented performance improvements suggested by @greglandrum:

1. Move cheap degree check to start of isSpiroCenter()
   - Early bailout eliminates ~95% of candidates immediately

2. Replace std::set with boost::dynamic_bitset<>
   - Faster set operations for ring membership tests
   - More efficient intersection using bitwise AND

3. Remove expensive PRECONDITION in flipAboutSpiroCenter()
   - Caller already validates spiro center, no need to check again

All tests pass (testDepictor: 7.85s).

* Use boost::dynamic_bitset in removeCollisionsBondAndSpiroFlip

Replaced std::set<unsigned int> with boost::dynamic_bitset<> for
spiro center caching in collision resolution:

- Changed spiroCenters from std::set to boost::dynamic_bitset
- Updated tryResolvingCollisionWithSpiroFlip() signature
- Replaced set.find() with bitset.test() for membership checks
- Replaced set.insert() with bitset.set() for marking spiro centers

Benefits:
- Faster membership tests (O(1) bit test vs O(log n) tree lookup)
- Better cache locality (contiguous bit array vs scattered nodes)
- Simpler code (no iterator comparisons)

All tests pass (testDepictor: 2.64s).
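
The two review-driven changes above (a cheap degree check before any ring work, and bitsets instead of `std::set` for ring membership and for the spiro-center cache) can be sketched in Python, using plain integers as bitsets. This is not the Depictor code: the function names are hypothetical, the spiro definition is simplified to "two rings whose only shared atom is this one", and the degree threshold of 4 follows from needing two ring bonds in each of the two rings.

```python
def ring_masks(rings):
    """Encode each ring (an iterable of atom indices) as an integer bitmask."""
    masks = []
    for ring in rings:
        mask = 0
        for idx in ring:
            mask |= 1 << idx
        masks.append(mask)
    return masks


def is_spiro_center(atom_idx, degrees, masks):
    """True if two rings share exactly this atom (simplified definition)."""
    # Cheap early bailout: fewer than four bonds cannot be a spiro center.
    if degrees[atom_idx] < 4:
        return False
    bit = 1 << atom_idx
    containing = [m for m in masks if m & bit]
    for i, first in enumerate(containing):
        for second in containing[i + 1:]:
            # Bitwise AND is the ring intersection.
            if (first & second) == bit:
                return True
    return False


def find_spiro_centers(degrees, rings):
    """Return a bitmask with one bit set per spiro center (the cache)."""
    masks = ring_masks(rings)
    spiro = 0
    for idx in range(len(degrees)):
        if is_spiro_center(idx, degrees, masks):
            spiro |= 1 << idx  # analogous to bitset.set(idx)
    return spiro


if __name__ == "__main__":
    # Two five-membered rings sharing only atom 0.
    rings = [(0, 1, 2, 3, 4), (0, 5, 6, 7, 8)]
    degrees = [4, 2, 2, 2, 2, 2, 2, 2, 2]
    cache = find_spiro_centers(degrees, rings)
    print(bool((cache >> 0) & 1))  # analogous to bitset.test(0)
```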

* remove unnecessary reformatting

* more unneeded formatting

* even more unnecessary formatting

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@noreply.github.com>
Co-authored-by: Chris Von Bargen <christopher.vonbargen@schrodinger.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Brandon Novy <142041993+Brandon-Cole@users.noreply.github.com>
Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>
Co-authored-by: Kevin Boyd <kboyd@nvidia.com>
Co-authored-by: Eloy Félix <eloyfelix@gmail.com>
Co-authored-by: Marco Ballarotto <marco.ballarotto@icr.ac.uk>
Co-authored-by: Emily Rhodes <70823163+emilyrrhodes@users.noreply.github.com>
Co-authored-by: Raul Sofia <67133355+RaulSofia@users.noreply.github.com>
Co-authored-by: Dan Nealschneider <dan.nealschneider@schrodinger.com>

2026-06-03 05:56:04 +02:00

| Name | Last commit | Date |
|---|---|---|
| .azure-pipelines | Get things working with numpy 2.4 and pandas 3.0 (#9072) | 2026-02-04 12:06:21 +01:00 |
| .github | swap to specific versions of other actions we use (#9308) | 2026-05-28 16:47:22 +02:00 |
| build_support | Reformat Python code for 2023.03 release (#6294) | 2023-04-28 06:53:56 +02:00 |
| Code | Spiro flipping (#9204) | 2026-06-03 05:56:04 +02:00 |
| Contrib | refactor: improve readability and maintainability of AAP similarity code (#9277) | 2026-05-14 06:52:12 +02:00 |
| Data | Fix sulfonamide SMARTS in fragment descriptors. (#9019) | 2025-12-22 07:23:33 +01:00 |
| Docs | MolDraw2D: configurable legend position and vertical side legends (Issue #9023) (#9183) | 2026-04-16 04:59:00 +02:00 |
| External | Fix STEREOANY (wavy bond) loss during InChI roundtrip (#9315) | 2026-06-02 14:41:00 +02:00 |
| Projects | Some cmake cleanup work (#7720) | 2024-08-16 17:11:31 +02:00 |
| rdkit | Fix STEREOANY (wavy bond) loss during InChI roundtrip (#9315) | 2026-06-02 14:41:00 +02:00 |
| rdkit-stubs | Build: forward-slash Python3_EXECUTABLE when generating rdkit-stubs (#9318) | 2026-06-02 08:23:27 +02:00 |
| Regress | Modernization of some substructure code (#8450) | 2025-05-12 06:33:25 +02:00 |
| Scripts | Add more pyi patches, 2026-03 (#9214) | 2026-04-06 11:51:49 +02:00 |
| Web/RDExtras | Reformat Python code for 2023.03 release (#6294) | 2023-04-28 06:53:56 +02:00 |
| .clang-format | Expose molzip functionality to MinimalLib (#7959) | 2024-11-12 17:16:14 +01:00 |
| .clang-tidy | Add Molecular Interaction Fields (#7993) | 2024-12-14 17:08:43 +01:00 |
| .gitattributes | Handle DOS files in SynthonSpaceSearch (#8075) | 2024-12-09 17:29:17 +01:00 |
| .gitignore | Extended fix for #9101 (#9255) | 2026-05-06 06:10:37 +02:00 |
| .travis.yml | Update to use the travis Xenial environment (#2200) | 2018-12-25 21:58:52 -05:00 |
| azure-pipelines.yml | Get things working with numpy 2.4 and pandas 3.0 (#9072) | 2026-02-04 12:06:21 +01:00 |
| CMakeLists.txt | Build: tag rdkitpython install rules with COMPONENT python (#9288) | 2026-05-28 13:24:54 +02:00 |
| CONTRIBUTING | Getting Started with Contributing to RDKit (#7813) | 2024-12-06 06:14:08 +01:00 |
| CTestConfig.cmake | Mem checkup (#3083) | 2020-04-17 17:48:58 +02:00 |
| CTestCustom.ctest.in | Support for parsing/writing SGroups in SD Mol files. (#2138) | 2019-01-22 15:42:27 +01:00 |
| INSTALL | Fixes #679 | 2015-11-26 02:34:33 +01:00 |
| license.txt | Update the license for Github to recognize (#3159) | 2020-05-12 09:58:58 +02:00 |
| rdkit-config.cmake.in | make ringdecomposerlib a mandatory dependency (#9209) | 2026-03-27 18:17:27 +01:00 |
| rdkitpython-config.cmake.in | export the targets with a python dependency to a different config file (#7914) | 2025-02-19 05:53:30 +01:00 |
| README.md | massive simplification of README.md (#7831) | 2024-09-25 11:53:27 +02:00 |
| ReleaseNotes.md | Do deprecations for 2026.09 release (#9213) | 2026-05-29 10:19:17 +02:00 |
| setup.cfg | Issue1071/yapf (#1078) | 2016-09-23 04:58:46 +02:00 |

README.md

RDKit

What is it?

The RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.

* BSD license - a business friendly license for open source
* Core data structures and algorithms in C++
* Python 3.x wrapper generated using Boost.Python
* Java and C# wrappers generated with SWIG
* JavaScript (generated with emscripten) and CFFI wrappers around important functionality
* 2D and 3D molecular operations
* Descriptor and Fingerprint generation for machine learning
* Molecular database cartridge for PostgreSQL supporting substructure and similarity searches as well as many descriptor calculators
* Cheminformatics nodes for KNIME
* Contrib folder with useful community-contributed software harnessing the power of the RDKit

Installation and getting started

If you are working in Python and using conda (our recommendation), installation is super easy:

$ conda install -c conda-forge rdkit

You can then take a look at our Getting Started in Python guide.

More detailed installation instructions are available in Docs/Book/Install.md.
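
After installing, a quick way to confirm that the Python package is importable is to parse a small molecule. The helper below is a minimal sketch, not part of the RDKit; the name `rdkit_smoke_test` is hypothetical, and it returns `None` when the `rdkit` package cannot be found so that it runs without error on a machine where the installation has not been done yet.

```python
import importlib.util


def rdkit_smoke_test(smiles="CCO"):
    """Return the heavy-atom count for a SMILES, or None if rdkit is missing."""
    if importlib.util.find_spec("rdkit") is None:
        return None  # rdkit is not installed in this environment
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # GetNumAtoms() counts the atoms in the graph; implicit hydrogens are
    # not included, so ethanol ("CCO") gives 3.
    return mol.GetNumAtoms()


if __name__ == "__main__":
    count = rdkit_smoke_test()
    if count is None:
        print("rdkit is not installed")
    else:
        print("rdkit works; ethanol has", count, "heavy atoms")
```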

Documentation

Available on the RDKit page and in the Docs folder on GitHub.

The RDKit blog often has useful tips and tricks.

Support and Community

If you have questions, comments, or suggestions, the best places for those are:

* GitHub discussions
* The mailing list

If you've found a bug or would like to request a feature, please create an issue.

We also have a LinkedIn group.

We have a yearly user group meeting (the UGM) where members of the community do presentations and lightning talks on things they've done with the RDKit. Materials from past UGMs, which can be quite useful, are also online.

License

Code released under the BSD license.

Languages

C++ 69.6%

Python 15.3%

PLSQL 3.6%

CMake 2.8%

C 2.5%

Other 6.1%