rdkit

mirror of https://github.com/rdkit/rdkit.git synced 2026-06-04 21:54:27 +08:00

Author	SHA1	Message	Date
Yakov Pechersky	c6cabf4153	Speed-up tautomer canonicalization, no API changes (#9134 ) * Speed up tautomer canonicalization by deferring on SSSR calc * Lazy kekulization for tautomer enumeration Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for: 1. The initial tautomer (until first iteration) 2. New tautomers that may never be processed (if enumeration ends early) The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method. Performance improvement: ~7% additional speedup (total ~22-24% from baseline) * Use count-only substructure matching in tautomer scoring * Add SubstructMatchCount regression test * MolStandardize: reduce enumerate overhead * MolStandardize: avoid per-tautomer ring recomputation * Atom: cache PeriodicTable pointer in valence calcs * Atom: reuse PeriodicTable in getEffectiveAtomicNum * PeriodicTable: add atomic fast path for getTable * GraphMol: reduce ROMol copy reallocations * MolStandardize: use quickCopy for per-match product copies Use RWMol(kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry. MolStandardize: pre-filter scoring patterns by element/connectivity For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule. Two-stage filtering: 1. Element check: skip patterns requiring atoms not in the molecule 2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition. 
* MolStandardize: preserve molecule properties for canonical tautomer Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result. * TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES. Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance. * MolStandardize: use parallel iteration and cache bond lookups Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches. * perf: add specialized matchers for simple tautomer scoring patterns Replace VF2 graph matching with O(n) loops for 6 simple patterns: - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns - countMethyls: [CX4H3] methyl groups - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2. Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization. * Fix tautomer canonicalize dropping conformers from quickCopy quickCopy (RWMol(mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates. 
Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize(). Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp. add test on canonicalize losing stereo * add regression test for exocyclic C=C tautomer canonicalization The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs. Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1. Currently expected to FAIL until the state key dedup bug is fixed. * MolStandardize: expand tautomer connectivity SMARTS * MolStandardize: scope tautomer pattern enum * MolStandardize: trim tautomer pattern enum * MolStandardize: use symmetric ring scoring	2026-03-31 06:42:40 +02:00
Greg Landrum	a9477d2694	Modernization of some substructure code (#8450 ) * use std::span for substruct match callbacks This removes a copy from every evaluation of potential matches * some cleanup/modernization * some modernization * deprecate chiralAtomCompat * small optimization * remove naked pointers * improve new_timings.py script * changes suggested in review * response to review * response to review	2025-05-12 06:33:25 +02:00
Ricardo Rodriguez	35cb7809f2	Fixes #8492 (#8493 )	2025-05-08 08:05:13 +02:00
Greg Landrum	da6cd73168	Run clang-format across everything (#7849 ) * run clang-format-18 across Code/.cpp and Code/.h * run clang-format-18 across External	2024-09-26 13:39:02 +02:00
Brian Kelley	c8cd4e7c20	consolidate numeric vectors (#7792 ) * Speed up boost vector iterators by 300x * Add vector testing code * Update test * Remove GetPosition notebook * Move all wrapped int vectors to top level * Grab MatchTypeVect from rdBase * Actually wrap the vectors	2024-09-18 07:43:17 +02:00
Paolo Tosco	2b4202867e	Add Python modules to generate stubs and automatically patch docstrings (#6919 ) * - added gen_rdkit_stubs Python module to generate rdkit-stubs - added patch_rdkit_docstrings Python module to patch existing C++ sources to fix docstrings missing self parameter and add named parameters taken from C++ signatures where possible - added rdkit-stubs/CMakeLists.txt to build rdkit-stubs as part of the RDKit build - added an option to CMakeLists.txt to enable building rdkit-stubs as part of the RDKit build (defaults to OFF) * fixed CMakeLists.txt, rdkit-stubs/CMakeLists.txt and a doctest * - added missing cmp_func parameter - fixed case with overloads with optional parameters - do not trim params if expected_param_count == -1 - add dummy parameter names if we could not find any - keep into account member functions when making up parameter names - address __init__ and make_constructor __init__ functions - fix incorrectly assigned staticmethods * patched sources * address residual few remarks --------- Co-authored-by: ptosco <paolo.tosco@novartis.com>	2023-11-30 04:54:18 +01:00
Greg Landrum	8892fb160a	Fix minimal build, allow building without boost::serialization (#6932 ) * make sure that we can build without boost iostreams or serialization adds some "private" variables on the python side to check for these compilation flags * get out minimal cmake version correct * get minimallib js building installs an up-to-date cmake also updates the version of boost being used for the minimallib adds extra argument to allow the repo to be specified	2023-11-23 05:57:05 +01:00
Greg Landrum	2957ab4576	switch to catch2 v3 (#6898 ) * switch to catch2 v3 Fixes #6894 * fix a couple of problems noticed in the CI builds * more warning cleanup * changes in response to review	2023-11-15 06:45:42 +01:00
Greg Landrum	f797113a16	cmake cleanup (#6814 ) * add RDKIT_CFFI_STATIC option minimallib cmake cleanup * clean up a lot of boost::iostreams nonsense * find_package(boost cleanup * update the swig wrappers * updates to psql * get the Qt demo working again * fix? coordgen * only use std::regex in moldraw2d test this is consistent with the other tests * cleanup the serialization stuff too	2023-11-10 15:32:54 +01:00
Gareth Jones	b46fc6e28b	RGD with matching on tautomers of core (#6611 ) * Support tautomer queries in RGD * Continuing RGD and tautomer development * Python and C# tests * Python and C# tests * C# test * Typo fix * For core tautomer query update properties instead of full sanitization * Added query comment * Code review change * Support Enumeration of input cores * Mol enumeration test * Remove useNormalMatch from RGroupDecomp * Added comments for handling tautomeric core * Added comments for handling tautomeric core	2023-08-29 08:50:14 +02:00
Greg Landrum	b325b3a9bb	Support TautomerQuery and MolBundle queries in the cartridge (#6393 ) * framework for extended query. serialization works to/from text doesn't work * first pass at getting substructure search working basic tests improved error handling (try not to take down the server thread!) * add serialization to MolBundle * we really need to pickle mol properties * basic support for molbundle including substructure search * tautomer and molbundle queries to JSON * remove debug msg * cleanup debug initial index steps (not tested) * remove indexing stuff since it wasn't working will try to come back to that * add xqm to update script * add c++ testing for molbundle serialization * add serialization of molbundles to python interface * support expanding molbundles to arrays of tautomer queries * edge cases Signed-off-by: greg landrum <greg.landrum@gmail.com> * change in response to review * a bunch of updates * make sure the mol props needed for XQMs are being serialized * update update script * fix binary string output from ExtendedQueryMols in python * tautomer queries should serialize properties * more testing never hurts * combo of generic groups and generalized queries works * Update Code/PgSQL/rdkit/adapter.cpp Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com> * Update Code/PgSQL/rdkit/adapter.cpp Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com> * Update Code/PgSQL/rdkit/adapter.cpp Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com> * Fix weird quotes? --------- Signed-off-by: greg landrum <greg.landrum@gmail.com> Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com>	2023-08-23 06:32:06 +02:00
Ric	880a8e5725	Reformat Python code for 2023.03 release (#6294 ) * run yapf * run isort --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2023-04-28 06:53:56 +02:00
Greg Landrum	79e8295586	Support Python 3.11 (#5994 ) * remove some more deprecated numpy stuff * workaround for changes to random.shuffle in python 3.11 * fix pickling of rdkit mols in python 3.11 * add py311 build to CI * update py311 CI * remove qt for py311 for the moment * only use the new code with pyversion >=3.11 * use the new logic for all pickle_suites * need to work with older py too	2023-01-24 18:16:26 +01:00
Ric	0d8ad69541	fix warnings (#5561 )	2022-09-14 06:36:42 +02:00
Greg Landrum	594c58f86c	make the catch tests build faster (#5284 ) * reorg the catch tests the goal here is to make the builds faster * make that easier	2022-05-17 04:39:33 +02:00
Greg Landrum	b797137341	Make TautomerQuery serializable (#5248 ) * add serialization to the TautomerQuery * add to python wrapper * changes in response to review * put TautomerQueryCanSerialize in Python wrappers * include serialization build dependency	2022-05-04 14:31:50 +02:00
Eisuke Kawashima	27f711a658	Run clang-tidy (readability-braces-around-statements) (#4977 ) https://github.com/rdkit/rdkit/pull/3024#discussion_r526549843	2022-03-10 08:00:10 +01:00
Paolo Tosco	29ebf45f17	remove dead code (#4739 ) Co-authored-by: Tosco, Paolo <paolo.tosco@novartis.com>	2021-11-29 08:00:21 +01:00
Eisuke Kawashima	11532089de	Run clang-format against cpp (#4358 )	2021-10-20 04:25:27 +02:00
Ric	878c4c7ec0	save one search (#4566 )	2021-09-28 04:37:24 +02:00
Ric	6db202aa0d	Improve performance of removing substruct/tautomer duplicates (#4560 ) * improve removeDuplicates performance * improve removeTautomerDuplicates performance * use std::set	2021-09-25 15:45:55 +02:00
Greg Landrum	3193b76d8c	cleanup some compiler warnings (#4521 ) * cleanup some clang warnings * get rid of some VC++ warnings	2021-09-16 04:34:40 +02:00
Ric	9aa949576a	Addresses #4425 (#4426 ) * refactor converter registration checking * some more converters refactoring & cleanup	2021-08-20 11:23:39 +02:00
Eisuke Kawashima	48f4f3ee82	Run clang-tidy (modernize-pass-by-value) (#4224 )	2021-06-14 06:57:08 +02:00
Eisuke Kawashima	78aac3c1bc	Run clang-format against header files (#4143 )	2021-06-08 07:57:51 +02:00
Greg Landrum	f5a54af475	A collection of MolStandardize improvements (#4118 ) * Swap to using a data structure for default normalization parameters * bring the default fragment data into the code too * cleanup * add reionizer parameters via data change fragment parse failures to ValueErrorExceptions * tautomer parameters in the code * got a little over-enthusiastic in that last cleanup * use boost::flyweight to cache normalization and charge data params * a bit more cleanup * support reading params from JSON * fragments from JSON single-call for fragment removal * add a one-liner for the canonical tautomer * quick refactor * Fixes #4115 * complete the parents * docs * move the definitions to a namespace and make them const * see if switching to c++14 fixes the CI compile problems with g++ 5.5 * somewhat uglier way of solving the initializer list problem	2021-05-19 09:11:23 +02:00
Paolo Tosco	106e9f7c37	Normalize line endings in source code files (#4104 ) * set all source code files to have native line endings * normalized all source code line endings Co-authored-by: Paolo Tosco <paolo.tosco@novartis.com>	2021-05-13 14:31:39 +02:00
Brian Kelley	c8aa10c80f	Add tautomer query to the substructlibrary (#3808 ) * Fixes #3797 * [WIP] Add tautomer queries to the substruct library * Add TautomerQuery to CMake * Add missing TautomerQuery functions, python wrapper and tests * Add python wrappers for Substruct Library Tautomer Queries * Explicitly label non-const pattern function now that we have both * Use boost::shared_ptr not std::shared_ptr * Fix java builds * One more try to fix java builds * Fix Java Tests * Run clang format * Reenable tests * Fix annoyingly stupid bug and annoying commit of debug code * Fix documentation * reenable ifdef threadsafe check * Throw warning and perform tautomer search instead of bailing with incorrect fingerprints * Simplify api with templates * Fix SubstructLibrary java issues * minor API cleanup * simplify the SWIG wrappers Co-authored-by: Brian Kelley <bkelley@relaytx.com> Co-authored-by: greg landrum <greg.landrum@gmail.com>	2021-03-05 04:56:20 +01:00
Brian Kelley	d9033e4626	Fixes #3821 copy constructor by making the template molecule a shared… (#3822 ) * Fixes #3821 copy constructor by making the template molecule a shared pointer * Pushed a commit by accident, reverting * Copy constructor now does a deep copy * Add operator= test and ensure deep copies of template * Update Code/GraphMol/TautomerQuery/catch_tests.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Remove extraneous .get()'s * Add better testing names for catch test Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2021-02-23 07:41:59 +01:00
Paolo Tosco	527a7adf99	Some work on TautomerEnumerator (#3327 ) * - Added a TautomerEnumerator constructor which allows passing CleanupParameters - Added three configurable parameters to CleanupParameters - Added a callback to TautomerEnumerator - Fixed a bug where the same tautomer could be mapped by both isomeric and non-isomeric SMILES - TautomerEnumerator::enumerate() now returns a TautomerEnumeratorResult and does not take dynamic_bitset pointers as optional parameters - Added a missing transform from the Sitzmann paper - General code cleanup and optimization * - TautomerEnumeratorResult is now iterable in both C++ and Python - further optimizations - implemented a TautomerEnumerator.PickCanonical() Python wrapper - added C++ and Python accessors to SMILES and SmilesTautomerMap * - make sure the number of tautomers reported by rdLogger is correct and definitive * make sure that if N maxTautomers are requested, N tautomers are returned if the theoretical number of tautomers is M>N * avoid that sulfonic acids hit the formamidinesulfinic acid tautomerisation rule * offer an option to allow the old API to still be used. * Changes in response to review and following discussion with Gareth and Greg * - made TautomerEnumeratorResult an enum class (was a plain C enum) - made TautomerEnumeratorResult::const_iterator a bidirectional_iterator - added tests to fully probe the TautomerEnumeratorResult::const_iterator functionality * - change the difference_type definition - added tests for the above * - cosmetic change to improve code readability Co-authored-by: greg landrum <greg.landrum@gmail.com>	2020-09-24 17:00:03 -04:00
jones-gareth	9a864f4238	Sgroup (#3390 ) * Changes to use SubstanceGroups in Java * Forgot to add SWIG file * Java test for SubstanceGroup wrappers * Added RDKit boilerplate	2020-09-09 04:59:08 +02:00
Greg Landrum	c0a62388a2	switch to using target_compile_definitions instead of add_definitions (#3350 ) * switch to using target_compile_definitions instead of add_definitions * missed one	2020-08-21 04:49:07 +02:00
jones-gareth	aa4d5dc22c	Fixes for aromatic bond fuzzy queries (#3328 ) * C# wrapper for fragmentMolOnBonds * Fix failing tautomer query test * Fix ChemTransforms.i * SmartsWriter fix	2020-08-10 05:00:19 +02:00
jones-gareth	21a8a263bd	Tautomer search (#3205 ) * TautomerQuery class * working test * Comment header * Merge with master. Greg's suggestions. More tests. Python wrapper * Updated Pattern Fingerprints to merge with master. Reset email * Java/C# wrappers. Java test * Java/C# wrappers. Java test * Java/C# wrappers. Java test * Greg suggestions of 6_2_2020 * Explicit types in Java TautomerQueryTests class * Update Code/GraphMol/QueryOps.h Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * get windows dll builds working * Removed tautomer query wrappers from RDKit namespace * Fixes from evaluation * Template molecule identification fix. Greg's suggestion * Final check search functor for evaluating template matches as they are found Co-authored-by: Gareth Jones <gjones@glysade.com> Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2020-06-24 17:27:40 +02:00

34 Commits