rdkit

mirror of https://github.com/rdkit/rdkit.git synced 2026-06-04 21:54:27 +08:00

Author	SHA1	Message	Date
Yakov Pechersky	c6cabf4153	Speed-up tautomer canonicalization, no API changes (#9134 ) * Speed up tautomer canonicalization by deferring on SSSR calc * Lazy kekulization for tautomer enumeration Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for: 1. The initial tautomer (until first iteration) 2. New tautomers that may never be processed (if enumeration ends early) The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method. Performance improvement: ~7% additional speedup (total ~22-24% from baseline) * Use count-only substructure matching in tautomer scoring * Add SubstructMatchCount regression test * MolStandardize: reduce enumerate overhead * MolStandardize: avoid per-tautomer ring recomputation * Atom: cache PeriodicTable pointer in valence calcs * Atom: reuse PeriodicTable in getEffectiveAtomicNum * PeriodicTable: add atomic fast path for getTable * GraphMol: reduce ROMol copy reallocations * MolStandardize: use quickCopy for per-match product copies Use RWMol(kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry. MolStandardize: pre-filter scoring patterns by element/connectivity For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule. Two-stage filtering: 1. Element check: skip patterns requiring atoms not in the molecule 2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition. 
* MolStandardize: preserve molecule properties for canonical tautomer Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result. * TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES. Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance. * MolStandardize: use parallel iteration and cache bond lookups Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches. * perf: add specialized matchers for simple tautomer scoring patterns Replace VF2 graph matching with O(n) loops for 6 simple patterns: - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns - countMethyls: [CX4H3] methyl groups - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2. Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization. * Fix tautomer canonicalize dropping conformers from quickCopy quickCopy (RWMol(mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates. 
Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize(). Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp. add test on canonicalize losing stereo * add regression test for exocyclic C=C tautomer canonicalization The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs. Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1. Currently expected to FAIL until the state key dedup bug is fixed. * MolStandardize: expand tautomer connectivity SMARTS * MolStandardize: scope tautomer pattern enum * MolStandardize: trim tautomer pattern enum * MolStandardize: use symmetric ring scoring	2026-03-31 06:42:40 +02:00
Greg Landrum	b2f1eae1c3	Do not add a `__computedProps` property to molecules when initializing them (#8931 ) The rest of the code already adds the property if/when it is needed, so there's no need to add it to every molecule.	2025-11-18 19:09:36 -05:00
Ricardo Rodriguez	9a4b3e2fc6	Implements #8873 (#8904 ) * merge ABS groups on setStereoGroups * warn/fail on multiple ABS groups on strict parsing * add a test for setStereoGroups * add a test for multiple ABS groups in mol blocks * Drop the warning	2025-11-04 16:10:56 +01:00
Ricardo Rodriguez	7b7a8a4e17	Refactor iostreams includes (#8846 ) * refactor iostreams includes * restore ostream to MonomerInfo.cpp	2025-10-08 16:08:01 +02:00
Greg Landrum	9d4afd0e08	add clearPropertyCache() (#8533 )	2025-05-18 08:14:05 +02:00
Greg Landrum	6664dd7fe4	Some modernization of core GraphMol classes (#7228 ) * simple modernization * more * done with RWMol for this pass * the ROMol.cpp variant * Atom * minor change to bond * simplify Conformer * monomerinfo, queryatom, querybond queryatom and querybond cpp files still need to be done * typos * revert a dumb change * suggestion from review	2024-03-17 06:04:04 +01:00
tadhurst-cdd	d5d4d194ec	atropisomer handling added (#6903 ) * atropisomer handling added * fixed non-used variables, linking directives * BOOST LIB start/stop fixes, linking fix * Fixes for RDKIT CI errors * minimalLib fix * changed vector<enum> for java builds * check for extra chars in CIP labeling * removed wrong deprecated message * fix ostrstream output error? * restored _ChiralAtomRank to lowercase first letter * changes for merged master * Fixed catch label for new Catch package * update expected psql results * get swig wrappers building * restore MolFileStereochem to FileParsers * fix java wrapper for reapplyMolBlockWedging * some suggestions * move a couple functions out of Bond * Merge branch 'master' into pr/atropisomers2 * merged master * Renamed setStereoanyFromSquiggleBond * atropisomers in cdxml, rationalize atrop wedging, stereoGroups in drawMol * fix for CI build * attempt to fix java build in CI * attempt to fix java build in CI #2 * New routine to remove non-explicit 3D-generated chirality * changed to use pair for atrop atoms and related bonds * Changes as per PR reviews * PR review responses * PR review response - more * Fix merge from master * fixing java ci after merge * Updated the help doc for atropisomers * update the atropisomer docs * improve the images * add the source CXSMILES --------- Co-authored-by: greg landrum <greg.landrum@gmail.com>	2023-12-22 04:58:18 +01:00
Greg Landrum	a7c781c107	Some small cleanups from the UGM Hackathon (#6744 ) * move definition of a couple global constants from a .h to a .cpp * careful removal of some redundant atom PRECONDITIONS * careful remove of some redundant ROMol PRECONDITIONS a bit of additional cleanup * optimization masquerading as modernization * some more tidying * a bit more atom cleanup * change in response to review	2023-10-05 06:13:18 +02:00
Richard Gowers	4db63b8ec1	Issue 6411 ROMol hasquery (#6739 ) * added ROMol::hasQuery * python bindings for Mol.HasQuery * at least I checked that my Python tests were running... * hasQuery use C++11 range iterators	2023-09-25 13:23:37 +02:00
Ric	d033aee043	Optionally forward Enhanced Stereo Group ids (#6560 ) * add id members to StereoGroup class * add optional read id argument to StereoGroup constructors * add functions forward Stereo Group Ids and assign the missing ones * update ops updating stereogroups to forward read id * update CX Smiles to parse/write stereogroup ids * Add test cases for stereo group id forwarding/canonicalization * update mol block (V3K only) to parse/write stereogroup ids * update pickling to parse/write stereogroup ids * update cdxml parser to store stereogroup ids * update mol interchange to parse/write stereogroup ids * update draw code with new stereo group ids * update test * add some tests * Update Code/GraphMol/Wrap/rdmolfiles.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/Wrap/rdmolfiles.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/Canon.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * Update Code/GraphMol/SmilesParse/CXSmilesOps.cpp Co-authored-by: Greg Landrum <greg.landrum@gmail.com> * review --------- Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2023-07-27 18:53:40 +02:00
Ric	0e7e44c9f4	Fix some minor leaks (#6029 ) * fix deserializing leaks ringinfo * fix leak in FingerprintGenerator * fix leak in MolStandardize test	2023-01-31 17:30:34 +01:00
Ric	a19566c2c8	add info to debug (#5526 )	2022-09-06 13:03:08 +02:00
Greg Landrum	a354f2db62	add boost::serialization support to ROMol (#5249 ) * add boost::serialization support to ROMol * add RWMol test * get the windows DLL builds working * switch FilterCatalog and SubstructLibrary to serialize ROMols * an actual solution to the windows dll problem * the FilterMatchers stuff was not working	2022-05-07 11:11:53 +02:00
Greg Landrum	caaa7406be	Fixes #4127 (#4129 ) Also adds fixes for some related problems I noticed while fixing this one.	2021-05-18 15:39:15 +02:00
Greg Landrum	af3bb3e78b	Allow partial deserialization of molecules (#4040 ) * make pickling/depickling conformers optional * make de-pickling properties optional * support the new options in molecule ctors * update doctest	2021-04-24 07:22:55 +02:00
Greg Landrum	2e3f31990d	Allow batch editing of molecules: removal only (#3875 ) * backup * simple first pass, passes all tests * cleanup a bunch of existing uses * ensure that we can safely add atoms/bonds while in edit mode * add context manager on python side * handle exceptions properly in those * changes in response to review	2021-03-11 05:10:43 +01:00
Ric	703fe5a225	Remove boost::foreach from public headers (#3820 ) * remove include from headers * update implementation files * completely remove BOOST_FOREACH (#7) * convert those changes to use auto * get rid of all usage of BOOST_FOREACH Co-authored-by: Greg Landrum <greg.landrum@gmail.com>	2021-02-17 14:15:48 +01:00
Paolo Tosco	4084284ba5	Make better use of strictParsing for SGroups (#3705 ) * - eliminate some documentation ambiguity about the role of the strictParsing flag - fix some inconsistencies between SGroup parsing function prototype declarations and implementations - add a workaround for accepting malformed V2000 'M SAP' entries affecting older version of MarvinJS (only if strictParsing is set to false) - if strictParsing is set to false, malformed V2000/V3000 SGroups are ignored rather than causing the parsing to fail - fix a couple typos in warnings * changes in response to review	2021-01-14 10:29:41 +01:00
Greg Landrum	a9010da8a4	Small bug fixes and cleanups from fuzz testing (#3299 ) * fix ossfuzz issue 24074 * fix ossfuzz issue 23896 * switch to throw exceptions when reading ints/floats * remove extraneous benchmarking code * change type of AH query * confirm an invariant while finding rings * no sense in adding these tests to github * switch to use fail() instead of failbit switch to acceptSpaces by default	2020-07-22 16:57:31 +02:00
Greg Landrum	95613b6279	Allow SubstanceGroups to survive molecule edits (#3170 ) * Progress on #3168 * Fixes #3167 * Fixes #3169 * deal with CBONDS too * test PATOMS * Fixes #3175 * a bit of code simplification and test updates still needs more testing * more testing * handle s-group hierarchy also a couple of other changes in response to the review * add forgotten test file * changes in response to review	2020-05-19 17:35:08 +02:00
Greg Landrum	d41752d558	run clang-tidy with readability-braces-around-statements (#2899 ) * run clang-tidy with readability-braces-around-statements clang-format the results clean up all the parts that clang-tidy-8 broke * fix problem on windows	2020-01-25 14:19:32 +01:00
Eisuke Kawashima	5cd27a242f	Fix typo (#2862 ) * Fix typo * Reflect the comments * Fix more typos	2019-12-31 06:43:27 +01:00
Brian Kelley	87555fb29a	Remove O(N) behavior of getNumBonds (#2847 )	2019-12-15 06:44:04 +01:00
Greg Landrum	d8c49e6dab	Code cleanups from PVS/Studio (#2531 ) * first round of cleanups based on PVS-studio suggestions * a couple more * a few more cleanups * another round of cleanups * undo one of those cleanups we want the integer rounding behavior here * add a comment to make that clear * Fix for filter catalog PRECONDITION redundancy	2019-07-13 07:25:37 +02:00
Greg Landrum	5a79190261	rename SGroup -> SubstanceGroup (#2375 ) We leave the names of the bit connected with Mol files as SGroups, since that is appropriate there, but the more generic pieces are renamed	2019-03-30 14:53:24 -04:00
Ric	d26d4b076e	Support for parsing/writing SGroups in SD Mol files. (#2138 ) * Implementation of SGroups * remove sample files test * update gitignore with test outputs * fix RevisionModifier * re-enable tests * backup commit; things seem to work so far * some refactoring; obvious s group tests pass now * more refactoring * everything now out of the public API * not sure why this was still in there * rename functions; all tests now pass * remove getNextFreeSGroupId; readd comment in copy SGroups * clang-format * squash-merge current master * squash merge master * Address comments on PR - Update to current master. - Move SGroup parse time checks to SGroupChecks namespace. - Store SGroups in ROMol as vector<SGroups>. - SGroup methods return references instead of pointers. - Use atom/bond/sgroup indexes for properties instead of pointers. - Have SGroups inherit from RDProps; move properties to RDProps. - Remove trivial/unused methods. - Add a link to the SD specification atop SGroup.h	2019-01-22 15:42:27 +01:00
Dan N	eaa44b40c2	Enhanced stereo read/write support in SDF files. (#2022 ) * add a couple test files * backup * first pass at some theory documentation * it's a draft * Update enhanced stereochemistry documentation Adds initial target use case and caveats about the tentative nature of the current implementation. * Support read/write of molfile enhanced stereochemistry This includes reading and writing of enhanced stereochemistry from v3000 molfiles (sdf). Enhanced stereochemistry encodes the relative configuration of stereocenters, allowing representation of racemic mixtures and compounds with unknown absolute stereochemistry. It does not include: * Python wrapping * invalidation of the enhanced stereochemistry * use of enhanced stereochemistry in search * depiction of enhanced stereochemistry. * Update to reflect changes from #1971 * change names of enum elements to allow compilation in VS2017 I think it's also clearer to do things this way * Addressed most review comments. * Run missed test "testEnhancedStereoChemistry" * In tests, added size checks to group equality checks * Updated copyright statements * Deleted mol created for a test * Use perfect forwarding in RWMol::setStereoGroups() * use references for stereo groups that are checked in write and pickle * Updated stereogroup.h in hopes of fixing compilation on Windows. * clang-format * try allowing a switch to boost regex and requiring it for g++-4.8 * do a better job of that * typo * Code review comments. Updated Copyright notice. * When an atom is deleted, delete stereo groups containing it. Also updates StereoGroup to use accessors instead of constant member attributes. This allows move of StereoGroups. * RDKit style guide * Add header required on Windows. * get the SWIG wrappers to build	2018-09-26 15:44:23 +02:00
Greg Landrum	b91daa8ab9	Allow dumping interchange information into SVG files (#2030 ) * add atoms * add bonds * backup * Fixes #2029 * Get metadata working for drawMolecules() * add to python wrapper const correctness * also connected to #2029: make sure bond direction also ends up in the output * initial version of an SVG->ROMol parser this is in the wrong place, but I wanted to make sure it actually works * move svg parser to a more reasonable location there is still some work to be done here * add conformer parsing	2018-09-17 06:49:43 +02:00
Greg Landrum	ba12d98ad0	Removes ATOM/BOND_SPTR in boost::graph in favor of raw pointers (#1713 ) * Removes ATOM/BOND_SPTR in boost::graph in favor of raw pointers * Actually delete atoms and bonds... * RWMol::clear now calls destroy to handle atom/bond deletion * Changes broken Atom lookup for windows/gcc * Adds tests for running with valgrind * Adds test designed for valgrind and molecule deletions * Removes RNG, actually tests bond deletions * update swig wrappers * deal with most recent changes on the main branch	2018-01-07 14:19:47 -05:00
Brian Kelley	0a871bd72e	Dev/modern cxx ranges (#1701 ) Enable range-based for loops for molecules	2018-01-05 06:09:51 +00:00
Greg Landrum	87786c08b5	Merge branch 'master' into modern_cxx # Conflicts: # .travis.yml # Code/GraphMol/FileParsers/MolFileParser.cpp # Code/GraphMol/FileParsers/MolFileStereochem.cpp # Code/GraphMol/ForceFieldHelpers/UFF/testUFFHelpers.cpp # Code/GraphMol/MolAlign/testMolAlign.cpp # Code/GraphMol/MolDraw2D/MolDraw2D.cpp # Code/GraphMol/MolDraw2D/Wrap/rdMolDraw2D.cpp # Code/GraphMol/QueryOps.cpp # Code/GraphMol/ROMol.cpp # Code/GraphMol/SmilesParse/test.cpp # Code/GraphMol/Trajectory/Trajectory.cpp # Code/GraphMol/Wrap/Atom.cpp # Code/GraphMol/Wrap/Bond.cpp # Code/GraphMol/new_canon.cpp # Code/RDGeneral/testDict.cpp # Code/SimDivPickers/Wrap/MaxMinPicker.cpp	2017-10-05 05:58:38 +02:00
Brian Kelley	7488840ac4	Fix/urange check (#1506 ) * Fixes atom documentation * Fixes #1461 This is a complicated one. Basically URANGE_CHECK when used on unsigned integers has a problem when the size of the range it’s checking is 0. The standard operation is to check URANGE(num, size-1) Which (for unsigned integers) obviously rolls over. This fixes all usage cases to be URANGE(num+1, size) And fixes the bugs found. (addBond and the mmff tests) * Fixes #1461 - Updates URANGE_CHECK to be 0<=x<hi	2017-09-11 21:17:33 +02:00
Greg Landrum	915cf08faa	run clang-format with c++-11 style over that	2017-04-22 17:19:10 +02:00
Greg Landrum	7c0bb0b743	clang-tidy output	2017-04-22 17:09:24 +02:00
Brian Kelley	08be8d097e	Removes exponential numBonds behavior (#1154 ) * Removes exponential numBonds behavior * Removes accidentally committed Get/SetName	2016-11-12 16:05:06 +01:00
Brian Kelley	ddf7c73b50	Adds Atom atom map and rlabel apis (#1004 ) * Adds Atom atom map and rlabel apis * Moves RLabels to their own namespace, adds other properties. * Removes namespaces, liberally adds Atom to function names. * move detail::computedPropName to RDKit::detail::computedPropName	2016-08-11 04:46:41 +02:00
Brian Kelley	2debdfde0d	Adds RDAny (smaller generic holder) Updates all used dictionaries (#896 ) * Adds RDAny (smaller generic holder) Updates all used dictionaries This is an API compliant version of the current rdany system, but uses a lot less memory in practice. * Removes code duplication * Converts CHECK_INVARIANT to TEST_ASSERT * Fixes DoubleTag issue * Adds Bool to DoubleMagic implementation * Removes reference to property pickler	2016-05-29 17:04:21 +01:00
Paolo Tosco	2b3a818f84	- removed the dependency on Trajectory from ROMol and ForceField	2016-05-11 19:37:09 +01:00
Paolo Tosco	a2eca41365	- the Trajectory object now holds a vector of Snapshots rather than a vector of shared_ptr to Snapshots - The PySnapshot class was removed - the Trajectory::readAmber and Trajectory::readGromos member functions were converted into non-member functions - tests were modified accordingly	2016-05-04 18:42:23 +01:00
Paolo Tosco	d16b312ee6	- Completely revised coordinate ownership - Implemented Python wrappers - prepared relevant test cases	2016-04-24 23:30:25 +01:00
Paolo Tosco	a5de000c5c	- fixed int/unsigned int	2016-04-16 20:41:40 +01:00
Paolo Tosco	b35538599f	- Added AMBER trajectory reader and relevant tests	2016-04-16 20:28:45 +01:00
Paolo Tosco	9d5e56fcd2	- added a test for testAddConformersFromTrajectory() - added some documentation	2016-04-14 23:45:24 +01:00
Paolo Tosco	584b77ea18	- work in progress on the Trajectory branch	2016-04-14 20:04:58 +01:00
kelley	5dbec2fe85	Adds rdcasts where appropriate	2015-11-29 17:52:27 -05:00
Greg Landrum	e08e0d16d8	first pass, using google style	2015-11-14 14:58:11 +01:00
Brian Kelley	fb84c9f0b7	Switches to URANGE_CHECK when appropriate	2015-10-18 21:14:02 -04:00
Brian Kelley	daa7e62258	Fixes signed conversion issues (use rdcast)	2015-10-18 15:16:38 -04:00
Greg Landrum	b78bb40ca5	Fixes #384	2015-03-30 07:20:24 +02:00
Brian Kelley	95a92282d1	Dictionary access is sanitized and optimized. o rdkit gains a RDKit::common_properties namespace that contains common string value properties o Dict.h and below gain getPropIfPresent that attempts to retrieve a property and returns true/false on success or failure. This is used to optimize access. o rdkit learns how to pass property keys by reference, not value. A new namespace has been added to RDKit, common_properties that contains the std::string values for commonly used properties. This helps to avoid typos in string values but also avoids a creation of std::strings from character values. All accessors (has/get/clear and getPropIfPresent) now pass the key by reference. Additionally, getPropIfPresent removes the double lookup of hasProp/getProp which can be a significant speedup in the smiles and smarts parsers (10-20%)	2015-01-15 12:23:29 -05:00

78 Commits