* Issue #2403: Speed up SSSR symmetrization
For my horrible example molecule (a highly symmetric
nanotube with 2400 atoms and > 1000 rings), this speeds up
symmetrizeSSSR() from 5s to about 0.002s. findSSSR() takes
another 0.4s or so.
* Refactor after Ricardo's suggestions
* Greg's review comments. use std::vector
This removes some redundancy from some of the test code in order to bring
the runtime down. This does not affect test coverage and shouldn't do
anything bad to the overall test quality.
* Potential implementation of copying enhanced stereo groups
Copies the enhanced stereo if all atoms in the reactant
end up in the same molecule of the product with valid
ChiralTags.
Current implementation: Only copy StereoGroup if all atoms are "valid" in the product.
Possible implementation: Copy StereoGroup for all atoms that are "valid" in the product.
Details:
Uses ChiralTag invalidation to decide whether StereoGroup should be copied. If
the product atoms have valid ChiralTag, then the reaction was able to
meaningfully propagate chirality from the reactant to the product. This means
that it is also meaningful to propagate the StereoGroup from the reactant to
the product.
The only exception to this is if the product template defines a specific
absolute configuration for an atom. This means that the reaction defines the
stereochemistry for the atom, so the stereochemistry of that atom is no longer
relative.
If an atom from a reactant StereoGroup appears multiple times in the product,
all copies of that atom are put in the same product StereoGroup.
Still developing test cases.
from rdkit import Chem
from rdkit.Chem import AllChem
# Duplicate a molecule example:
mol1 = Chem.MolFromSmiles('Cl[C@@H](Br)C[C@H](Br)CCO |&1:1,4|')
mol2 = Chem.MolFromSmiles('CC(=O)C')
rxn = AllChem.ReactionFromSmarts('[O:1].[C:2]=O>>[O:1][C:2][O:1]')
for prods in rxn.RunReactants([mol1, mol2]):
    for p in prods:
        for a in p.GetAtoms():
            for k in a.GetPropsAsDict():
                a.ClearProp(k)
        print(Chem.MolToCXSmiles(p))
Output:
[21:26:08] product atom-mapping number 1 found multiple times.
CC(C)(OCC[C@@H](Br)C[C@@H](Cl)Br)OCC[C@@H](Br)C[C@@H](Cl)Br |&1:6,9,15,18|
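The difference between the two policies listed above (copy a group only when every atom is still valid, versus copy whichever atoms remain valid) can be sketched in plain Python. This is an illustrative model and not the RDKit implementation: `copy_stereo_groups`, the tuple representation of a group, and the `valid_atoms` set are hypothetical names introduced here, and atom renumbering between reactant and product is ignored.

```python
def copy_stereo_groups(groups, valid_atoms, require_all=True):
    """Return the stereo groups that survive a reaction (toy model).

    groups:      list of (group_type, atom_indices) tuples, e.g. ("and", [1, 3])
    valid_atoms: set of atom indices whose chirality the reaction propagated
                 (not created, destroyed, or fixed by the product template)
    require_all: True  -> keep a group only if every atom is still valid
                 False -> keep the valid atoms of each group, drop empty groups
    """
    kept = []
    for group_type, atoms in groups:
        surviving = [a for a in atoms if a in valid_atoms]
        if require_all and len(surviving) != len(atoms):
            continue  # strict policy: one invalid atom discards the group
        if surviving:
            kept.append((group_type, surviving))
    return kept


# An AND group on atoms 1 and 3, where the reaction destroys the
# stereochemistry of atom 1 (hypothetical input).
groups = [("and", [1, 3])]
strict = copy_stereo_groups(groups, valid_atoms={3}, require_all=True)
partial = copy_stereo_groups(groups, valid_atoms={3}, require_all=False)
print(strict)
print(partial)
```

Under the strict policy the group disappears entirely; under the permissive one it shrinks to the atom that is still meaningful.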
* Issue 2366: Documentation and fix stereo group invalidation
Adds some documentation to EnhancedStereo.md
Also invalidates StereoGroup if a reaction specifies the
stereochemistry of a center. This destroys the relative
relationship of the center to other centers.
* Demo python file examples for Enhanced Stereochemistry in reactions
This is not intended to be pushed. These probably will become test
cases. The output looks like this:
0a. Reaction preserves stereo:
[C@:1]>>[C@:1]
F[C@H](Cl)Br |o1:1|
>>
F[C@H](Cl)Br |o1:1|
0b. Reaction preserves stereo:
[C@:1]>>[C@:1]
F[C@@H](Cl)Br |&1:1|
>>
F[C@@H](Cl)Br |&1:1|
0c. Reaction preserves stereo:
[C@:1]>>[C@:1]
FC(Cl)Br
>>
FC(Cl)Br
1a. Reaction ignores stereo:
[C:1]>>[C:1]
F[C@H](Cl)Br |a:1|
>>
F[C@H](Cl)Br |a:1|
1b. Reaction ignores stereo:
[C:1]>>[C:1]
F[C@@H](Cl)Br |&1:1|
>>
F[C@@H](Cl)Br |&1:1|
1c. Reaction ignores stereo:
[C:1]>>[C:1]
FC(Cl)Br
>>
FC(Cl)Br
2a. Reaction inverts stereo:
[C@:1]>>[C@@:1]
F[C@H](Cl)Br |o1:1|
>>
F[C@@H](Cl)Br |o1:1|
2b. Reaction inverts stereo:
[C@:1]>>[C@@:1]
F[C@@H](Cl)Br |&1:1|
>>
F[C@H](Cl)Br |&1:1|
2c. Reaction inverts stereo:
[C@:1]>>[C@@:1]
FC(Cl)Br
>>
FC(Cl)Br
3a. Reaction destroys stereo:
[C@:1]>>[C:1]
F[C@H](Cl)Br |o1:1|
>>
FC(Cl)Br
3b. Reaction destroys stereo:
[C@:1]>>[C:1]
F[C@@H](Cl)Br |&1:1|
>>
FC(Cl)Br
3c. Reaction destroys stereo:
[C@:1]>>[C:1]
FC(Cl)Br
>>
FC(Cl)Br
3d. Reaction destroys stereo (but preserves unaffected group):
[C@:1]F>>[C:1]F
F[C@H](Cl)[C@@H](Cl)Br |o1:1,&2:3|
>>
FC(Cl)[C@@H](Cl)Br |&1:3|
3e. Reaction destroys stereo:
[C@:1]F>>[C:1]F
F[C@H](Cl)[C@@H](Cl)Br |&1:1,3|
>>
FC(Cl)[C@@H](Cl)Br
4a. Reaction creates stereo:
[C:1]>>[C@@:1]
F[C@H](Cl)Br |o1:1|
>>
F[C@@H](Cl)Br
4b. Reaction creates stereo:
[C:1]>>[C@@:1]
F[C@@H](Cl)Br |&1:1|
>>
F[C@@H](Cl)Br
4c. Reaction creates stereo:
[C:1]>>[C@@:1]
FC(Cl)Br
>>
F[C@@H](Cl)Br
4d. Reaction creates stereo (preserve unaffected group):
[C:1]F>>[C@@:1]F
F[C@H](Cl)[C@@H](Cl)Br |o1:1,&2:3|
>>
F[C@@H](Cl)[C@@H](Cl)Br |&1:3|
4e. Reaction creates stereo:
[C:1]F>>[C@@:1]F
F[C@H](Cl)[C@@H](Cl)Br |o1:1,3|
>>
F[C@@H](Cl)[C@@H](Cl)Br
5a. Reaction preserves unrelated stereo:
[C@:1]F>>[C@:1]F
F[C@H](Cl)[C@@H](Cl)Br |o1:3|
>>
F[C@H](Cl)[C@@H](Cl)Br |o1:3|
5b. Reaction ignores unrelated stereo:
[C:1]F>>[C:1]F
F[C@H](Cl)[C@@H](Cl)Br |o1:3|
>>
F[C@H](Cl)[C@@H](Cl)Br |o1:3|
5c. Reaction inverts unrelated stereo:
[C@:1]F>>[C@@:1]F
F[C@H](Cl)[C@@H](Cl)Br |o1:3|
>>
F[C@@H](Cl)[C@@H](Cl)Br |o1:3|
5d. Reaction destroys unrelated stereo:
[C@:1]F>>[C:1]F
F[C@H](Cl)[C@@H](Cl)Br |o1:3|
>>
FC(Cl)[C@@H](Cl)Br |o1:3|
5e. Reaction creates unrelated stereo:
[C:1]F>>[C@@:1]F
F[C@H](Cl)[C@@H](Cl)Br |o1:3|
>>
F[C@@H](Cl)[C@@H](Cl)Br |o1:3|
6e. Reaction splits StereoGroup atoms into two Mols:
[C:1]OO[C:2]>>[C:2]O.O[C:1]
F[C@H](Cl)OO[C@@H](Cl)Br |o1:1,5|
>>
O[C@@H](Cl)Br + O[C@H](F)Cl
>>
O[C@H](F)Cl + O[C@@H](Cl)Br
7. Add two copies:
[O:1].[C:2]=O>>[O:1][C:2][O:1]
Cl[C@@H](Br)C[C@H](Br)CCO |&1:1,4| + CC(=O)C
[17:15:38] product atom-mapping number 1 found multiple times.
>>
CC(C)(OCC[C@@H](Br)C[C@@H](Cl)Br)OCC[C@@H](Br)C[C@@H](Cl)Br |&1:6,9,15,18|
8. Add two copies:
[O:1].[C:2]=O>>[O:1][C:2][O:1]
Cl[C@@H](Br)C[C@H](Br)CCO |&1:1,4| + CC(=O)C
[17:15:38] product atom-mapping number 1 found multiple times.
>>
CC(C)(OCC[C@@H](Br)C[C@@H](Cl)Br)OCC[C@@H](Br)C[C@@H](Cl)Br |&1:6,9,15,18|
* Updates StereoGroup strategy in reactions to copy all possible atoms.
Copy all atoms for which the stereochemistry was not created or destroyed
in the reaction. Any StereoGroup which has at least one atom will appear
in the product.
Also updates the documentation to match this description, and adds C++
and Python tests which fail before this PR and pass after. The Python
tests are more extensive.
Test output was validated by hand (especially the stereo groups
generated). I'm less confident in the reaction processing in my head,
but I trusted the existing validation there.
For future diagnosis: Python unittest failures will look like:
AssertionError: 'F[C@H](Cl)Br' != 'F[C@H](Cl)Br |&1:1|'
- F[C@H](Cl)Br
+ F[C@H](Cl)Br |&1:1|
? +++++++
For future diagnosis: C++ Catch2 failures will look like:
CHECK( MolToCXSmiles(*p) == "F[C@H](Cl)Br |o1:1|" )
with expansion:
"FC(Cl)[C@@H](Cl)Br |&1:3|"
==
"F[C@H](Cl)Br |o1:1|"
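For reference, the shape of the Python failure message quoted above can be reproduced with the standard library alone, since `assertEqual` on two strings reports a character-level diff. The snippet below is a small sketch using plain strings (no RDKit involved); the variable names are only for illustration.

```python
import unittest

expected = 'F[C@H](Cl)Br |&1:1|'  # CXSMILES with the stereo group
actual = 'F[C@H](Cl)Br'           # what a build without the fix would emit

message = ""
try:
    # TestCase can be instantiated directly to reuse its assertion helpers.
    unittest.TestCase().assertEqual(actual, expected)
except AssertionError as err:
    message = str(err)

print(message)
```

The `-` line is the first argument, the `+` line is the second, and the `?` line marks the characters that differ, which here is the missing stereo group suffix.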
* Add a couple of new tests.
* rename "relative" to "enhanced"
some reformatting
* Factor out test helper function.
* Actually, enhanced stereo groups are exposed to Python
* Added discussion of enhanced stereochemistry in reactions to docs
* Fix new test
* Fixes #2311
at least I hope it does
* Stop using deprecated boost functionality
* allow the Murtagh module to import even if the code isn't built
update the associated tests
* update release notes
* typo
* fix integer division
* Fixes #2383 (tests coming in the next commit)
Minor typo fix
Fixes a "bug" in one of the default transforms
* Adds support for directly providing normalization parameter data
instead of requiring the use of a text file.
* allow fragment removers to be initialized with string data
* remove unicode
* allow the reionizer to be initialized from a stream
* modify the uncharger to use a canonical atom ordering
* add doCanonical cleanup parameter
make canonical ordering the default
document the change
* Add neutralization of additional negative groups (not just acids).
This may not be the right thing to do.
* expose the new parameter to python
* changes in response to review
* Move DetectAtomStereoChemistry to MolOps::assignChiralTypesFromBondDirs
DetectAtomStereoChemistry in MolFileStereochem is more broadly
useful. Additionally, it was not named very clearly for what
it was actually doing.
* Wraps assignChiralTypesFromBondDirs for use in Python
Makes assignChiralTypesFromBondDirs available in Python
and adds a test demonstrating that availability.
* Correct typo in thiophene_E pattern: !H0,!H1 is "always true"; it should be !H0!H1
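The reason the comma form is always true follows from SMARTS operator semantics: a comma is a logical OR, while adjacent primitives are ANDed. A minimal sketch in plain Python, where the two helper functions are hypothetical stand-ins for the two SMARTS fragments:

```python
def comma_version(h_count):
    # "!H0,!H1": not-zero OR not-one, which holds for every hydrogen count
    return (h_count != 0) or (h_count != 1)


def and_version(h_count):
    # "!H0!H1": not-zero AND not-one, which holds only for two or more hydrogens
    return (h_count != 0) and (h_count != 1)


for h in range(4):
    print(h, comma_version(h), and_version(h))
```

No hydrogen count can be both 0 and 1 at once, so the OR form never rejects an atom, which is why the pattern needed the AND form.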
* Errors in ring closure translation from original SLN.
* Make queries agnostic to aromaticity model.
* Redundant recursive SMARTS
* More queries that benefit from optional aromaticity.
* Update the (.in) files from previous commit.
* Update thiaz_ene_A in line with CSV file.
* Move RDBoostStreams to RDStreams
* RDBoostStreams->RDStreams
* RDBoostStreams->RDStreams
* Wrap SWIG (with Java test)
* Fix missing declaration
* Use the file that already exists
* Revert to original version
* Revert to CXSMILES version
* Update boost version
* Remove redundant code
* Add zlib
* check for win32
* FileParsers now builds static on windows
* added a set of test files for SGroups.
Many thanks to Gerd Blanke for providing these
* Partial version of the wrapper
Definitely needs more work
* add some properties
* basic SGroup property change test
* not working; backup commit
* disable writing for now
* add ClearMolSGroups() function
* review response: add a couple missing methods
* remove spaces from filenames
* update filename in test
* changes in response to review
* add operator== to SGroups
* solve lifetime problems with a vector_indexing_suite
* add SKIP_IF_ALL_MATCH argument to FragmentRemover
Refactor FragmentRemover::remove() to make it more efficient
* implement and test SKIP_IF_ALL_MATCH
* expose the extra option to Python
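The commits above do not spell out what SKIP_IF_ALL_MATCH does. The sketch below assumes it means "leave the input untouched when every fragment would be removed", so that stripping never produces an empty molecule. That reading is an assumption, and `remove_fragments`, `is_unwanted`, and the string fragments are hypothetical names, not the FragmentRemover API.

```python
def remove_fragments(fragments, is_unwanted, skip_if_all_match=False):
    """Drop unwanted fragments from a list (toy model of the assumed behavior).

    fragments:         list of fragment labels, here plain SMILES strings
    is_unwanted:       predicate returning True for fragments to strip
    skip_if_all_match: if True and every fragment is unwanted, return the
                       input unchanged instead of an empty result (assumed)
    """
    kept = [frag for frag in fragments if not is_unwanted(frag)]
    if skip_if_all_match and not kept:
        return list(fragments)
    return kept


salts = {"[Na+]", "[Cl-]"}
only_salt = ["[Na+]", "[Cl-]"]
with_parent = ["CC(=O)[O-]", "[Na+]"]

print(remove_fragments(only_salt, salts.__contains__))
print(remove_fragments(only_salt, salts.__contains__, skip_if_all_match=True))
print(remove_fragments(with_parent, salts.__contains__, skip_if_all_match=True))
```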
* add info to logger
* unused vars in bison parser cleanup
* initialization order in TopologicalTorsionGenerator
* unused params in SLN bison
* sln flex unused params
* throwing destructor in TDTWriter
* signed comparison in substructmethods
* unused input param in smiles/smarts bison
* unused ms param in sln bison
* signed comparison in FingerprintGenerator
* store return of fscanf in StructCheckerOptions
* unreferenced var in catch
* uninitialized value in FileParserUtils
* avoid override overload warning in MolDraw2DSVG
* non-final overrides in Validate.h
* unused static var in Avalon
* unused vars in catch blocks
* make AvalonTools avalonSimilarityBits & avalonSSSBits const int
* assert fscanf result in StructCheckerOptions
* Allow Atoms to be copied in Python.
The dunder copy method is the idiomatic way to support
making copies in Python. Includes a test to make sure that
copied atoms are usable.
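As a reminder of how the dunder copy protocol works, `copy.copy()` calls an object's `__copy__` method when one is defined. The sketch below uses a hypothetical `ToyAtom` class as a stand-in; it is not the wrapped RDKit Atom and only illustrates the mechanism.

```python
import copy


class ToyAtom:
    """Hypothetical stand-in for a wrapped atom; not the RDKit class."""

    def __init__(self, atomic_num, formal_charge=0):
        self.atomic_num = atomic_num
        self.formal_charge = formal_charge

    def __copy__(self):
        # copy.copy() dispatches here, so the class controls how it is duplicated
        return ToyAtom(self.atomic_num, self.formal_charge)


original = ToyAtom(7, formal_charge=1)
duplicate = copy.copy(original)
duplicate.formal_charge = 0  # changing the copy leaves the original alone
print(original.formal_charge, duplicate.formal_charge)
```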
* Use RWMol in Code/GraphMol/Wrap/rough_test.py
Co-Authored-By: d-b-w <dan.nealschneider@schrodinger.com>
* Allow access to an atom's copy constructor in Python
* Add writing of enhanced stereo to cxsmiles
* changes in response to review
* fix an interaction between the fragment and enhanced smiles bits
* fix a logic error
* remove all of the "from __future__" imports
* remove the first batch of rdkit.six imports/uses
* next step of rdkit.six removal
* removing xrange, range, and some maps
* next round of removals
* next round of cleanups
* fix inchi test
* last bits of "from rdkit.six" are gone
* and the last of the six stuff is gone
* strange importlib problem
* first crude pass
* fix a deprecation
* change naming scheme, support bools
* add standalone function
* add a default value for missings
* support long lines
* stupid typo
* make operator[] work
* revisit missing value handling
* modify missing value handling
* switch to an alternate scheme for specifying missing values
* clang-format
* First pass at property list parser
still needs more tests
* add test for processMolPropertyLists
* get this working as part of the ForwardSDMolSupplier
* first pass at python wrappers and tests
* clang-format run
* add creation of property lists at the mol level
* wrap long lines on output
* remove PoC implementation
* fix python wrappers
* remove out-of-date reference to the Python PoC
* changes in response to review
* WIP - Substruct Library Serialization
* Add serialization to SubstructLibrary
* Add SubstructLibraryDefs
This holds the definition of whether the substruct
library is serializable
* Wrap serialization in python
* Remove .h file, add .h.in file
* Configure header file into source dir
* Use RDConfig.h to configure serialization
* Move serialization code to separate header file
* Fixes for review comments
* Removes some code redundancy
* Make pickling mols less memory intensive
* Check if molholders come back as the right types
* very basics of writing as PoC: atom labels end up in output
* further progress, does not currently work
* stop writing coordinate bonds;
we represent these in SMILES already
* add coords
* fix typo in dealing with atom properties
* get atom props working
* changes in response to PR
* Fixes #2277
* changes in response to review
the big one is to move the PXA parser into the normal mol file parsing
* move the PXA changes to the writer as well
* SCN actually only needs 7 characters
* add test
* fixes in response to review
* handle blanks (instead of zeros) in the counts line.
The ctfile.pdf doc says we should do this
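The idea can be sketched as follows: the V2000 counts line is made of fixed-width three-character integer fields, and a field consisting only of blanks is read as zero. `parse_count_field` is a hypothetical helper written for this illustration, not the function used in the parser.

```python
def parse_count_field(line, start, width=3):
    """Read one fixed-width integer field; a blank field counts as 0."""
    text = line[start:start + width].strip()
    return int(text) if text else 0


# Built field by field so the column positions are explicit:
# atoms, bonds, a blank atom-list field, then the remaining fields.
counts_line = "  2" + "  1" + "   " + "  0  0  0  0  0  0999 V2000"

n_atoms = parse_count_field(counts_line, 0)
n_bonds = parse_count_field(counts_line, 3)
n_lists = parse_count_field(counts_line, 6)  # blank in this line
print(n_atoms, n_bonds, n_lists)
```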
* Make the SGroup reader more robust w.r.t. bad data
The current behavior leads to uncaught exceptions when a line is too short.
This should clear that up so that we always throw the usual FileParseException
* make error messages a bit easier to read
* not quite done yet
* Fixes #2244
* Fixes #2148
This fixes a few of the knock-on effects of the actual fix.
* Test that we still write SMILES properly
* Fixes #2266 (#2269)
* Fixes #2268 (#2270)
* Improve interactivity of output SVG (#2253)
* Add clickable atoms when tagAtoms() is called
* add python tests
* add class tags for atoms and bonds
* add marker to allow easy insertion of extra text