Files
rdkit/Docs/Code/EnhancedStereo.md
Dan N 2bcb7ea692 I2366 Preserve enhanced stereo in reactions (#2377)
* Potential implementation of copying enhanced stereo groups

Copies the enhanced stereo if all atoms in the reactant
end up in the same molecule of the product with valid
ChiralTags.

Current implementation: Only copy StereoGroup if all atoms are "valid" in the product.
Possible implementation: Copy StereoGroup for all atoms that are "valid" in the product.

Details:
Uses ChiralTag invalidation to decide whether StereoGroup should be copied. If
the product atoms have valid ChiralTag, then the reaction was able to
meaningfully propogate chirality from the reactant to the product. This means
that it is also meaningful to propogate the StereoGroup from the reactant to
the product.

The only exception to this is if the product template defines a specific
absolute configuration for an atom. This means that the reaction defines the
stereochemistry for the atom, so the stereochemistry of that atom is no longer
relative.

If an atom from a reactant StereoGroup appears multiple times in the product,
all copies of that atom are put in the same product StereoGroup.

Still developing test cases.

    from rdkit import Chem
    from rdkit.Chem import AllChem

    # Duplicate a molecule example:
    mol1 = Chem.MolFromSmiles('Cl[C@@H](Br)C[C@H](Br)CCO |&1:1,4|')
    mol2 = Chem.MolFromSmiles('CC(=O)C')
    rxn = AllChem.ReactionFromSmarts('[O:1].[C:2]=O>>[O:1][C:2][O:1]')
    for prods in rxn.RunReactants([mol1, mol2]):
        for p in prods:
            for a in p.GetAtoms():
                for k in a.GetPropsAsDict():
                    a.ClearProp(k)
            print(Chem.MolToCXSmiles(p))

Output:

[21:26:08] product atom-mapping number 1 found multiple times.
CC(C)(OCC[C@@H](Br)C[C@@H](Cl)Br)OCC[C@@H](Br)C[C@@H](Cl)Br |&1:6,9,15,18

* Issue 2366: Documentation and fix stereo group invalidation

Adds some documentation to EnhancedStereo.md

Also invalidates StereoGroup if a reaction specifies the
stereochemistry of a center. This destroys the relative
relationship of the center to other centers.

* Demo python file examples for Enhanced Stereochemistry in reactions

This is not intended to be pushed. These probably will become test
cases. For the output looks like this:

    0a. Reaction preserves stereo:
      [C@:1]>>[C@:1]
        F[C@H](Cl)Br |o1:1|
          >>
          F[C@H](Cl)Br |o1:1|

    0b. Reaction preserves stereo:
      [C@:1]>>[C@:1]
        F[C@@H](Cl)Br |&1:1|
          >>
          F[C@@H](Cl)Br |&1:1|

    0c. Reaction preserves stereo:
      [C@:1]>>[C@:1]
        FC(Cl)Br
          >>
          FC(Cl)Br

    1a. Reaction ignores stereo:
      [C:1]>>[C:1]
        F[C@H](Cl)Br |a:1|
          >>
          F[C@H](Cl)Br |a:1|

    1b. Reaction ignores stereo:
      [C:1]>>[C:1]
        F[C@@H](Cl)Br |&1:1|
          >>
          F[C@@H](Cl)Br |&1:1|

    1c. Reaction ignores stereo:
      [C:1]>>[C:1]
        FC(Cl)Br
          >>
          FC(Cl)Br

    2a. Reaction inverts stereo:
      [C@:1]>>[C@@:1]
        F[C@H](Cl)Br |o1:1|
          >>
          F[C@@H](Cl)Br |o1:1|

    2b. Reaction inverts stereo:
      [C@:1]>>[C@@:1]
        F[C@@H](Cl)Br |&1:1|
          >>
          F[C@H](Cl)Br |&1:1|

    2c. Reaction inverts stereo:
      [C@:1]>>[C@@:1]
        FC(Cl)Br
          >>
          FC(Cl)Br

    3a. Reaction destroys stereo:
      [C@:1]>>[C:1]
        F[C@H](Cl)Br |o1:1|
          >>
          FC(Cl)Br

    3b. Reaction destroys stereo:
      [C@:1]>>[C:1]
        F[C@@H](Cl)Br |&1:1|
          >>
          FC(Cl)Br

    3c. Reaction destroys stereo:
      [C@:1]>>[C:1]
        FC(Cl)Br
          >>
          FC(Cl)Br

    3d. Reaction destroys stereo (but preserves unaffected group):
      [C@:1]F>>[C:1]F
        F[C@H](Cl)[C@@H](Cl)Br |o1:1,&2:3|
          >>
          FC(Cl)[C@@H](Cl)Br |&1:3|

    3e. Reaction destroys stereo:
      [C@:1]F>>[C:1]F
        F[C@H](Cl)[C@@H](Cl)Br |&1:1,3|
          >>
          FC(Cl)[C@@H](Cl)Br

    4a. Reaction creates stereo:
      [C:1]>>[C@@:1]
        F[C@H](Cl)Br |o1:1|
          >>
          F[C@@H](Cl)Br

    4b. Reaction creates stereo:
      [C:1]>>[C@@:1]
        F[C@@H](Cl)Br |&1:1|
          >>
          F[C@@H](Cl)Br

    4c. Reaction creates stereo:
      [C:1]>>[C@@:1]
        FC(Cl)Br
          >>
          F[C@@H](Cl)Br

    4d. Reaction creates stereo (preserve unaffected group):
      [C:1]F>>[C@@:1]F
        F[C@H](Cl)[C@@H](Cl)Br |o1:1,&2:3|
          >>
          F[C@@H](Cl)[C@@H](Cl)Br |&1:3|

    4e. Reaction creates stereo:
      [C:1]F>>[C@@:1]F
        F[C@H](Cl)[C@@H](Cl)Br |o1:1,3|
          >>
          F[C@@H](Cl)[C@@H](Cl)Br

    5a. Reaction preserves unrelated stereo:
      [C@:1]F>>[C@:1]F
        F[C@H](Cl)[C@@H](Cl)Br |o1:3|
          >>
          F[C@H](Cl)[C@@H](Cl)Br |o1:3|

    5b. Reaction ignores unrelated stereo:
      [C:1]F>>[C:1]F
        F[C@H](Cl)[C@@H](Cl)Br |o1:3|
          >>
          F[C@H](Cl)[C@@H](Cl)Br |o1:3|

    5c. Reaction inverts unrelated stereo:
      [C@:1]F>>[C@@:1]F
        F[C@H](Cl)[C@@H](Cl)Br |o1:3|
          >>
          F[C@@H](Cl)[C@@H](Cl)Br |o1:3|

    5d. Reaction destroys unrelated stereo:
      [C@:1]F>>[C:1]F
        F[C@H](Cl)[C@@H](Cl)Br |o1:3|
          >>
          FC(Cl)[C@@H](Cl)Br |o1:3|

    5e. Reaction creates unrelated stereo:
      [C:1]F>>[C@@:1]F
        F[C@H](Cl)[C@@H](Cl)Br |o1:3|
          >>
          F[C@@H](Cl)[C@@H](Cl)Br |o1:3|

    6e. Reaction splits StereoGroup atoms into two Mols:
      [C:1]OO[C:2]>>[C:2]O.O[C:1]
        F[C@H](Cl)OO[C@@H](Cl)Br |o1:1,5|
          >>
          O[C@@H](Cl)Br + O[C@H](F)Cl
          >>
          O[C@H](F)Cl + O[C@@H](Cl)Br

    7. Add two copies:
      [O:1].[C:2]=O>>[O:1][C:2][O:1]
        Cl[C@@H](Br)C[C@H](Br)CCO |&1:1,4| + CC(=O)C
    [17:15:38] product atom-mapping number 1 found multiple times.
          >>
          CC(C)(OCC[C@@H](Br)C[C@@H](Cl)Br)OCC[C@@H](Br)C[C@@H](Cl)Br |&1:6,9,15,18|

    8. Add two copies:
      [O:1].[C:2]=O>>[O:1][C:2][O:1]
        Cl[C@@H](Br)C[C@H](Br)CCO |&1:1,4| + CC(=O)C
    [17:15:38] product atom-mapping number 1 found multiple times.
          >>
          CC(C)(OCC[C@@H](Br)C[C@@H](Cl)Br)OCC[C@@H](Br)C[C@@H](Cl)Br |&1:6,9,15,18|

* Updates StereoGroup strategy in reactions to copy all possible atoms.

Copy all atoms for which the stereochemistry was not created or destroyed
in the reaction. Any StereoGroup which has at least one atom will appear
in the product.

Also updates the documentation to match this description, and adds C++
and Python tests which fail before this PR and pass after. The Python
tests are more extensive.

Test output was validated by hand (especially the stereo groups
generated. I'm less confident in the reaction processing in my head,
but I truested the existing validation there.)

For future diagnosis: Python unittest failures will look like:

    AssertionError: 'F[C@H](Cl)Br' != 'F[C@H](Cl)Br |&1:1|'
    - F[C@H](Cl)Br
    + F[C@H](Cl)Br |&1:1|
    ?             +++++++

For future diagnosis: C++ Catch2 failures will look like:

      CHECK( MolToCXSmiles(*p) == "F[C@H](Cl)Br |o1:1|" )
    with expansion:
      "FC(Cl)[C@@H](Cl)Br |&1:3|"
      ==
      "F[C@H](Cl)Br |o1:1|"

* Add a couple of new tests.

* rename "relative" to "enhanced"
some reformatting

* Factor out test helper function.

* Actually, enhanced stereo groups are exposed ot Python

* Added discussion of enhanced stereochemistry in reactions to docs

* Fix new test
2019-04-07 06:06:28 +02:00

2.8 KiB

Enhanced Stereochemistry in the RDKit

Greg Landrum (greg.landrum@t5informatics.com)

September 2018

This is still a DRAFT

Overview

We are going to follow, at least for the initial implementation, the enhanced stereo representation used in V3k mol files: groups of atoms with specified stereochemistry with an ABS, AND, or OR marker indicating what is known. The general idea is that AND indicates mixtures and OR indicates unknown single substances.

Here are some illustrations of what the various combinations mean:

What's drawn Mixture? What it means
img1a mixture img1b
img2a mixture img2b
img3a mixture img3b
img4a single img4b
img5a single img5b
img5a single img5b
img6a mixture img6b
img7a single img7b

Use cases

The initial target is to not lose data on an V3k mol -> RDKit -> V3k mol round trip. Manipulation, depiction, and searching is a secondary goal.

Representation

Stored as a vector of StereoGroup objects.

A StereoGroup contains an enum with the type as well as pointers to the atoms involved. We will need to adjust this when atoms are removed or replaced.

Enumeration

The Python-only function rdkit.Chem.EnumerateStereoisomers.EnumerateStereoisomers can be used to enumerate all stereoisomers represented by the StereoGroups. It is possible to enumerate only the Stereo Groups, ignoring other unspecified centers.

Reactions

The reaction runner is aware of stereo groups associated with a Mol. Atoms in a Stereo Group are copied to all products in which they appear, unless the reaction created or destroyed stereochemistry at that atom. If an atom from a reactant StereoGroup appears multiple times in the product, all copies of that atom are put in the same product StereoGroup.

Searching

This will be handled by searching in MolBundle objects produced by the enumeration code. Substructure searching and canonicalization do not yet support stereo groups.

Depiction

Something needs to be added to the depiction code to allow the groups to be seen.