mirror of
https://github.com/rdkit/rdkit.git
synced 2026-06-06 22:39:55 +08:00
* Speed up tautomer canonicalization by deferring on SSSR calc * Lazy kekulization for tautomer enumeration Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for: 1. The initial tautomer (until first iteration) 2. New tautomers that may never be processed (if enumeration ends early) The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method. Performance improvement: ~7% additional speedup (total ~22-24% from baseline) * Use count-only substructure matching in tautomer scoring * Add SubstructMatchCount regression test * MolStandardize: reduce enumerate overhead * MolStandardize: avoid per-tautomer ring recomputation * Atom: cache PeriodicTable pointer in valence calcs * Atom: reuse PeriodicTable in getEffectiveAtomicNum * PeriodicTable: add atomic fast path for getTable * GraphMol: reduce ROMol copy reallocations * MolStandardize: use quickCopy for per-match product copies Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry. * MolStandardize: pre-filter scoring patterns by element/connectivity For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule. Two-stage filtering: 1. Element check: skip patterns requiring atoms not in the molecule 2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition. * MolStandardize: preserve molecule properties for canonical tautomer Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result. * TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES. Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance. * MolStandardize: use parallel iteration and cache bond lookups Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches. * perf: add specialized matchers for simple tautomer scoring patterns Replace VF2 graph matching with O(n) loops for 6 simple patterns: - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns - countMethyls: [CX4H3] methyl groups - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2. Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization. * Fix tautomer canonicalize dropping conformers from quickCopy quickCopy (RWMol(*mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates. Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize(). Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp. * add test on canonicalize losing stereo * add regression test for exocyclic C=C tautomer canonicalization The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs. Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1. Currently expected to FAIL until the state key dedup bug is fixed. * MolStandardize: expand tautomer connectivity SMARTS * MolStandardize: scope tautomer pattern enum * MolStandardize: trim tautomer pattern enum * MolStandardize: use symmetric ring scoring
2024 lines
79 KiB
Python
2024 lines
79 KiB
Python
#
|
|
# Copyright (C) 2018-2025 Susan H. Leung and other RDKit contributors
|
|
# All Rights Reserved
|
|
#
|
|
import math
|
|
import os
|
|
import sys
|
|
import unittest
|
|
from datetime import datetime, timedelta
|
|
|
|
from rdkit import Chem, DataStructs, RDConfig
|
|
from rdkit.Chem.MolStandardize import rdMolStandardize
|
|
from rdkit.Chem import inchi, rdCIPLabeler
|
|
from rdkit.Chem.rdchem import Atom
|
|
from rdkit.Geometry import rdGeometry as geom
|
|
|
|
|
|
class TestCase(unittest.TestCase):
|
|
|
|
def setUp(self):
|
|
pass
|
|
|
|
def test1Cleanup(self):
|
|
mol = Chem.MolFromSmiles("CCC(=O)O[Na]")
|
|
nmol = rdMolStandardize.Cleanup(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "CCC(=O)[O-].[Na+]")
|
|
|
|
def test2StandardizeSmiles(self):
|
|
self.assertEqual(rdMolStandardize.StandardizeSmiles("CCC(=O)O[Na]"), "CCC(=O)[O-].[Na+]")
|
|
|
|
def test3Parents(self):
|
|
mol = Chem.MolFromSmiles("[Na]OC(=O)c1ccccc1")
|
|
nmol = rdMolStandardize.FragmentParent(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "O=C([O-])c1ccccc1")
|
|
|
|
mol = Chem.MolFromSmiles("C[NH+](C)(C).[Cl-]")
|
|
nmol = rdMolStandardize.ChargeParent(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "CN(C)C")
|
|
|
|
mol = Chem.MolFromSmiles("[O-]CCCC=CO.[Na+]")
|
|
nmol = rdMolStandardize.TautomerParent(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "O=CCCCC[O-].[Na+]")
|
|
nmol = rdMolStandardize.TautomerParent(mol, skipStandardize=True)
|
|
# same answer because of the standardization at the end
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "O=CCCCC[O-].[Na+]")
|
|
|
|
mol = Chem.MolFromSmiles("C[C@](F)(Cl)C/C=C/[C@H](F)Cl")
|
|
nmol = rdMolStandardize.StereoParent(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "CC(F)(Cl)CC=CC(F)Cl")
|
|
|
|
mol = Chem.MolFromSmiles("[12CH3][13CH3]")
|
|
nmol = rdMolStandardize.IsotopeParent(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "CC")
|
|
|
|
mol = Chem.MolFromSmiles("[Na]Oc1c([12C@H](F)Cl)c(O[2H])c(C(=O)O)cc1CC=CO")
|
|
nmol = rdMolStandardize.SuperParent(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "O=CCCc1cc(C(=O)O)c(O)c(C(F)Cl)c1O")
|
|
mol = Chem.MolFromSmiles("[Na]Oc1c([12C@H](F)Cl)c(O[2H])c(C(=O)O)cc1CC=CO")
|
|
nmol = rdMolStandardize.SuperParent(mol, skipStandardize=True)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "O=CCCc1cc(C(=O)[O-])c(O)c(C(F)Cl)c1O.[Na+]")
|
|
|
|
def test4Normalize(self):
|
|
mol = Chem.MolFromSmiles(r"C[N+](C)=C\C=C\[O-]")
|
|
nmol = rdMolStandardize.Normalize(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "CN(C)C=CC=O")
|
|
|
|
def test4Reionize(self):
|
|
mol = Chem.MolFromSmiles("C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O")
|
|
nmol = rdMolStandardize.Reionize(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nmol), "O=S(O)c1ccc(S(=O)(=O)[O-])cc1")
|
|
|
|
def test5Metal(self):
|
|
mol = Chem.MolFromSmiles("C1(CCCCC1)[Zn]Br")
|
|
md = rdMolStandardize.MetalDisconnector()
|
|
nm = md.Disconnect(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "[Br-].[CH-]1CCCCC1.[Zn+2]")
|
|
nm = Chem.Mol(mol)
|
|
md.DisconnectInPlace(nm)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "[Br-].[CH-]1CCCCC1.[Zn+2]")
|
|
|
|
# test user defined metal_nof
|
|
md.SetMetalNof(
|
|
Chem.MolFromSmarts(
|
|
"[Li,K,Rb,Cs,Fr,Be,Mg,Ca,Sr,Ba,Ra,Sc,Ti,V,Cr,Mn,Fe,Co,Ni,Cu,Zn,Al,Ga,Y,Zr,Nb,Mo,Tc,Ru,Rh,Pd,Ag,Cd,In,Sn,Hf,Ta,W,Re,Os,Ir,Pt,Au,Hg,Tl,Pb,Bi]~[N,O,F]"
|
|
))
|
|
mol2 = Chem.MolFromSmiles("CCC(=O)[O][Na]")
|
|
nm2 = md.Disconnect(mol2)
|
|
self.assertEqual(Chem.MolToSmiles(nm2), "CCC(=O)[O][Na]")
|
|
|
|
# Split with organometallics disconnector, two ways.
|
|
rufile = os.path.join(RDConfig.RDBaseDir, 'Code', 'GraphMol', 'MolStandardize', 'test_data',
|
|
'ruthenium.mol')
|
|
rumol = Chem.MolFromMolFile(rufile)
|
|
disrumol = rdMolStandardize.DisconnectOrganometallics(rumol)
|
|
self.assertEqual(Chem.MolToSmiles(disrumol),
|
|
"[Cl-].[Cl-].[Cl-].[Cl-].[Ru+2].[Ru+2].c1ccccc1.c1ccccc1")
|
|
|
|
opts = rdMolStandardize.MetalDisconnectorOptions()
|
|
opts.splitGrignards = True
|
|
opts.splitAromaticC = True
|
|
opts.adjustCharges = False
|
|
opts.removeHapticDummies = True
|
|
def_opts = rdMolStandardize.MetalDisconnectorOptions()
|
|
self.assertNotEqual(def_opts.splitGrignards, opts.splitGrignards)
|
|
self.assertNotEqual(def_opts.splitAromaticC, opts.splitAromaticC)
|
|
self.assertNotEqual(def_opts.adjustCharges, opts.adjustCharges)
|
|
self.assertNotEqual(def_opts.removeHapticDummies, opts.removeHapticDummies)
|
|
|
|
md = rdMolStandardize.MetalDisconnector(opts)
|
|
|
|
grigfile = os.path.join(RDConfig.RDBaseDir, 'Code', 'GraphMol', 'MolStandardize', 'test_data',
|
|
'grignard_2.mol')
|
|
grigmol = Chem.MolFromMolFile(grigfile)
|
|
disgrigmol = md.Disconnect(grigmol)
|
|
self.assertEqual(Chem.MolToSmiles(disgrigmol), "[Cl-].[Mg+2].[c-]1ccccc1")
|
|
|
|
# and passing in the options explicitly
|
|
disrumol = rdMolStandardize.DisconnectOrganometallics(rumol, opts)
|
|
self.assertEqual(Chem.MolToSmiles(disrumol),
|
|
"[Cl-].[Cl-].[Cl-].[Cl-].[Ru+2].[Ru+2].c1ccccc1.c1ccccc1")
|
|
|
|
def test6Charge(self):
|
|
mol = Chem.MolFromSmiles("C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O")
|
|
# instantiate with default acid base pair library
|
|
reionizer = rdMolStandardize.Reionizer()
|
|
nm = reionizer.reionize(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "O=S(O)c1ccc(S(=O)(=O)[O-])cc1")
|
|
|
|
nm = Chem.Mol(mol)
|
|
reionizer.reionizeInPlace(nm)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "O=S(O)c1ccc(S(=O)(=O)[O-])cc1")
|
|
|
|
# try reionize with another acid base pair library without the right
|
|
# pairs
|
|
abfile = os.path.join(RDConfig.RDBaseDir, 'Code', 'GraphMol', 'MolStandardize', 'test_data',
|
|
'acid_base_pairs2.txt')
|
|
reionizer2 = rdMolStandardize.Reionizer(abfile)
|
|
nm2 = reionizer2.reionize(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nm2), "O=S([O-])c1ccc(S(=O)(=O)O)cc1")
|
|
|
|
# test Uncharger
|
|
uncharger = rdMolStandardize.Uncharger()
|
|
mol3 = Chem.MolFromSmiles("O=C([O-])c1ccccc1")
|
|
nm3 = uncharger.uncharge(mol3)
|
|
self.assertEqual(Chem.MolToSmiles(nm3), "O=C(O)c1ccccc1")
|
|
nm3 = Chem.Mol(mol3)
|
|
uncharger.unchargeInPlace(nm3)
|
|
self.assertEqual(Chem.MolToSmiles(nm3), "O=C(O)c1ccccc1")
|
|
|
|
# test canonical Uncharger
|
|
uncharger = rdMolStandardize.Uncharger(canonicalOrder=False)
|
|
mol3 = Chem.MolFromSmiles("C[N+](C)(C)CC(C(=O)[O-])CC(=O)[O-]")
|
|
nm3 = uncharger.uncharge(mol3)
|
|
self.assertEqual(Chem.MolToSmiles(nm3), "C[N+](C)(C)CC(CC(=O)[O-])C(=O)O")
|
|
nm3 = Chem.Mol(mol3)
|
|
uncharger.unchargeInPlace(nm3)
|
|
self.assertEqual(Chem.MolToSmiles(nm3), "C[N+](C)(C)CC(CC(=O)[O-])C(=O)O")
|
|
|
|
uncharger = rdMolStandardize.Uncharger(canonicalOrder=True)
|
|
nm3 = uncharger.uncharge(mol3)
|
|
self.assertEqual(Chem.MolToSmiles(nm3), "C[N+](C)(C)CC(CC(=O)O)C(=O)[O-]")
|
|
nm3 = Chem.Mol(mol3)
|
|
uncharger.unchargeInPlace(nm3)
|
|
self.assertEqual(Chem.MolToSmiles(nm3), "C[N+](C)(C)CC(CC(=O)O)C(=O)[O-]")
|
|
|
|
def test7Fragment(self):
|
|
fragremover = rdMolStandardize.FragmentRemover()
|
|
mol = Chem.MolFromSmiles("CN(C)C.Cl.Cl.Br")
|
|
nm = fragremover.remove(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "CN(C)C")
|
|
|
|
nm = Chem.Mol(mol)
|
|
fragremover.removeInPlace(nm)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "CN(C)C")
|
|
|
|
lfragchooser = rdMolStandardize.LargestFragmentChooser()
|
|
mol2 = Chem.MolFromSmiles("[N+](=O)([O-])[O-].[CH3+]")
|
|
nm2 = lfragchooser.choose(mol2)
|
|
self.assertEqual(Chem.MolToSmiles(nm2), "O=[N+]([O-])[O-]")
|
|
nm2 = Chem.Mol(mol2)
|
|
lfragchooser.chooseInPlace(nm2)
|
|
self.assertEqual(Chem.MolToSmiles(nm2), "O=[N+]([O-])[O-]")
|
|
|
|
lfragchooser2 = rdMolStandardize.LargestFragmentChooser(preferOrganic=True)
|
|
nm3 = lfragchooser2.choose(mol2)
|
|
self.assertEqual(Chem.MolToSmiles(nm3), "[CH3+]")
|
|
nm3 = Chem.Mol(mol2)
|
|
lfragchooser2.chooseInPlace(nm3)
|
|
self.assertEqual(Chem.MolToSmiles(nm3), "[CH3+]")
|
|
|
|
fragremover = rdMolStandardize.FragmentRemover(skip_if_all_match=True)
|
|
mol = Chem.MolFromSmiles("[Na+].Cl.Cl.Br")
|
|
nm = fragremover.remove(mol)
|
|
self.assertEqual(nm.GetNumAtoms(), mol.GetNumAtoms())
|
|
nm = Chem.Mol(mol)
|
|
fragremover.removeInPlace(mol)
|
|
self.assertEqual(nm.GetNumAtoms(), mol.GetNumAtoms())
|
|
|
|
smi3 = "CNC[C@@H]([C@H]([C@@H]([C@@H](CO)O)O)O)O.c1cc2c(cc1C(=O)O)oc(n2)c3cc(cc(c3)Cl)Cl"
|
|
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfrag_params = rdMolStandardize.LargestFragmentChooser(lfParams)
|
|
mol3 = Chem.MolFromSmiles(smi3)
|
|
lfrag3 = lfrag_params.choose(mol3)
|
|
self.assertEqual(Chem.MolToSmiles(lfrag3), "CNC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO")
|
|
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfParams.largestFragmentChooserCountHeavyAtomsOnly = True
|
|
lfrag_params = rdMolStandardize.LargestFragmentChooser(lfParams)
|
|
mol3 = Chem.MolFromSmiles(smi3)
|
|
lfrag3 = lfrag_params.choose(mol3)
|
|
self.assertEqual(Chem.MolToSmiles(lfrag3), "O=C(O)c1ccc2nc(-c3cc(Cl)cc(Cl)c3)oc2c1")
|
|
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfParams.largestFragmentChooserUseAtomCount = False
|
|
lfrag_params = rdMolStandardize.LargestFragmentChooser(lfParams)
|
|
mol3 = Chem.MolFromSmiles(smi3)
|
|
lfrag3 = lfrag_params.choose(mol3)
|
|
self.assertEqual(Chem.MolToSmiles(lfrag3), "O=C(O)c1ccc2nc(-c3cc(Cl)cc(Cl)c3)oc2c1")
|
|
|
|
smi4 = "CC.O=[Pb]=O"
|
|
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfrag_params = rdMolStandardize.LargestFragmentChooser(lfParams)
|
|
mol4 = Chem.MolFromSmiles(smi4)
|
|
lfrag4 = lfrag_params.choose(mol4)
|
|
self.assertEqual(Chem.MolToSmiles(lfrag4), "CC")
|
|
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfParams.largestFragmentChooserCountHeavyAtomsOnly = True
|
|
lfrag_params = rdMolStandardize.LargestFragmentChooser(lfParams)
|
|
mol4 = Chem.MolFromSmiles(smi4)
|
|
lfrag4 = lfrag_params.choose(mol4)
|
|
self.assertEqual(Chem.MolToSmiles(lfrag4), "[O]=[Pb]=[O]")
|
|
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfParams.largestFragmentChooserUseAtomCount = False
|
|
lfrag_params = rdMolStandardize.LargestFragmentChooser(lfParams)
|
|
mol4 = Chem.MolFromSmiles(smi4)
|
|
lfrag4 = lfrag_params.choose(mol4)
|
|
self.assertEqual(Chem.MolToSmiles(lfrag4), "[O]=[Pb]=[O]")
|
|
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfParams.largestFragmentChooserCountHeavyAtomsOnly = True
|
|
lfParams.preferOrganic = True
|
|
lfrag_params = rdMolStandardize.LargestFragmentChooser(lfParams)
|
|
mol4 = Chem.MolFromSmiles(smi4)
|
|
lfrag4 = lfrag_params.choose(mol4)
|
|
self.assertEqual(Chem.MolToSmiles(lfrag4), "CC")
|
|
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfParams.largestFragmentChooserUseAtomCount = False
|
|
lfParams.preferOrganic = True
|
|
lfrag_params = rdMolStandardize.LargestFragmentChooser(lfParams)
|
|
mol4 = Chem.MolFromSmiles(smi4)
|
|
lfrag4 = lfrag_params.choose(mol4)
|
|
self.assertEqual(Chem.MolToSmiles(lfrag4), "CC")
|
|
|
|
def test8Normalize(self):
|
|
normalizer = rdMolStandardize.Normalizer()
|
|
mol = Chem.MolFromSmiles("C[n+]1ccccc1[O-]")
|
|
nm = normalizer.normalize(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "Cn1ccccc1=O")
|
|
nm = Chem.Mol(mol)
|
|
normalizer.normalizeInPlace(nm)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "Cn1ccccc1=O")
|
|
|
|
def test9Validate(self):
|
|
vm = rdMolStandardize.RDKitValidation()
|
|
mol = Chem.MolFromSmiles("CO(C)C", sanitize=False)
|
|
msg = vm.validate(mol)
|
|
self.assertEqual(len(msg), 1)
|
|
self.assertEqual(
|
|
"""INFO: [ValenceValidation] Explicit valence for atom # 1 O, 3, is greater than permitted""",
|
|
msg[0])
|
|
mol = Chem.MolFromSmiles("")
|
|
msg = vm.validate(mol)
|
|
self.assertEqual(len(msg), 1)
|
|
self.assertEqual("ERROR: [NoAtomValidation] Molecule has no atoms", msg[0])
|
|
vm.allowEmptyMolecules = True
|
|
msg = vm.validate(mol)
|
|
self.assertEqual(len(msg), 0)
|
|
|
|
|
|
vm2 = rdMolStandardize.MolVSValidation([rdMolStandardize.FragmentValidation()])
|
|
# with no argument it also works
|
|
# vm2 = rdMolStandardize.MolVSValidation()
|
|
mol2 = Chem.MolFromSmiles("COc1cccc(C=N[N-]C(N)=O)c1[O-].O.O.O.O=[U+2]=O")
|
|
msg2 = vm2.validate(mol2)
|
|
self.assertEqual(len(msg2), 1)
|
|
self.assertEqual("""INFO: [FragmentValidation] water/hydroxide is present""", msg2[0])
|
|
|
|
vm3 = rdMolStandardize.MolVSValidation()
|
|
mol3 = Chem.MolFromSmiles("C1COCCO1.O=C(NO)NO")
|
|
msg3 = vm3.validate(mol3)
|
|
self.assertEqual(len(msg3), 2)
|
|
self.assertEqual("""INFO: [FragmentValidation] 1,2-dimethoxyethane is present""", msg3[0])
|
|
self.assertEqual("""INFO: [FragmentValidation] 1,4-dioxane is present""", msg3[1])
|
|
|
|
atomic_no = [6, 7, 8]
|
|
allowed_atoms = [Atom(i) for i in atomic_no]
|
|
vm4 = rdMolStandardize.AllowedAtomsValidation(allowed_atoms)
|
|
mol4 = Chem.MolFromSmiles("CC(=O)CF")
|
|
msg4 = vm4.validate(mol4)
|
|
self.assertEqual(len(msg4), 1)
|
|
self.assertEqual("""INFO: [AllowedAtomsValidation] Atom F is not in allowedAtoms list""",
|
|
msg4[0])
|
|
|
|
atomic_no = [9, 17, 35]
|
|
disallowed_atoms = [Atom(i) for i in atomic_no]
|
|
vm5 = rdMolStandardize.DisallowedAtomsValidation(disallowed_atoms)
|
|
mol5 = Chem.MolFromSmiles("CC(=O)CF")
|
|
msg5 = vm5.validate(mol5)
|
|
self.assertEqual(len(msg5), 1)
|
|
self.assertEqual("""INFO: [DisallowedAtomsValidation] Atom F is in disallowedAtoms list""",
|
|
msg5[0])
|
|
|
|
mol6 = Chem.MolFromSmiles("[3CH4]")
|
|
vm6a = rdMolStandardize.IsotopeValidation()
|
|
msg6a = vm6a.validate(mol6)
|
|
self.assertEqual(len(msg6a), 1)
|
|
self.assertEqual("INFO: [IsotopeValidation] Molecule contains isotope 3C", msg6a[0])
|
|
vm6b = rdMolStandardize.IsotopeValidation(True)
|
|
msg6b = vm6b.validate(mol6)
|
|
self.assertEqual(len(msg6b), 1)
|
|
self.assertEqual("ERROR: [IsotopeValidation] The molecule contains an unknown isotope: 3C",
|
|
msg6b[0])
|
|
|
|
msg999 = rdMolStandardize.ValidateSmiles("ClCCCl.c1ccccc1O")
|
|
self.assertEqual(len(msg999), 1)
|
|
self.assertEqual("""INFO: [FragmentValidation] 1,2-dichloroethane is present""", msg999[0])
|
|
|
|
def test10NormalizeFromData(self):
|
|
data = """// Name SMIRKS
|
|
Nitro to N+(O-)=O [N,P,As,Sb;X3:1](=[O,S,Se,Te:2])=[O,S,Se,Te:3]>>[*+1:1]([*-1:2])=[*:3]
|
|
Sulfone to S(=O)(=O) [S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])
|
|
Pyridine oxide to n+O- [n:1]=[O:2]>>[n+:1][O-:2]
|
|
// Azide to N=N+=N- [*,H:1][N:2]=[N:3]#[N:4]>>[*,H:1][N:2]=[N+:3]=[N-:4]
|
|
"""
|
|
normalizer1 = rdMolStandardize.Normalizer()
|
|
params = rdMolStandardize.CleanupParameters()
|
|
normalizer2 = rdMolStandardize.NormalizerFromData(data, params)
|
|
|
|
imol = Chem.MolFromSmiles("O=N(=O)CCN=N#N", sanitize=False)
|
|
mol1 = normalizer1.normalize(imol)
|
|
mol2 = normalizer2.normalize(imol)
|
|
self.assertEqual(Chem.MolToSmiles(imol), "N#N=NCCN(=O)=O")
|
|
self.assertEqual(Chem.MolToSmiles(mol1), "[N-]=[N+]=NCC[N+](=O)[O-]")
|
|
self.assertEqual(Chem.MolToSmiles(mol2), "N#N=NCC[N+](=O)[O-]")
|
|
|
|
def test11FragmentParams(self):
|
|
data = """// Name SMARTS
|
|
fluorine [F]
|
|
chlorine [Cl]
|
|
"""
|
|
fragremover = rdMolStandardize.FragmentRemoverFromData(data)
|
|
mol = Chem.MolFromSmiles("CN(C)C.Cl.Cl.Br")
|
|
nm = fragremover.remove(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "Br.CN(C)C")
|
|
|
|
def test12ChargeParams(self):
|
|
params = """// The default list of AcidBasePairs, sorted from strongest to weakest.
|
|
// This list is derived from the Food and Drug: Administration Substance
|
|
// Registration System Standard Operating Procedure guide.
|
|
//
|
|
// Name Acid Base
|
|
-SO2H [!O][SD3](=O)[OH] [!O][SD3](=O)[O-]
|
|
-SO3H [!O]S(=O)(=O)[OH] [!O]S(=O)(=O)[O-]
|
|
"""
|
|
mol = Chem.MolFromSmiles("C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O")
|
|
# instantiate with default acid base pair library
|
|
reionizer = rdMolStandardize.ReionizerFromData(params, [])
|
|
print("done")
|
|
nm = reionizer.reionize(mol)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "O=S([O-])c1ccc(S(=O)(=O)O)cc1")
|
|
|
|
def test13Tautomers(self):
|
|
enumerator = rdMolStandardize.TautomerEnumerator()
|
|
m = Chem.MolFromSmiles("C1(=CCCCC1)O")
|
|
ctaut = enumerator.Canonicalize(m)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
|
|
params = rdMolStandardize.CleanupParameters()
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
m = Chem.MolFromSmiles("C1(=CCCCC1)O")
|
|
ctaut = enumerator.Canonicalize(m)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
|
|
taut_res = enumerator.Enumerate(m)
|
|
self.assertEqual(len(taut_res), 2)
|
|
ctauts = list(sorted(Chem.MolToSmiles(x) for x in taut_res))
|
|
self.assertEqual(ctauts, ['O=C1CCCCC1', 'OC1=CCCCC1'])
|
|
self.assertEqual(list(taut_res.smiles), ['O=C1CCCCC1', 'OC1=CCCCC1'])
|
|
# this tests the non-templated overload
|
|
self.assertEqual(Chem.MolToSmiles(enumerator.PickCanonical(taut_res)), "O=C1CCCCC1")
|
|
# this tests the templated overload
|
|
self.assertEqual(Chem.MolToSmiles(enumerator.PickCanonical(set(taut_res()))), "O=C1CCCCC1")
|
|
with self.assertRaises(TypeError):
|
|
enumerator.PickCanonical(1)
|
|
with self.assertRaises(TypeError):
|
|
enumerator.PickCanonical([0, 1])
|
|
self.assertEqual(
|
|
Chem.MolToSmiles(
|
|
enumerator.PickCanonical(Chem.MolFromSmiles(x) for x in ['O=C1CCCCC1', 'OC1=CCCCC1'])),
|
|
"O=C1CCCCC1")
|
|
|
|
def scorefunc1(mol):
|
|
' stupid tautomer scoring function '
|
|
p = Chem.MolFromSmarts('[OH]')
|
|
return len(mol.GetSubstructMatches(p))
|
|
|
|
def scorefunc2(mol):
|
|
' stupid tautomer scoring function '
|
|
p = Chem.MolFromSmarts('O=C')
|
|
return len(mol.GetSubstructMatches(p))
|
|
|
|
m = Chem.MolFromSmiles("C1(=CCCCC1)O")
|
|
ctaut = enumerator.Canonicalize(m, scorefunc1)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "OC1=CCCCC1")
|
|
ctaut = enumerator.Canonicalize(m, scorefunc2)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
# make sure lambdas work
|
|
ctaut = enumerator.Canonicalize(m,
|
|
lambda x: len(x.GetSubstructMatches(Chem.MolFromSmarts('C=O'))))
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
|
|
# make sure we behave if we return something bogus from the scoring function
|
|
with self.assertRaises(TypeError):
|
|
ctaut = enumerator.Canonicalize(m, lambda x: 'fail')
|
|
|
|
self.assertEqual(enumerator.ScoreTautomer(Chem.MolFromSmiles('N=c1[nH]cccc1')), 99)
|
|
self.assertEqual(enumerator.ScoreTautomer(Chem.MolFromSmiles('Nc1ncccc1')), 100)
|
|
|
|
def scorefunc2(mol):
|
|
' stupid tautomer scoring function '
|
|
p = Chem.MolFromSmarts('O=C')
|
|
return len(mol.GetSubstructMatches(p))
|
|
|
|
m = Chem.MolFromSmiles("C1(=CCCCC1)O")
|
|
ctaut = enumerator.Canonicalize(m, scorefunc1)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "OC1=CCCCC1")
|
|
ctaut = enumerator.Canonicalize(m, scorefunc2)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
# make sure lambdas work
|
|
ctaut = enumerator.Canonicalize(m,
|
|
lambda x: len(x.GetSubstructMatches(Chem.MolFromSmarts('C=O'))))
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
|
|
# make sure we behave if we return something bogus from the scoring function
|
|
with self.assertRaises(TypeError):
|
|
ctaut = enumerator.Canonicalize(m, lambda x: 'fail')
|
|
|
|
self.assertEqual(enumerator.ScoreTautomer(Chem.MolFromSmiles('N=c1[nH]cccc1')), 99)
|
|
self.assertEqual(enumerator.ScoreTautomer(Chem.MolFromSmiles('Nc1ncccc1')), 100)
|
|
|
|
res = enumerator.Enumerate(m)
|
|
# this test the specialized overload
|
|
ctaut = enumerator.PickCanonical(res, scorefunc1)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "OC1=CCCCC1")
|
|
ctaut = enumerator.PickCanonical(res, scorefunc2)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
# make sure lambdas work
|
|
ctaut = enumerator.PickCanonical(
|
|
res, lambda x: len(x.GetSubstructMatches(Chem.MolFromSmarts('C=O'))))
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
# make sure we behave if we return something bogus from the scoring function
|
|
with self.assertRaises(TypeError):
|
|
ctaut = enumerator.PickCanonical(res, lambda x: 'fail')
|
|
# this test the non-specialized overload
|
|
ctaut = enumerator.PickCanonical(set(res()), scorefunc1)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "OC1=CCCCC1")
|
|
ctaut = enumerator.PickCanonical(set(res()), scorefunc2)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
# make sure lambdas work
|
|
ctaut = enumerator.PickCanonical(
|
|
set(res()), lambda x: len(x.GetSubstructMatches(Chem.MolFromSmarts('C=O'))))
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
# make sure we behave if we return something bogus from the scoring function
|
|
with self.assertRaises(TypeError):
|
|
ctaut = enumerator.PickCanonical(set(res()), lambda x: 'fail')
|
|
|
|
def test13bTautomerOrderIndependence(self):
|
|
# Regression: tautomer enumeration/canonicalization should be independent of
|
|
# atom/bond storage order (e.g. after atom renumbering).
|
|
import random
|
|
|
|
smi = "OC1=Nc2ccccc2C1=Cc1ccc[nH]1"
|
|
mol = Chem.MolFromSmiles(smi)
|
|
self.assertIsNotNone(mol)
|
|
|
|
enumerator = rdMolStandardize.TautomerEnumerator()
|
|
enumerator.SetMaxTautomers(2000)
|
|
enumerator.SetMaxTransforms(2000)
|
|
|
|
def enumerate_smiles_set(m):
|
|
return {Chem.MolToSmiles(t, isomericSmiles=True) for t in enumerator.Enumerate(m)}
|
|
|
|
base_set = enumerate_smiles_set(mol)
|
|
base_canon = Chem.MolToSmiles(enumerator.Canonicalize(Chem.Mol(mol)), isomericSmiles=True)
|
|
|
|
# Seeds chosen from a prior repro where many permutations differed.
|
|
for seed in (0, 2, 4, 7, 11, 13, 14, 20, 23, 27, 28, 29, 31, 37, 39, 42, 44, 48):
|
|
rng = random.Random(seed)
|
|
order = list(range(mol.GetNumAtoms()))
|
|
rng.shuffle(order)
|
|
renum = Chem.RenumberAtoms(mol, order)
|
|
|
|
renum_set = enumerate_smiles_set(renum)
|
|
self.assertEqual(renum_set, base_set, f"seed {seed} changed Enumerate() results")
|
|
|
|
renum_canon = Chem.MolToSmiles(enumerator.Canonicalize(Chem.Mol(renum)), isomericSmiles=True)
|
|
self.assertEqual(renum_canon, base_canon, f"seed {seed} changed Canonicalize() result")
|
|
|
|
def test14TautomerDetails(self):
|
|
enumerator = rdMolStandardize.TautomerEnumerator()
|
|
m = Chem.MolFromSmiles("c1ccccc1CN=c1[nH]cccc1")
|
|
taut_res = enumerator.Enumerate(m)
|
|
self.assertEqual(len(taut_res.tautomers), 2)
|
|
self.assertEqual(taut_res.modifiedAtoms, (7, 9))
|
|
self.assertEqual(len(taut_res.modifiedBonds), 7)
|
|
self.assertEqual(taut_res.modifiedBonds, (7, 8, 9, 10, 11, 12, 14))
|
|
|
|
taut_res = enumerator.Enumerate(m)
|
|
self.assertEqual(len(taut_res.tautomers), 2)
|
|
self.assertEqual(taut_res.modifiedAtoms, (7, 9))
|
|
|
|
taut_res = enumerator.Enumerate(m)
|
|
self.assertEqual(len(taut_res.tautomers), 2)
|
|
self.assertEqual(len(taut_res.modifiedBonds), 7)
|
|
self.assertEqual(taut_res.modifiedBonds, (7, 8, 9, 10, 11, 12, 14))
|
|
|
|
def test15EnumeratorParams(self):
|
|
# Test a structure with hundreds of tautomers.
|
|
smi68 = "[H][C](CO)(NC(=O)C1=C(O)C(O)=CC=C1)C(O)=O"
|
|
m68 = Chem.MolFromSmiles(smi68)
|
|
|
|
enumerator = rdMolStandardize.TautomerEnumerator()
|
|
res68 = enumerator.Enumerate(m68)
|
|
self.assertEqual(len(res68), 72)
|
|
self.assertEqual(len(res68.tautomers), len(res68))
|
|
self.assertEqual(res68.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
|
|
enumerator = rdMolStandardize.GetV1TautomerEnumerator()
|
|
res68 = enumerator.Enumerate(m68)
|
|
self.assertEqual(len(res68), 295)
|
|
self.assertEqual(len(res68.tautomers), len(res68))
|
|
self.assertEqual(res68.status, rdMolStandardize.TautomerEnumeratorStatus.MaxTransformsReached)
|
|
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.maxTautomers = 50
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res68 = enumerator.Enumerate(m68)
|
|
self.assertEqual(len(res68), 50)
|
|
self.assertEqual(res68.status, rdMolStandardize.TautomerEnumeratorStatus.MaxTautomersReached)
|
|
|
|
|
|
origVal = Chem.GetUseLegacyStereoPerception()
|
|
for useLegacy in (True, False):
|
|
Chem.SetUseLegacyStereoPerception(useLegacy)
|
|
|
|
sAlaSmi = "C[C@H](N)C(=O)O"
|
|
sAla = Chem.MolFromSmiles(sAlaSmi)
|
|
rdCIPLabeler.AssignCIPLabels(sAla)
|
|
# test remove (S)-Ala stereochemistry
|
|
self.assertEqual(sAla.GetAtomWithIdx(1).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CCW)
|
|
self.assertEqual(sAla.GetAtomWithIdx(1).GetProp("_CIPCode"), "S")
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveSp3Stereo = True
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res = enumerator.Enumerate(sAla)
|
|
for taut in res:
|
|
self.assertEqual(taut.GetAtomWithIdx(1).GetChiralTag(), Chem.ChiralType.CHI_UNSPECIFIED)
|
|
self.assertFalse(taut.GetAtomWithIdx(1).HasProp("_CIPCode"))
|
|
for taut in res.tautomers:
|
|
self.assertEqual(taut.GetAtomWithIdx(1).GetChiralTag(), Chem.ChiralType.CHI_UNSPECIFIED)
|
|
self.assertFalse(taut.GetAtomWithIdx(1).HasProp("_CIPCode"))
|
|
for i, taut in enumerate(res):
|
|
self.assertEqual(Chem.MolToSmiles(taut), Chem.MolToSmiles(res.tautomers[i]))
|
|
self.assertEqual(len(res), len(res.smiles))
|
|
self.assertEqual(len(res), len(res.tautomers))
|
|
self.assertEqual(len(res), len(res()))
|
|
self.assertEqual(len(res), len(res.smilesTautomerMap))
|
|
for i, taut in enumerate(res.tautomers):
|
|
self.assertEqual(Chem.MolToSmiles(taut), Chem.MolToSmiles(res[i]))
|
|
self.assertEqual(Chem.MolToSmiles(taut), res.smiles[i])
|
|
self.assertEqual(Chem.MolToSmiles(taut),
|
|
Chem.MolToSmiles(res.smilesTautomerMap.values()[i].tautomer))
|
|
for i, k in enumerate(res.smilesTautomerMap.keys()):
|
|
self.assertEqual(k, res.smiles[i])
|
|
for i, v in enumerate(res.smilesTautomerMap.values()):
|
|
self.assertEqual(Chem.MolToSmiles(v.tautomer), Chem.MolToSmiles(res[i]))
|
|
for i, (k, v) in enumerate(res.smilesTautomerMap.items()):
|
|
self.assertEqual(k, res.smiles[i])
|
|
self.assertEqual(Chem.MolToSmiles(v.tautomer), Chem.MolToSmiles(res[i]))
|
|
for i, smiles in enumerate(res.smiles):
|
|
self.assertEqual(smiles, Chem.MolToSmiles(res[i]))
|
|
self.assertEqual(smiles, res.smilesTautomerMap.keys()[i])
|
|
self.assertEqual(Chem.MolToSmiles(res.tautomers[-1]), Chem.MolToSmiles(res[-1]))
|
|
self.assertEqual(Chem.MolToSmiles(res[-1]), Chem.MolToSmiles(res[len(res) - 1]))
|
|
self.assertEqual(Chem.MolToSmiles(res.tautomers[-1]),
|
|
Chem.MolToSmiles(res.tautomers[len(res) - 1]))
|
|
with self.assertRaises(IndexError):
|
|
res[len(res)]
|
|
with self.assertRaises(IndexError):
|
|
res[-len(res) - 1]
|
|
with self.assertRaises(IndexError):
|
|
res.tautomers[len(res)]
|
|
with self.assertRaises(IndexError):
|
|
res.tautomers[-len(res.tautomers) - 1]
|
|
|
|
# test retain (S)-Ala stereochemistry
|
|
self.assertEqual(sAla.GetAtomWithIdx(1).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CCW)
|
|
self.assertEqual(sAla.GetAtomWithIdx(1).GetProp("_CIPCode"), "S")
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveSp3Stereo = False
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res = enumerator.Enumerate(sAla)
|
|
for taut in res:
|
|
rdCIPLabeler.AssignCIPLabels(taut)
|
|
tautAtom = taut.GetAtomWithIdx(1)
|
|
if (tautAtom.GetHybridization() == Chem.HybridizationType.SP3):
|
|
self.assertEqual(tautAtom.GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CCW)
|
|
self.assertTrue(tautAtom.HasProp("_CIPCode"))
|
|
self.assertEqual(tautAtom.GetProp("_CIPCode"), "S")
|
|
else:
|
|
self.assertFalse(tautAtom.HasProp("_CIPCode"))
|
|
self.assertEqual(tautAtom.GetChiralTag(), Chem.ChiralType.CHI_UNSPECIFIED)
|
|
|
|
eEnolSmi = "C/C=C/O"
|
|
eEnol = Chem.MolFromSmiles(eEnolSmi)
|
|
if useLegacy:
|
|
self.assertEqual(eEnol.GetBondWithIdx(1).GetStereo(), Chem.BondStereo.STEREOE)
|
|
else:
|
|
self.assertEqual(eEnol.GetBondWithIdx(1).GetStereo(), Chem.BondStereo.STEREOTRANS)
|
|
# test remove enol E stereochemistry
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveBondStereo = True
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res = enumerator.Enumerate(eEnol)
|
|
for taut in res.tautomers:
|
|
bond = taut.GetBondWithIdx(1)
|
|
self.assertTrue(
|
|
(bond.GetBondType() == Chem.BondType.DOUBLE and bond.GetStereo() == Chem.BondStereo.STEREOANY) or
|
|
(bond.GetBondType() != Chem.BondType.DOUBLE and bond.GetStereo() == Chem.BondStereo.STEREONONE)
|
|
)
|
|
# test retain enol E stereochemistry
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveBondStereo = False
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res = enumerator.Enumerate(eEnol)
|
|
for taut in res.tautomers:
|
|
if (taut.GetBondWithIdx(1).GetBondType() == Chem.BondType.DOUBLE):
|
|
if useLegacy:
|
|
self.assertEqual(taut.GetBondWithIdx(1).GetStereo(), Chem.BondStereo.STEREOE)
|
|
else:
|
|
self.assertEqual(taut.GetBondWithIdx(1).GetStereo(), Chem.BondStereo.STEREOTRANS)
|
|
|
|
zEnolSmi = "C/C=C\\O"
|
|
zEnol = Chem.MolFromSmiles(zEnolSmi)
|
|
if useLegacy:
|
|
self.assertEqual(zEnol.GetBondWithIdx(1).GetStereo(), Chem.BondStereo.STEREOZ)
|
|
else:
|
|
self.assertEqual(zEnol.GetBondWithIdx(1).GetStereo(), Chem.BondStereo.STEREOCIS)
|
|
# test remove enol Z stereochemistry
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveBondStereo = True
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res = enumerator.Enumerate(zEnol)
|
|
for taut in res:
|
|
bond = taut.GetBondWithIdx(1)
|
|
self.assertTrue(
|
|
(bond.GetBondType() == Chem.BondType.DOUBLE and bond.GetStereo() == Chem.BondStereo.STEREOANY) or
|
|
(bond.GetBondType() != Chem.BondType.DOUBLE and bond.GetStereo() == Chem.BondStereo.STEREONONE)
|
|
)
|
|
# test retain enol Z stereochemistry
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveBondStereo = False
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res = enumerator.Enumerate(zEnol)
|
|
for taut in res:
|
|
if (taut.GetBondWithIdx(1).GetBondType() == Chem.BondType.DOUBLE):
|
|
if useLegacy:
|
|
self.assertEqual(taut.GetBondWithIdx(1).GetStereo(), Chem.BondStereo.STEREOZ)
|
|
else:
|
|
self.assertEqual(taut.GetBondWithIdx(1).GetStereo(), Chem.BondStereo.STEREOCIS)
|
|
Chem.SetUseLegacyStereoPerception(origVal)
|
|
|
|
chembl2024142Smi = "[2H]C1=C(C(=C2C(=C1[2H])C(=O)C(=C(C2=O)C([2H])([2H])[2H])C/C=C(\\C)/CC([2H])([2H])/C=C(/CC/C=C(\\C)/CCC=C(C)C)\\C([2H])([2H])[2H])[2H])[2H]"
|
|
chembl2024142 = Chem.MolFromSmiles(chembl2024142Smi)
|
|
params = Chem.RemoveHsParameters()
|
|
params.removeAndTrackIsotopes = True
|
|
chembl2024142 = Chem.RemoveHs(chembl2024142, params)
|
|
self.assertTrue(chembl2024142.GetAtomWithIdx(12).HasProp("_isotopicHs"))
|
|
# test remove isotopic Hs involved in tautomerism
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveIsotopicHs = True
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res = enumerator.Enumerate(chembl2024142)
|
|
for taut in res:
|
|
self.assertFalse(taut.GetAtomWithIdx(12).HasProp("_isotopicHs"))
|
|
# test retain isotopic Hs involved in tautomerism
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveIsotopicHs = False
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
res = enumerator.Enumerate(chembl2024142)
|
|
for taut in res:
|
|
self.assertTrue(taut.GetAtomWithIdx(12).HasProp("_isotopicHs"))
|
|
|
|
def test16EnumeratorCallback(self):
|
|
|
|
class MyTautomerEnumeratorCallback(rdMolStandardize.TautomerEnumeratorCallback):
|
|
|
|
def __init__(self, parent, timeout_ms):
|
|
super().__init__()
|
|
self._parent = parent
|
|
self._timeout = timedelta(milliseconds=timeout_ms)
|
|
self._start_time = datetime.now()
|
|
|
|
def __call__(self, mol, res):
|
|
self._parent.assertTrue(isinstance(mol, Chem.Mol))
|
|
self._parent.assertTrue(isinstance(res, rdMolStandardize.TautomerEnumeratorResult))
|
|
return (datetime.now() - self._start_time < self._timeout)
|
|
|
|
class MyBrokenCallback(rdMolStandardize.TautomerEnumeratorCallback):
|
|
pass
|
|
|
|
class MyBrokenCallback2(rdMolStandardize.TautomerEnumeratorCallback):
|
|
__call__ = 1
|
|
|
|
# Test a structure with hundreds of tautomers.
|
|
smi68 = "[H][C](CO)(NC(=O)C1=C(O)C(O)=CC=C1)C(O)=O"
|
|
m68 = Chem.MolFromSmiles(smi68)
|
|
|
|
def createV1Enumerator():
|
|
enumerator = rdMolStandardize.GetV1TautomerEnumerator()
|
|
enumerator.SetMaxTransforms(10000)
|
|
enumerator.SetMaxTautomers(10000)
|
|
return enumerator
|
|
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.maxTransforms = 10000
|
|
params.maxTautomers = 10000
|
|
|
|
enumerator = createV1Enumerator()
|
|
enumerator.SetCallback(MyTautomerEnumeratorCallback(self, 50.0))
|
|
res68 = enumerator.Enumerate(m68)
|
|
# either the enumeration was canceled due to timeout
|
|
# or it has completed very quickly
|
|
hasReachedTimeout = (len(res68.tautomers) < 375
|
|
and res68.status == rdMolStandardize.TautomerEnumeratorStatus.Canceled)
|
|
hasCompleted = (len(res68.tautomers) == 375
|
|
and res68.status == rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
if hasReachedTimeout:
|
|
print("Enumeration was canceled due to timeout (50 ms)", file=sys.stderr)
|
|
if hasCompleted:
|
|
print("Enumeration has completed", file=sys.stderr)
|
|
self.assertTrue(hasReachedTimeout or hasCompleted)
|
|
self.assertTrue(hasReachedTimeout ^ hasCompleted)
|
|
|
|
enumerator = createV1Enumerator()
|
|
enumerator.SetCallback(MyTautomerEnumeratorCallback(self, 10000.0))
|
|
res68 = enumerator.Enumerate(m68)
|
|
# either the enumeration completed
|
|
# or it ran very slowly and was canceled due to timeout
|
|
hasReachedTimeout = (len(res68.tautomers) < 375
|
|
and res68.status == rdMolStandardize.TautomerEnumeratorStatus.Canceled)
|
|
hasCompleted = (len(res68.tautomers) == 375
|
|
and res68.status == rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
if hasReachedTimeout:
|
|
print("Enumeration was canceled due to timeout (10 s)", file=sys.stderr)
|
|
if hasCompleted:
|
|
print("Enumeration has completed", file=sys.stderr)
|
|
self.assertTrue(hasReachedTimeout or hasCompleted)
|
|
self.assertTrue(hasReachedTimeout ^ hasCompleted)
|
|
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
with self.assertRaises(AttributeError):
|
|
enumerator.SetCallback(MyBrokenCallback())
|
|
with self.assertRaises(AttributeError):
|
|
enumerator.SetCallback(MyBrokenCallback2())
|
|
|
|
# GitHub #4736
|
|
enumerator = createV1Enumerator()
|
|
enumerator.SetCallback(MyTautomerEnumeratorCallback(self, 50.0))
|
|
enumerator_copy = rdMolStandardize.TautomerEnumerator(enumerator)
|
|
res68 = enumerator.Enumerate(m68)
|
|
res68_copy = enumerator_copy.Enumerate(m68)
|
|
self.assertTrue(res68.status == res68_copy.status)
|
|
|
|
def test17PickCanonicalCIPChangeOnChiralCenter(self):
|
|
|
|
def get_canonical_taut(res):
|
|
best_idx = max([(rdMolStandardize.TautomerEnumerator.ScoreTautomer(t), i)
|
|
for i, t in enumerate(res.tautomers)])[1]
|
|
return res.tautomers[best_idx]
|
|
|
|
smi = "CC\\C=C(/O)[C@@H](C)C(C)=O"
|
|
mol = Chem.MolFromSmiles(smi)
|
|
rdCIPLabeler.AssignCIPLabels(mol)
|
|
self.assertIsNotNone(mol)
|
|
self.assertEqual(mol.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(mol.GetAtomWithIdx(5).GetProp("_CIPCode"), "R")
|
|
|
|
# here the chirality disappears as the chiral center is itself involved in tautomerism
|
|
te = rdMolStandardize.TautomerEnumerator()
|
|
can_taut = te.Canonicalize(mol)
|
|
self.assertIsNotNone(can_taut)
|
|
self.assertEqual(can_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_UNSPECIFIED)
|
|
self.assertFalse(can_taut.GetAtomWithIdx(5).HasProp("_CIPCode"))
|
|
self.assertEqual(Chem.MolToSmiles(can_taut), "CCCC(=O)C(C)C(C)=O")
|
|
|
|
# here the chirality stays even if the chiral center is itself involved in tautomerism
|
|
# because of the tautomerRemoveSp3Stereo parameter being set to false
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveSp3Stereo = False
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
can_taut = te.Canonicalize(mol)
|
|
self.assertIsNotNone(can_taut)
|
|
rdCIPLabeler.AssignCIPLabels(can_taut)
|
|
self.assertEqual(can_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(can_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "S")
|
|
self.assertEqual(Chem.MolToSmiles(can_taut), "CCCC(=O)[C@@H](C)C(C)=O")
|
|
|
|
# here the chirality disappears as the chiral center is itself involved in tautomerism
|
|
# the reassignStereo setting has no influence
|
|
te = rdMolStandardize.TautomerEnumerator()
|
|
res = te.Enumerate(mol)
|
|
self.assertEqual(res.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
self.assertEqual(len(res.tautomers), 8)
|
|
best_taut = get_canonical_taut(res)
|
|
self.assertIsNotNone(best_taut)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_UNSPECIFIED)
|
|
self.assertFalse(best_taut.GetAtomWithIdx(5).HasProp("_CIPCode"))
|
|
self.assertEqual(Chem.MolToSmiles(best_taut), "CCCC(=O)C(C)C(C)=O")
|
|
|
|
# here the chirality disappears as the chiral center is itself involved in tautomerism
|
|
# the reassignStereo setting has no influence
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerReassignStereo = False
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
res = te.Enumerate(mol)
|
|
self.assertEqual(res.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
self.assertEqual(len(res.tautomers), 8)
|
|
best_taut = get_canonical_taut(res)
|
|
self.assertIsNotNone(best_taut)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_UNSPECIFIED)
|
|
self.assertFalse(best_taut.GetAtomWithIdx(5).HasProp("_CIPCode"))
|
|
self.assertEqual(Chem.MolToSmiles(best_taut), "CCCC(=O)C(C)C(C)=O")
|
|
|
|
# here the chirality stays even if the chiral center is itself involved in tautomerism
|
|
# because of the tautomerRemoveSp3Stereo parameter being set to false
|
|
# as reassignStereo by default is true, the CIP code has been recomputed
|
|
# and therefore it is now S (correct)
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveSp3Stereo = False
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
res = te.Enumerate(mol)
|
|
self.assertEqual(res.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
self.assertEqual(len(res.tautomers), 8)
|
|
best_taut = get_canonical_taut(res)
|
|
rdCIPLabeler.AssignCIPLabels(best_taut)
|
|
self.assertIsNotNone(best_taut)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "S")
|
|
self.assertEqual(Chem.MolToSmiles(best_taut), "CCCC(=O)[C@@H](C)C(C)=O")
|
|
|
|
# here the chirality stays even if the chiral center is itself involved in tautomerism
|
|
# because of the tautomerRemoveSp3Stereo parameter being set to false
|
|
# as reassignStereo is false, the CIP code has not been recomputed
|
|
# and therefore it is still R (incorrect)
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveSp3Stereo = False
|
|
params.tautomerReassignStereo = False
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
res = te.Enumerate(mol)
|
|
self.assertEqual(res.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
self.assertEqual(len(res.tautomers), 8)
|
|
best_taut = get_canonical_taut(res)
|
|
self.assertIsNotNone(best_taut)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "R")
|
|
self.assertEqual(Chem.MolToSmiles(best_taut), "CCCC(=O)[C@@H](C)C(C)=O")
|
|
|
|
smi = "CC\\C=C(/O)[C@@](CC)(C)C(C)=O"
|
|
mol = Chem.MolFromSmiles(smi)
|
|
self.assertIsNotNone(mol)
|
|
rdCIPLabeler.AssignCIPLabels(mol)
|
|
self.assertEqual(mol.GetAtomWithIdx(5).GetProp("_CIPCode"), "S")
|
|
self.assertEqual(mol.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
|
|
# here the chirality stays no matter how tautomerRemoveSp3Stereo
|
|
# is set as the chiral center is not involved in tautomerism
|
|
te = rdMolStandardize.TautomerEnumerator()
|
|
can_taut = te.Canonicalize(mol)
|
|
self.assertIsNotNone(can_taut)
|
|
rdCIPLabeler.AssignCIPLabels(can_taut)
|
|
self.assertEqual(can_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(can_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "R")
|
|
self.assertEqual(Chem.MolToSmiles(can_taut), "CCCC(=O)[C@](C)(CC)C(C)=O")
|
|
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveSp3Stereo = False
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
can_taut = te.Canonicalize(mol)
|
|
self.assertIsNotNone(can_taut)
|
|
rdCIPLabeler.AssignCIPLabels(can_taut)
|
|
self.assertEqual(can_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(can_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "R")
|
|
self.assertEqual(Chem.MolToSmiles(can_taut), "CCCC(=O)[C@](C)(CC)C(C)=O")
|
|
|
|
# as reassignStereo by default is true, the CIP code has been recomputed
|
|
# and therefore it is now R (correct)
|
|
te = rdMolStandardize.TautomerEnumerator()
|
|
res = te.Enumerate(mol)
|
|
self.assertEqual(res.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
self.assertEqual(len(res.tautomers), 4)
|
|
best_taut = get_canonical_taut(res)
|
|
self.assertIsNotNone(best_taut)
|
|
rdCIPLabeler.AssignCIPLabels(best_taut)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "R")
|
|
self.assertEqual(Chem.MolToSmiles(best_taut), "CCCC(=O)[C@](C)(CC)C(C)=O")
|
|
|
|
# as reassignStereo is false, the CIP code has not been recomputed
|
|
# and therefore it is still S (incorrect)
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerReassignStereo = False
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
res = te.Enumerate(mol)
|
|
self.assertEqual(res.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
self.assertEqual(len(res.tautomers), 4)
|
|
best_taut = get_canonical_taut(res)
|
|
self.assertIsNotNone(best_taut)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "S")
|
|
self.assertEqual(Chem.MolToSmiles(best_taut), "CCCC(=O)[C@](C)(CC)C(C)=O")
|
|
|
|
# as reassignStereo by default is true, the CIP code has been recomputed
|
|
# and therefore it is now R (correct)
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveSp3Stereo = False
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
res = te.Enumerate(mol)
|
|
self.assertEqual(res.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
self.assertEqual(len(res.tautomers), 4)
|
|
best_taut = get_canonical_taut(res)
|
|
rdCIPLabeler.AssignCIPLabels(best_taut)
|
|
self.assertIsNotNone(best_taut)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "R")
|
|
self.assertEqual(Chem.MolToSmiles(best_taut), "CCCC(=O)[C@](C)(CC)C(C)=O")
|
|
|
|
# here the chirality stays even if the tautomerRemoveSp3Stereo parameter
|
|
# is set to false as the chiral center is not involved in tautomerism
|
|
# as reassignStereo is false, the CIP code has not been recomputed
|
|
# and therefore it is still S (incorrect)
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveSp3Stereo = False
|
|
params.tautomerReassignStereo = False
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
res = te.Enumerate(mol)
|
|
self.assertEqual(res.status, rdMolStandardize.TautomerEnumeratorStatus.Completed)
|
|
self.assertEqual(len(res.tautomers), 4)
|
|
best_taut = get_canonical_taut(res)
|
|
self.assertIsNotNone(best_taut)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetChiralTag(), Chem.ChiralType.CHI_TETRAHEDRAL_CW)
|
|
self.assertEqual(best_taut.GetAtomWithIdx(5).GetProp("_CIPCode"), "S")
|
|
self.assertEqual(Chem.MolToSmiles(best_taut), "CCCC(=O)[C@](C)(CC)C(C)=O")
|
|
|
|
def test18TautomerEnumeratorResultIter(self):
|
|
smi = "Cc1nnc(NC(=O)N2CCN(Cc3ccc(F)cc3)C(=O)C2)s1"
|
|
mol = Chem.MolFromSmiles(smi)
|
|
self.assertIsNotNone(mol)
|
|
te = rdMolStandardize.TautomerEnumerator()
|
|
res = te.Enumerate(mol)
|
|
res_it = iter(res)
|
|
i = 0
|
|
while 1:
|
|
try:
|
|
t = next(res_it)
|
|
except StopIteration:
|
|
break
|
|
self.assertEqual(Chem.MolToSmiles(t), Chem.MolToSmiles(res[i]))
|
|
i += 1
|
|
self.assertEqual(i, len(res))
|
|
res_it = iter(res)
|
|
i = -len(res)
|
|
while 1:
|
|
try:
|
|
t = next(res_it)
|
|
except StopIteration:
|
|
break
|
|
self.assertEqual(Chem.MolToSmiles(t), Chem.MolToSmiles(res[i]))
|
|
i += 1
|
|
self.assertEqual(i, 0)
|
|
|
|
def test19NormalizeFromParams(self):
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.normalizationsFile = "ThisFileDoesNotExist.txt"
|
|
with self.assertRaises(OSError):
|
|
rdMolStandardize.NormalizerFromParams(params)
|
|
|
|
def test20NoneHandling(self):
|
|
with self.assertRaises(ValueError):
|
|
rdMolStandardize.ChargeParent(None)
|
|
with self.assertRaises(ValueError):
|
|
rdMolStandardize.Cleanup(None)
|
|
with self.assertRaises(ValueError):
|
|
rdMolStandardize.FragmentParent(None)
|
|
with self.assertRaises(ValueError):
|
|
rdMolStandardize.Normalize(None)
|
|
with self.assertRaises(ValueError):
|
|
rdMolStandardize.Reionize(None)
|
|
|
|
def test21UpdateFromJSON(self):
|
|
params = rdMolStandardize.CleanupParameters()
|
|
# note: these actual parameters aren't useful... they are for testing
|
|
rdMolStandardize.UpdateParamsFromJSON(
|
|
params, """{
|
|
"normalizationData":[
|
|
{"name":"silly 1","smarts":"[Cl:1]>>[F:1]"},
|
|
{"name":"silly 2","smarts":"[Br:1]>>[F:1]"}
|
|
],
|
|
"acidbaseData":[
|
|
{"name":"-CO2H","acid":"C(=O)[OH]","base":"C(=O)[O-]"},
|
|
{"name":"phenol","acid":"c[OH]","base":"c[O-]"}
|
|
],
|
|
"fragmentData":[
|
|
{"name":"hydrogen", "smarts":"[H]"},
|
|
{"name":"fluorine", "smarts":"[F]"},
|
|
{"name":"chlorine", "smarts":"[Cl]"}
|
|
],
|
|
"tautomerTransformData":[
|
|
{"name":"1,3 (thio)keto/enol f","smarts":"[CX4!H0]-[C]=[O,S,Se,Te;X1]","bonds":"","charges":""},
|
|
{"name":"1,3 (thio)keto/enol r","smarts":"[O,S,Se,Te;X2!H0]-[C]=[C]"}
|
|
]}""")
|
|
|
|
m = Chem.MolFromSmiles("CCC=O")
|
|
te = rdMolStandardize.TautomerEnumerator(params)
|
|
tauts = [Chem.MolToSmiles(x) for x in te.Enumerate(m)]
|
|
self.assertEqual(tauts, ["CC=CO", "CCC=O"])
|
|
self.assertEqual(Chem.MolToSmiles(rdMolStandardize.CanonicalTautomer(m, params)), "CCC=O")
|
|
# now with defaults
|
|
te = rdMolStandardize.TautomerEnumerator()
|
|
tauts = [Chem.MolToSmiles(x) for x in te.Enumerate(m)]
|
|
self.assertEqual(tauts, ["CC=CO", "CCC=O"])
|
|
self.assertEqual(Chem.MolToSmiles(rdMolStandardize.CanonicalTautomer(m)), "CCC=O")
|
|
|
|
m = Chem.MolFromSmiles('ClCCCBr')
|
|
nm = rdMolStandardize.Normalize(m, params)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "FCCCF")
|
|
# now with defaults
|
|
nm = rdMolStandardize.Normalize(m)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "ClCCCBr")
|
|
|
|
m = Chem.MolFromSmiles('c1cc([O-])cc(C(=O)O)c1')
|
|
nm = rdMolStandardize.Reionize(m, params)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "O=C([O-])c1cccc(O)c1")
|
|
# now with defaults
|
|
nm = rdMolStandardize.Reionize(m)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "O=C([O-])c1cccc(O)c1")
|
|
|
|
m = Chem.MolFromSmiles('C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O')
|
|
nm = rdMolStandardize.Reionize(m, params)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "O=S([O-])c1ccc(S(=O)(=O)O)cc1")
|
|
# now with defaults
|
|
nm = rdMolStandardize.Reionize(m)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "O=S(O)c1ccc(S(=O)(=O)[O-])cc1")
|
|
|
|
m = Chem.MolFromSmiles('[F-].[Cl-].[Br-].CC')
|
|
nm = rdMolStandardize.RemoveFragments(m, params)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "CC.[Br-]")
|
|
# now with defaults
|
|
nm = rdMolStandardize.RemoveFragments(m)
|
|
self.assertEqual(Chem.MolToSmiles(nm), "CC")
|
|
|
|
def test22StandardizeInPlace(self):
|
|
m = Chem.MolFromSmiles("O=N(=O)-C(O[Fe])C(C(=O)O)C-N(=O)=O")
|
|
rdMolStandardize.CleanupInPlace(m)
|
|
self.assertEqual(Chem.MolToSmiles(m), "O=C([O-])C(C[N+](=O)[O-])C(O)[N+](=O)[O-].[Fe+]")
|
|
|
|
m = Chem.MolFromSmiles('[F-].[Cl-].[Br-].CC')
|
|
rdMolStandardize.RemoveFragmentsInPlace(m)
|
|
self.assertEqual(Chem.MolToSmiles(m), "CC")
|
|
|
|
m = Chem.MolFromSmiles('C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O')
|
|
rdMolStandardize.ReionizeInPlace(m)
|
|
self.assertEqual(Chem.MolToSmiles(m), "O=S(O)c1ccc(S(=O)(=O)[O-])cc1")
|
|
|
|
m = Chem.MolFromSmiles('CCO[Fe]')
|
|
rdMolStandardize.DisconnectOrganometallicsInPlace(m)
|
|
self.assertEqual(Chem.MolToSmiles(m), "CCO.[Fe]")
|
|
|
|
m = Chem.MolFromSmiles(r"C[N+](C)=C\C=C\[O-]")
|
|
rdMolStandardize.NormalizeInPlace(m)
|
|
self.assertEqual(Chem.MolToSmiles(m), "CN(C)C=CC=O")
|
|
|
|
def test23CleanupInPlaceMT(self):
|
|
ind = (("O=N(=O)-C(O[Fe])C(C(=O)O)C-N(=O)=O",
|
|
"O=C([O-])C(C[N+](=O)[O-])C(O)[N+](=O)[O-].[Fe+]"),
|
|
("O=N(=O)-CC(O[Fe])C(C(=O)O)C-N(=O)=O",
|
|
"O=C([O-])C(C[N+](=O)[O-])C(O)C[N+](=O)[O-].[Fe+]"),
|
|
("O=N(=O)-CCC(O[Fe])C(C(=O)O)C-N(=O)=O",
|
|
"O=C([O-])C(C[N+](=O)[O-])C(O)CC[N+](=O)[O-].[Fe+]"))
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.CleanupInPlace(ms, 4)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test24NormalizeInPlaceMT(self):
|
|
ind = (("O=N(=O)-CC-N(=O)=O", "O=[N+]([O-])CC[N+](=O)[O-]"),
|
|
("O=N(=O)-CCC-N(=O)=O", "O=[N+]([O-])CCC[N+](=O)[O-]"), ("O=N(=O)-CCCC-N(=O)=O",
|
|
"O=[N+]([O-])CCCC[N+](=O)[O-]"))
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.NormalizeInPlace(ms, 4)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test25ReionizeInPlaceMT(self):
|
|
ind = (("c1cc([O-])cc(C(=O)O)c1", "O=C([O-])c1cccc(O)c1"),
|
|
("c1cc(C[O-])cc(C(=O)O)c1", "O=C([O-])c1cccc(CO)c1"), ("c1cc(CC[O-])cc(C(=O)O)c1",
|
|
"O=C([O-])c1cccc(CCO)c1"))
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.ReionizeInPlace(ms, 4)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test26RemoveFragmentsInPlaceMT(self):
|
|
ind = (("CCCC.Cl.[Na]", "CCCC"), ("CCCCO.Cl.[Na]", "CCCCO"), ("CCOC.Cl.[Na]", "CCOC"))
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.RemoveFragmentsInPlace(ms, 4)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test27ChargeParentInPlaceMT(self):
|
|
ind = (("O=C([O-])c1ccccc1", "O=C(O)c1ccccc1"), ("CCCCO.Cl.[Na]", "CCCCO"),
|
|
("[N+](=O)([O-])[O-].[CH2]", "[CH2]"))
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfParams.preferOrganic = True
|
|
for x, y in ind:
|
|
m2 = Chem.MolFromSmiles(x)
|
|
rdMolStandardize.ChargeParentInPlace(m2, lfParams)
|
|
self.assertEqual(Chem.MolToSmiles(m2), y)
|
|
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.ChargeParentInPlace(ms, 4, lfParams)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test28TautomerParentInPlaceMT(self):
|
|
ind = (("[O-]c1ccc(C(=O)O)cc1CC=CO", "O=CCCc1cc(C(=O)[O-])ccc1O"),
|
|
("[O-]c1ccc(C(=O)O)cc1CC=CO.[Na+]", "O=CCCc1cc(C(=O)[O-])ccc1O.[Na+]"),
|
|
("[O-]c1ccc(C(=O)O)cc1C[13CH]=CO", "O=C[13CH2]Cc1cc(C(=O)[O-])ccc1O"))
|
|
for x, y in ind:
|
|
m2 = Chem.MolFromSmiles(x)
|
|
rdMolStandardize.TautomerParentInPlace(m2)
|
|
self.assertEqual(Chem.MolToSmiles(m2), y)
|
|
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.TautomerParentInPlace(ms, 4)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test29StereoParentInPlaceMT(self):
|
|
ind = (("F[C@H](O)Cl", "OC(F)Cl"), ("F[C@H](CCO)Cl", "OCCC(F)Cl"), ("F[C@H](CCO)Cl.F[C@H](O)Cl",
|
|
"OC(F)Cl.OCCC(F)Cl"))
|
|
for x, y in ind:
|
|
m2 = Chem.MolFromSmiles(x)
|
|
rdMolStandardize.StereoParentInPlace(m2)
|
|
self.assertEqual(Chem.MolToSmiles(m2), y)
|
|
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.StereoParentInPlace(ms, 4)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test30FragmentParentInPlaceMT(self):
|
|
ind = (("O=C([O-])c1ccccc1", "O=C([O-])c1ccccc1"), ("CCCCO.Cl.[Na]", "CCCC[O-]"),
|
|
("[N+](=O)([O-])[O-].[CH2]", "[CH2]"))
|
|
lfParams = rdMolStandardize.CleanupParameters()
|
|
lfParams.preferOrganic = True
|
|
for x, y in ind:
|
|
m2 = Chem.MolFromSmiles(x)
|
|
rdMolStandardize.FragmentParentInPlace(m2, lfParams)
|
|
self.assertEqual(Chem.MolToSmiles(m2), y)
|
|
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.FragmentParentInPlace(ms, 4, lfParams)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test31IsotopeParentInPlaceMT(self):
|
|
ind = (("[13CH3]C", "CC"), ("[13CH3]C.C", "C.CC"), ("[13CH3][12CH3]", "CC"))
|
|
for x, y in ind:
|
|
m2 = Chem.MolFromSmiles(x)
|
|
rdMolStandardize.IsotopeParentInPlace(m2)
|
|
self.assertEqual(Chem.MolToSmiles(m2), y)
|
|
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.IsotopeParentInPlace(ms, 4)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test32SuperParentInPlaceMT(self):
|
|
ind = (("[O-]c1ccc(C(=O)O)cc1CC=CO", "O=CCCc1cc(C(=O)O)ccc1O"),
|
|
("[O-]c1ccc(C(=O)O)cc1CC=CO.[Na+]",
|
|
"O=CCCc1cc(C(=O)O)ccc1O"), ("[O-]c1ccc(C(=O)O)cc1C[13CH]=CO", "O=CCCc1cc(C(=O)O)ccc1O"))
|
|
for x, y in ind:
|
|
m2 = Chem.MolFromSmiles(x)
|
|
rdMolStandardize.SuperParentInPlace(m2)
|
|
self.assertEqual(Chem.MolToSmiles(m2), y)
|
|
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
for i in range(4):
|
|
ind = ind + ind
|
|
ms = [Chem.MolFromSmiles(x) for x, y in ind]
|
|
rdMolStandardize.SuperParentInPlace(ms, 4)
|
|
self.assertEqual([Chem.MolToSmiles(m) for m in ms], [y for x, y in ind])
|
|
|
|
def test33MolBlockValidation(self):
|
|
# featuresValidation
|
|
mol = Chem.MolFromMolBlock(
|
|
'''
|
|
Mrv2311 01162413552D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 2 1 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 R# -17.3747 6.9367 0 0 RGROUPS=(1 0)
|
|
M V30 2 C -18.7083 6.1667 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
''', sanitize=False)
|
|
|
|
validator = rdMolStandardize.FeaturesValidation()
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 1)
|
|
self.assertEqual(errinfo[0], "ERROR: [FeaturesValidation] Query atom 0 is not allowed")
|
|
validator.allowDummies = True
|
|
validator.allowQueries = True
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 0)
|
|
|
|
mol = Chem.MolFromMolBlock('''
|
|
Mrv2311 01162411552D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 4 3 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -18.208 8.52 0 0 CFG=2
|
|
M V30 2 F -19.5417 7.75 0 0
|
|
M V30 3 C -16.8743 7.75 0 0
|
|
M V30 4 Cl -18.208 10.06 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 1 3 CFG=1
|
|
M V30 2 1 2 1
|
|
M V30 3 1 1 4
|
|
M V30 END BOND
|
|
M V30 BEGIN COLLECTION
|
|
M V30 MDLV30/STERAC1 ATOMS=(1 1)
|
|
M V30 END COLLECTION
|
|
M V30 END CTAB
|
|
M END
|
|
''')
|
|
|
|
# enhanced stereo features are by default disallowed
|
|
validator = rdMolStandardize.FeaturesValidation()
|
|
errinfo = validator.validate(mol, True)
|
|
self.assertEqual(len(errinfo), 1)
|
|
self.assertEqual(
|
|
errinfo[0], "ERROR: [FeaturesValidation] Enhanced stereochemistry features are not allowed")
|
|
|
|
# allow enhanced stereo
|
|
validator = rdMolStandardize.FeaturesValidation(True)
|
|
errinfo = validator.validate(mol, True)
|
|
self.assertEqual(len(errinfo), 0)
|
|
validator.allowEnhancedStereo = True
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 0)
|
|
|
|
mol = Chem.MolFromMolBlock(
|
|
'''
|
|
Mrv2311 02272411562D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 7 7 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -10.3542 4.29 0 0
|
|
M V30 2 C -11.6879 3.52 0 0
|
|
M V30 3 C -11.6879 1.9798 0 0
|
|
M V30 4 N -10.3542 1.21 0 0
|
|
M V30 5 C -9.0204 1.9798 0 0
|
|
M V30 6 C -9.0204 3.52 0 0
|
|
M V30 7 C -10.3542 5.83 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 4 1 2
|
|
M V30 2 4 1 6
|
|
M V30 3 4 2 3
|
|
M V30 4 4 5 6
|
|
M V30 5 1 1 7
|
|
M V30 6 4 3 4
|
|
M V30 7 4 4 5
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
''', sanitize=False)
|
|
|
|
# aromatic bonds are by default disallowed
|
|
validator = rdMolStandardize.FeaturesValidation()
|
|
errinfo = validator.validate(mol, True)
|
|
self.assertEqual(len(errinfo), 6)
|
|
self.assertEqual(errinfo[0],
|
|
"ERROR: [FeaturesValidation] Bond 0 of aromatic type is not allowed")
|
|
validator.allowAromaticBondType = True
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 0)
|
|
|
|
# allow aromatic bonds
|
|
validator = rdMolStandardize.FeaturesValidation(False, True)
|
|
errinfo = validator.validate(mol, True)
|
|
self.assertEqual(len(errinfo), 0)
|
|
|
|
# disallowedRadicalValidation
|
|
mol = Chem.MolFromMolBlock(
|
|
'''
|
|
Mrv2311 02082417212D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 2 1 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -20.9372 7.145 0 0 RAD=2
|
|
M V30 2 C -22.2708 6.375 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
''', sanitize=False)
|
|
|
|
validator = rdMolStandardize.DisallowedRadicalValidation()
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 1)
|
|
self.assertEqual(errinfo[0],
|
|
"ERROR: [DisallowedRadicalValidation] The radical at atom 0 is not allowed")
|
|
|
|
# is2DValidation
|
|
mol = Chem.MolFromMolBlock(
|
|
'''
|
|
2D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 2 1 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C 0.8753 4.9367 0 0
|
|
M V30 2 C -0.4583 4.1667 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
''', sanitize=False)
|
|
|
|
validator = rdMolStandardize.Is2DValidation()
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 0)
|
|
|
|
conf = mol.GetConformer()
|
|
pos = conf.GetAtomPosition(1)
|
|
self.assertEqual(pos.z, 0.0)
|
|
pos.z = 0.1
|
|
conf.SetAtomPosition(1, pos)
|
|
|
|
validator = rdMolStandardize.Is2DValidation()
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 1)
|
|
self.assertEqual(errinfo[0],
|
|
"ERROR: [Is2DValidation] The molecule includes non-null Z coordinates")
|
|
|
|
validator = rdMolStandardize.Is2DValidation(0.2)
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 0)
|
|
|
|
mol = Chem.MolFromMolBlock(
|
|
'''
|
|
2D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 2 1 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C 0.8753 4.9367 0 0
|
|
M V30 2 C -0.4583 4.1667 0.2 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
''', sanitize=False)
|
|
validator = rdMolStandardize.Is2DValidation()
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 1)
|
|
self.assertEqual(errinfo[0],
|
|
"ERROR: [Is2DValidation] The molecule includes non-null Z coordinates")
|
|
|
|
# AtomClashValidation
|
|
mol = Chem.MolFromMolBlock(
|
|
'''
|
|
2D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 6 5 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -1.6667 6.2067 0 0
|
|
M V30 2 C -3.0004 5.4367 0 0
|
|
M V30 3 C -3.0004 3.8965 0 0
|
|
M V30 4 C -1.6667 3.1267 0 0
|
|
M V30 5 C -0.3329 4.6000 0 0
|
|
M V30 6 C -0.3329 4.7000 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 1 2
|
|
M V30 2 1 1 6
|
|
M V30 3 1 2 3
|
|
M V30 4 1 3 4
|
|
M V30 5 1 4 5
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
''', sanitize=False)
|
|
|
|
validator = rdMolStandardize.Layout2DValidation()
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 1)
|
|
self.assertEqual(errinfo[0], "ERROR: [Layout2DValidation] Atom 4 is too close to atom 5")
|
|
|
|
validator = rdMolStandardize.Layout2DValidation(1e-3)
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 0)
|
|
|
|
mol = Chem.MolFromMolBlock(
|
|
'''
|
|
10052311582D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 5 4 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 Br 0.0003 7.27 0 0
|
|
M V30 2 C -1.3333 6.5 0 0
|
|
M V30 3 F -2.667 7.27 0 0
|
|
M V30 4 O -1.3333 4.96 0 0
|
|
M V30 5 C 0.0003 5.73 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 5 CFG=1
|
|
M V30 2 1 2 3 CFG=3
|
|
M V30 3 1 2 1
|
|
M V30 4 1 2 4
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
''', sanitize=False)
|
|
|
|
Chem.ReapplyMolBlockWedging(mol)
|
|
|
|
validator = rdMolStandardize.StereoValidation()
|
|
errinfo = validator.validate(mol)
|
|
self.assertEqual(len(errinfo), 1)
|
|
self.assertEqual(
|
|
errinfo[0],
|
|
"ERROR: [StereoValidation] Atom 1 has opposing stereo bonds with different up/down orientation"
|
|
)
|
|
|
|
def test24Pipeline(self):
|
|
pipeline = rdMolStandardize.Pipeline()
|
|
|
|
# invalid input molblock
|
|
molblock = '''
|
|
sldfj;ldskfj sldkjfsd;lkf
|
|
M V30 BEGIN CTAB
|
|
'''
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.PARSING_INPUT)
|
|
self.assertNotEqual(result.status, rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.INPUT_ERROR)
|
|
|
|
# R group
|
|
molblock = '''
|
|
Mrv2311 01162413552D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 2 1 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 R# -17.3747 6.9367 0 0 RGROUPS=(1 0)
|
|
M V30 2 C -18.7083 6.1667 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertNotEqual(result.status, rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.VALIDATION_ERROR)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.FEATURES_VALIDATION_ERROR)
|
|
|
|
# no atoms
|
|
molblock = '''
|
|
10052313452D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 0 0 0 0 0
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertNotEqual(result.status, rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.VALIDATION_ERROR)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.BASIC_VALIDATION_ERROR)
|
|
|
|
# neutral quaternary N
|
|
molblock = '''
|
|
10242314442D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 5 4 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -1.6247 7.5825 0 0
|
|
M V30 2 N -2.9583 6.8125 0 0
|
|
M V30 3 C -4.292 7.5825 0 0
|
|
M V30 4 C -2.9583 5.2725 0 0
|
|
M V30 5 C -1.6247 6.0425 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 2 1 2 3
|
|
M V30 3 1 2 4
|
|
M V30 4 1 2 5
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertNotEqual(result.status, rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.VALIDATION_ERROR)
|
|
#self.assertTrue(result.status & rdMolStandardize.PipelineStatus.STANDARDIZATION_ERROR)
|
|
self.assertEqual(
|
|
result.status,
|
|
(
|
|
rdMolStandardize.PipelineStatus.BASIC_VALIDATION_ERROR
|
|
| rdMolStandardize.PipelineStatus.PREPARE_FOR_STANDARDIZATION_ERROR #|
|
|
#rdMolStandardize.PipelineStatus.NORMALIZER_STANDARDIZATION_ERROR
|
|
))
|
|
|
|
molblock = '''
|
|
2D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 2 1 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C 0.8753 4.9367 0 0
|
|
M V30 2 C -0.4583 4.1667 0.2 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertNotEqual(result.status, rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.VALIDATION_ERROR)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.IS2D_VALIDATION_ERROR)
|
|
|
|
molblock = '''
|
|
2D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 4 3 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -3.05 5.48 0 0
|
|
M V30 2 C -4.4167 4.6875 0 0
|
|
M V30 3 C -4.3289 6.3627 0 0
|
|
M V30 4 C -3.0 5.5 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 2 1 1 3
|
|
M V30 3 1 3 4
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertNotEqual(result.status, rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.VALIDATION_ERROR)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.LAYOUT2D_VALIDATION_ERROR)
|
|
|
|
molblock = '''
|
|
2D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 5 4 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -1.583 5.7075 0 0
|
|
M V30 2 C -2.9167 4.9375 0 0
|
|
M V30 3 C -1.583 7.2475 0 0
|
|
M V30 4 C -0.2493 4.9375 0.5 0
|
|
M V30 5 C -1.583 4.1675 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 1 2 CFG=1
|
|
M V30 2 1 1 3 CFG=1
|
|
M V30 3 1 1 4
|
|
M V30 4 1 1 5
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertNotEqual(result.status, rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertTrue(result.status & rdMolStandardize.PipelineStatus.VALIDATION_ERROR)
|
|
self.assertEqual(
|
|
result.status, rdMolStandardize.PipelineStatus.IS2D_VALIDATION_ERROR
|
|
| rdMolStandardize.PipelineStatus.STEREO_VALIDATION_ERROR)
|
|
|
|
molblock = '''
|
|
10282320572D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 5 4 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -1.0413 5.4992 0 0
|
|
M V30 2 C -2.375 4.7292 0 0
|
|
M V30 3 O -1.0413 7.0392 0 0
|
|
M V30 4 O 0.2924 4.7292 0 0
|
|
M V30 5 Na 0.2924 3.1892 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 2 1 1 4
|
|
M V30 3 2 1 3
|
|
M V30 4 1 4 5
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertEqual((result.status & rdMolStandardize.PipelineStatus.PIPELINE_ERROR),
|
|
rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertNotEqual((result.status & rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION),
|
|
rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION)
|
|
self.assertEqual((result.status & rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION),
|
|
(rdMolStandardize.PipelineStatus.METALS_DISCONNECTED
|
|
| rdMolStandardize.PipelineStatus.FRAGMENTS_REMOVED
|
|
| rdMolStandardize.PipelineStatus.PROTONATION_CHANGED))
|
|
|
|
parentMol = Chem.MolFromMolBlock(result.parentMolData, sanitize=False)
|
|
parentSmiles = Chem.MolToSmiles(parentMol)
|
|
self.assertEqual(parentSmiles, "CC(=O)O")
|
|
|
|
outputMol = Chem.MolFromMolBlock(result.outputMolData, sanitize=False)
|
|
outputSmiles = Chem.MolToSmiles(outputMol)
|
|
self.assertEqual(outputSmiles, "CC(=O)O")
|
|
|
|
molblock = '''
|
|
10282320572D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 4 3 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 N -1.0413 5.4992 0 0
|
|
M V30 2 C -2.375 4.7292 0 0
|
|
M V30 3 O -1.0413 7.0392 0 0
|
|
M V30 4 O 0.2924 4.7292 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 2 2 1 4
|
|
M V30 3 2 1 3
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
# nitro groups are cleaned-up in a pre-standardization step
|
|
self.assertEqual((result.status & rdMolStandardize.PipelineStatus.PIPELINE_ERROR),
|
|
rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertEqual((result.status & rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION),
|
|
rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
|
|
parentMol = Chem.MolFromMolBlock(result.parentMolData, sanitize=False)
|
|
parentSmiles = Chem.MolToSmiles(parentMol)
|
|
self.assertEqual(parentSmiles, "C[N+](=O)[O-]")
|
|
|
|
outputMol = Chem.MolFromMolBlock(result.outputMolData, sanitize=False)
|
|
outputSmiles = Chem.MolToSmiles(outputMol)
|
|
self.assertEqual(outputSmiles, "C[N+](=O)[O-]")
|
|
|
|
molblock = '''
|
|
10282320572D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 6 5 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -1.0413 5.4992 0 0
|
|
M V30 2 C -2.375 4.7292 0 0
|
|
M V30 3 O -1.0413 7.0392 0 0
|
|
M V30 4 O 0.2924 4.7292 0 0
|
|
M V30 5 N -3.7087 5.4992 0 0 CHG=1
|
|
M V30 6 Na 0.2924 3.1892 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 2 1 1 4
|
|
M V30 3 2 1 3
|
|
M V30 4 1 2 5
|
|
M V30 5 1 4 6
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertEqual((result.status & rdMolStandardize.PipelineStatus.PIPELINE_ERROR),
|
|
rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertNotEqual((result.status & rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION),
|
|
rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION)
|
|
self.assertEqual((result.status & rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION),
|
|
(rdMolStandardize.PipelineStatus.METALS_DISCONNECTED
|
|
| rdMolStandardize.PipelineStatus.FRAGMENTS_REMOVED))
|
|
|
|
parentMol = Chem.MolFromMolBlock(result.parentMolData, sanitize=False)
|
|
parentSmiles = Chem.MolToSmiles(parentMol)
|
|
self.assertEqual(parentSmiles, "NCC(=O)O")
|
|
|
|
outputMol = Chem.MolFromMolBlock(result.outputMolData, sanitize=False)
|
|
outputSmiles = Chem.MolToSmiles(outputMol)
|
|
self.assertEqual(outputSmiles, "[NH3+]CC(=O)[O-]")
|
|
|
|
def test25PipelineNormalizerOptions(self):
|
|
options = rdMolStandardize.PipelineOptions()
|
|
# run the pipeline w/ the RDKit default normalizer transforms
|
|
options.normalizerData = ''
|
|
pipeline = rdMolStandardize.Pipeline(options)
|
|
|
|
molblock = '''
|
|
Mrv2311 02072415362D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 4 3 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 S -10.3538 4.27 0 0
|
|
M V30 2 C -11.6875 3.5 0 0
|
|
M V30 3 O -10.3538 5.81 0 0
|
|
M V30 4 C -9.0201 3.5 0 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 2 1
|
|
M V30 2 1 1 4
|
|
M V30 3 2 1 3
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertEqual((result.status & rdMolStandardize.PipelineStatus.PIPELINE_ERROR),
|
|
rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
self.assertNotEqual((result.status & rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION),
|
|
rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION)
|
|
self.assertEqual((result.status & rdMolStandardize.PipelineStatus.STRUCTURE_MODIFICATION),
|
|
rdMolStandardize.PipelineStatus.NORMALIZATION_APPLIED)
|
|
|
|
outputMol = Chem.MolFromMolBlock(result.outputMolData, sanitize=False)
|
|
outputSmiles = Chem.MolToSmiles(outputMol)
|
|
self.assertEqual(outputSmiles, "C[S+](C)[O-]")
|
|
|
|
def test26PipelineAllowEmptyMoleculesOption(self):
|
|
options = rdMolStandardize.PipelineOptions()
|
|
options.allowEmptyMolecules = True
|
|
pipeline = rdMolStandardize.Pipeline(options)
|
|
|
|
# no atoms
|
|
molblock = '''
|
|
10052313452D
|
|
|
|
0 0 0 0 0 999 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 0 0 0 0 0
|
|
M V30 END CTAB
|
|
M END
|
|
'''
|
|
result = pipeline.run(molblock)
|
|
self.assertEqual(result.stage, rdMolStandardize.PipelineStage.COMPLETED)
|
|
self.assertEqual(result.status, rdMolStandardize.PipelineStatus.NO_EVENT)
|
|
|
|
def testCustomScoreFuncs(self):
|
|
smi = "CC\\C=C(/O)[C@@H](C)C(C)=O"
|
|
m = Chem.MolFromSmiles(smi)
|
|
self.assertEqual(rdMolStandardize.ScoreRings(m), 0)
|
|
self.assertEqual(rdMolStandardize.ScoreHeteroHs(m), 0)
|
|
self.assertEqual(rdMolStandardize.ScoreSubstructs(m), 6)
|
|
|
|
# check the default terms
|
|
terms = rdMolStandardize.GetDefaultTautomerScoreSubstructs()
|
|
for term, (name, smarts, score) in zip(terms, [["benzoquinone", "[#6]1([#6]=[#6][#6]([#6]=[#6]1)=,:[N,S,O])=,:[N,S,O]",
|
|
25],
|
|
["oxim", "[#6]=[N][OH]", 4],
|
|
["C=O", "[#6]=,:[#8]", 2],
|
|
["N=O", "[#7]=,:[#8]", 2],
|
|
["P=O", "[#15]=,:[#8]", 2],
|
|
["C=hetero", "[C]=[!#1;!#6]", 1],
|
|
["C(=hetero)-hetero", "[C](=[!#1;!#6])[!#1;!#6]", 2],
|
|
["aromatic C = exocyclic N", "[c]=!@[N]", -1],
|
|
["methyl", "[CX4H3]", 1],
|
|
["guanidine terminal=N", "[#7]C(=[NR0])[#7H0]", 1],
|
|
["guanidine endocyclic=N", "[#7;R][#6;R]([N])=[#7;R]", 2],
|
|
["aci-nitro", "[#6]=[N+]([O-])[OH]", -4]]):
|
|
self.assertEqual((term.name, term.smarts, term.score), (name, smarts, score))
|
|
|
|
# make sure we can pass in our own terms
|
|
terms = rdMolStandardize.SubstructTermVector()
|
|
terms.append(rdMolStandardize.SubstructTerm("C=0", "[#6]=,:[#8]", 1000))
|
|
self.assertEqual(rdMolStandardize.ScoreSubstructs(m, terms), 1000)
|
|
|
|
self.assertEqual(rdMolStandardize.ScoreSubstructs(
|
|
m, rdMolStandardize.GetDefaultTautomerScoreSubstructs()), 6)
|
|
|
|
enumerator = rdMolStandardize.TautomerEnumerator()
|
|
m2 = Chem.MolFromSmiles("C1(=CCCCC1)O")
|
|
|
|
ctaut = enumerator.Canonicalize(m2)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
|
|
# duplicate the normal scoring function
|
|
def score_func1(mol):
|
|
return (rdMolStandardize.ScoreRings(mol) + rdMolStandardize.ScoreHeteroHs(mol) +
|
|
rdMolStandardize.ScoreSubstructs(mol))
|
|
|
|
ctaut = enumerator.Canonicalize(m2, score_func1)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), "O=C1CCCCC1")
|
|
|
|
# pull a single tautomer out of the mix
|
|
def score_func2(mol):
|
|
if Chem.MolToSmiles(mol) == Chem.CanonSmiles("C1(=CCCCC1)O"):
|
|
return 100_000
|
|
return 0
|
|
|
|
ctaut = enumerator.Canonicalize(m2, score_func2)
|
|
self.assertEqual(Chem.MolToSmiles(ctaut), Chem.CanonSmiles("C1(=CCCCC1)O"))
|
|
|
|
@unittest.skipUnless(inchi.INCHI_AVAILABLE, 'Inchi required')
|
|
def testTautomerCanonicalizeNoInchiBondStereoFrom2DCoords(self):
|
|
|
|
molblock = """
|
|
ChemDraw02102613032D
|
|
|
|
0 0 0 0 0 0 V3000
|
|
M V30 BEGIN CTAB
|
|
M V30 COUNTS 17 18 0 0 0
|
|
M V30 BEGIN ATOM
|
|
M V30 1 C -1.382449 2.541459 0.000000 0
|
|
M V30 2 N -1.391616 1.716459 0.000000 0
|
|
M V30 3 C -2.139845 1.368125 0.000000 0
|
|
M V30 4 C -2.333490 0.560313 0.000000 0
|
|
M V30 5 C -1.822449 -0.093386 0.000000 0
|
|
M V30 6 C -0.992865 -0.099688 0.000000 0
|
|
M V30 7 C -0.468073 0.540833 0.000000 0
|
|
M V30 8 C -0.646824 1.348646 0.000000 0
|
|
M V30 9 O 0.002291 1.857397 0.000000 0
|
|
M V30 10 N 0.334583 0.349479 0.000000 0
|
|
M V30 11 C 0.570052 -0.441719 0.000000 0
|
|
M V30 12 O 0.003437 -1.040990 0.000000 0
|
|
M V30 13 C 1.372708 -0.633073 0.000000 0
|
|
M V30 14 C 1.608177 -1.423699 0.000000 0
|
|
M V30 15 C 2.333490 -1.816147 0.000000 0
|
|
M V30 16 C 1.941043 -2.541459 0.000000 0
|
|
M V30 17 C 1.215730 -2.149012 0.000000 0
|
|
M V30 END ATOM
|
|
M V30 BEGIN BOND
|
|
M V30 1 1 1 2
|
|
M V30 2 1 2 8
|
|
M V30 3 1 2 3
|
|
M V30 4 1 3 4
|
|
M V30 5 1 4 5
|
|
M V30 6 1 5 6
|
|
M V30 7 1 6 7
|
|
M V30 8 1 7 8
|
|
M V30 9 2 8 9
|
|
M V30 10 1 7 10
|
|
M V30 11 1 10 11
|
|
M V30 12 2 11 12
|
|
M V30 13 1 11 13
|
|
M V30 14 2 13 14
|
|
M V30 15 1 14 17
|
|
M V30 16 1 14 15
|
|
M V30 17 1 15 16
|
|
M V30 18 1 16 17
|
|
M V30 END BOND
|
|
M V30 END CTAB
|
|
M END
|
|
"""
|
|
|
|
base = Chem.MolFromMolBlock(molblock, sanitize=True, removeHs=True)
|
|
self.assertIsNotNone(base)
|
|
self.assertEqual(base.GetNumConformers(), 1)
|
|
|
|
enumerator = rdMolStandardize.TautomerEnumerator()
|
|
canonical = enumerator.Canonicalize(base)
|
|
self.assertIsNotNone(canonical)
|
|
self.assertEqual(canonical.GetNumConformers(), 1)
|
|
|
|
before_inchi = inchi.MolToInchi(base)
|
|
after_inchi = inchi.MolToInchi(canonical)
|
|
|
|
self.assertNotIn("/b", before_inchi)
|
|
self.assertNotIn("/b", after_inchi)
|
|
|
|
def testCanonicalizeExocyclicDoubleBondRegression(self):
|
|
"""Regression: canonicalize() picks the wrong canonical tautomer for a
|
|
molecule whose exocyclic C=C stereo was set via the API (SetStereo +
|
|
SetStereoAtoms) without corresponding bond directions.
|
|
|
|
assignStereochemistry(force=true) inside enumerate() clears stereo
|
|
that lacks bond directions, so the resulting tautomer set differs
|
|
from what you get with SMILES-encoded E/Z (which carries bond
|
|
directions that survive re-perception). Among the no-stereo
|
|
tautomers the enumerator incorrectly chooses the endocyclic form
|
|
over the exocyclic one."""
|
|
mol = Chem.MolFromSmiles("O=C(CC1=CC2=CC=COC2)NC1=O")
|
|
mol = Chem.RemoveHs(mol)
|
|
|
|
# Pin unspecified exocyclic C=C bonds to STEREOTRANS via API
|
|
# (no bond directions set — stereo will be cleared by
|
|
# assignStereochemistry(force=true) during enumeration)
|
|
ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
|
|
for bond in mol.GetBonds():
|
|
if (bond.GetBondType() != Chem.rdchem.BondType.DOUBLE
|
|
or bond.GetStereo() != Chem.rdchem.BondStereo.STEREONONE
|
|
or bond.IsInRing()
|
|
or bond.GetBeginAtom().GetAtomicNum() != 6
|
|
or bond.GetEndAtom().GetAtomicNum() != 6):
|
|
continue
|
|
bgn, end = bond.GetBeginAtom(), bond.GetEndAtom()
|
|
bgnNbrs = [n.GetIdx() for n in bgn.GetNeighbors() if n.GetIdx() != end.GetIdx()]
|
|
endNbrs = [n.GetIdx() for n in end.GetNeighbors() if n.GetIdx() != bgn.GetIdx()]
|
|
if not bgnNbrs or not endNbrs:
|
|
continue
|
|
if (len(set(ranks[i] for i in bgnNbrs)) + bgn.GetNumImplicitHs() < 2
|
|
or len(set(ranks[i] for i in endNbrs)) + end.GetNumImplicitHs() < 2):
|
|
continue
|
|
bond.SetStereoAtoms(
|
|
min(bgnNbrs, key=lambda i: ranks[i]),
|
|
min(endNbrs, key=lambda i: ranks[i]),
|
|
)
|
|
bond.SetStereo(Chem.rdchem.BondStereo.STEREOTRANS)
|
|
|
|
params = rdMolStandardize.CleanupParameters()
|
|
params.tautomerRemoveBondStereo = False
|
|
params.tautomerRemoveSp3Stereo = False
|
|
enumerator = rdMolStandardize.TautomerEnumerator(params)
|
|
canon = enumerator.Canonicalize(mol)
|
|
smi = Chem.MolToSmiles(canon)
|
|
self.assertEqual(smi, "O=C1CC(=CC2=CC=COC2)C(=O)N1",
|
|
f"Expected exocyclic form, got: {smi}")
|
|
|
|
if __name__ == "__main__":
|
|
unittest.main()
|