mirror of
https://github.com/rdkit/rdkit.git
synced 2026-06-04 21:54:27 +08:00
* Speed up tautomer canonicalization by deferring on SSSR calc
* Lazy kekulization for tautomer enumeration

  Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for:
  1. The initial tautomer (until first iteration)
  2. New tautomers that may never be processed (if enumeration ends early)

  The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method.

  Performance improvement: ~7% additional speedup (total ~22-24% from baseline)
* Use count-only substructure matching in tautomer scoring
* Add SubstructMatchCount regression test
* MolStandardize: reduce enumerate overhead
* MolStandardize: avoid per-tautomer ring recomputation
* Atom: cache PeriodicTable pointer in valence calcs
* Atom: reuse PeriodicTable in getEffectiveAtomicNum
* PeriodicTable: add atomic fast path for getTable
* GraphMol: reduce ROMol copy reallocations
* MolStandardize: use quickCopy for per-match product copies

  Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.
* MolStandardize: pre-filter scoring patterns by element/connectivity

  For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule.

  Two-stage filtering:
  1. Element check: skip patterns requiring atoms not in the molecule
  2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity

  This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition.
* MolStandardize: preserve molecule properties for canonical tautomer

  Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result.
* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers

  TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES.

  Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance.
* MolStandardize: use parallel iteration and cache bond lookups

  Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.
* perf: add specialized matchers for simple tautomer scoring patterns

  Replace VF2 graph matching with O(n) loops for 6 simple patterns:
  - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
  - countMethyls: [CX4H3] methyl groups
  - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero
  - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N

  Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.

  Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization.
* Fix tautomer canonicalize dropping conformers from quickCopy

  quickCopy (RWMol(*mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates.

  Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize().

  Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp.
* add test on canonicalize losing stereo
* add regression test for exocyclic C=C tautomer canonicalization

  The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs.

  Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1.

  Currently expected to FAIL until the state key dedup bug is fixed.
* MolStandardize: expand tautomer connectivity SMARTS
* MolStandardize: scope tautomer pattern enum
* MolStandardize: trim tautomer pattern enum
* MolStandardize: use symmetric ring scoring
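The element check described in the pre-filtering commit can be pictured with a small standalone sketch. This is not the RDKit implementation: `ScoringPattern` and `filterByElements` are hypothetical names introduced here for illustration, and elements are represented as plain atomic numbers rather than atoms of a molecule object. The idea it illustrates is that tautomerization never changes which heavy elements are present, so the filter only needs to run once per input molecule.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a scoring term: a label plus the atomic numbers
// its pattern requires. This is NOT RDKit's SubstructTerm type.
struct ScoringPattern {
  std::string name;
  std::set<int> requiredElements;
};

// Stage 1 (element check): keep only the patterns whose required elements
// are all present in the molecule. Patterns dropped here can be skipped for
// every tautomer of that molecule.
std::vector<ScoringPattern> filterByElements(
    const std::vector<ScoringPattern> &patterns,
    const std::set<int> &moleculeElements) {
  std::vector<ScoringPattern> kept;
  for (const auto &pattern : patterns) {
    bool allPresent = true;
    for (int atomicNum : pattern.requiredElements) {
      if (moleculeElements.count(atomicNum) == 0) {
        allPresent = false;
        break;
      }
    }
    if (allPresent) {
      kept.push_back(pattern);
    }
  }
  return kept;
}
```

For a molecule containing only C, N and O, a pattern that needs phosphorus would be dropped before any substructure matching is attempted, while C=O and methyl patterns would be kept. The connectivity check (stage 2) is not sketched here.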
1837 lines
59 KiB
C++
//
// Copyright (C) 2019-2021 Greg Landrum
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
#include <catch2/catch_all.hpp>

#include <GraphMol/RDKitBase.h>
#include <GraphMol/ROMol.h>
#include <GraphMol/SmilesParse/SmilesParse.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <GraphMol/FileParsers/FileParsers.h>
#include <GraphMol/MolStandardize/MolStandardize.h>
#include <GraphMol/MolStandardize/Normalize.h>
#include <GraphMol/MolStandardize/Fragment.h>
#include <GraphMol/MolStandardize/Charge.h>
#include <GraphMol/MolStandardize/Tautomer.h>
#include <GraphMol/MolStandardize/Validate.h>

#include <cstdlib>
#include <fstream>
#include <memory>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using namespace RDKit;

TEST_CASE("SKIP_IF_ALL_MATCH") {
  auto m = "[Na+].[Cl-]"_smiles;
  REQUIRE(m);

  SECTION("default") {
    MolStandardize::FragmentRemover fragRemover;
    std::unique_ptr<ROMol> outm(fragRemover.remove(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "[Na+]");
  }
  SECTION("don't remove all") {
    MolStandardize::FragmentRemover fragRemover("", true, true);
    std::unique_ptr<ROMol> outm(fragRemover.remove(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "[Cl-].[Na+]");
  }
  SECTION("feel free to remove everything") {
    MolStandardize::FragmentRemover fragRemover("", false, false);
    std::unique_ptr<ROMol> outm(fragRemover.remove(*m));
    REQUIRE(outm);
    CHECK(outm->getNumAtoms() == 0);
  }
  SECTION("don't remove all 2") {
    MolStandardize::FragmentRemover fragRemover("", true, true);
    auto m = "[Na+].[Cl-].[Na+].[Cl-]"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> outm(fragRemover.remove(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "[Cl-].[Cl-].[Na+].[Na+]");
  }
}

TEST_CASE("symmetry in the uncharger", "[uncharger]") {
  SECTION("case 1") {
    auto m = "C[N+](C)(C)CC(C(=O)[O-])CC(=O)[O-]"_smiles;
    REQUIRE(m);
    {
      bool canonicalOrdering = false;
      MolStandardize::Uncharger uncharger(canonicalOrdering);
      std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
      REQUIRE(outm);
      CHECK(MolToSmiles(*outm) == "C[N+](C)(C)CC(CC(=O)[O-])C(=O)O");
    }
    {
      bool canonicalOrdering = true;
      MolStandardize::Uncharger uncharger(canonicalOrdering);
      std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
      REQUIRE(outm);
      CHECK(MolToSmiles(*outm) == "C[N+](C)(C)CC(CC(=O)O)C(=O)[O-]");
    }
    {
      MolStandardize::CleanupParameters params;
      std::unique_ptr<ROMol> outm(MolStandardize::chargeParent(*m, params));
      REQUIRE(outm);
      CHECK(MolToSmiles(*outm) == "C[N+](C)(C)CC(CC(=O)O)C(=O)[O-]");
    }
    {
      MolStandardize::CleanupParameters params;
      params.doCanonical = false;
      std::unique_ptr<ROMol> outm(MolStandardize::chargeParent(*m, params));
      REQUIRE(outm);
      CHECK(MolToSmiles(*outm) == "C[N+](C)(C)CC(CC(=O)[O-])C(=O)O");
    }
  }
}

TEST_CASE("uncharger 'force' option") {
  SECTION("force=false (default)") {
    MolStandardize::Uncharger uncharger;
    auto m1 = "C[N+](C)(C)CC([O-])C[O-]"_smiles;
    REQUIRE(m1);
    std::unique_ptr<ROMol> outm1(uncharger.uncharge(*m1));
    REQUIRE(outm1);
    CHECK(MolToSmiles(*outm1) == "C[N+](C)(C)CC([O-])CO");
    auto m2 = "C[B-](C)(C)CC([NH3+])C[NH3+]"_smiles;
    REQUIRE(m2);
    std::unique_ptr<ROMol> outm2(uncharger.uncharge(*m2));
    REQUIRE(outm2);
    CHECK(MolToSmiles(*outm2) == "C[B-](C)(C)CC(N)C[NH3+]");
  }
  SECTION("force=true") {
    MolStandardize::Uncharger uncharger(false, true);
    auto m1 = "C[N+](C)(C)CC([O-])C[O-]"_smiles;
    REQUIRE(m1);
    std::unique_ptr<ROMol> outm1(uncharger.uncharge(*m1));
    REQUIRE(outm1);
    CHECK(MolToSmiles(*outm1) == "C[N+](C)(C)CC(O)CO");
    auto m2 = "C[B-](C)(C)CC([NH3+])C[NH3+]"_smiles;
    REQUIRE(m2);
    std::unique_ptr<ROMol> outm2(uncharger.uncharge(*m2));
    REQUIRE(outm2);
    CHECK(MolToSmiles(*outm2) == "C[B-](C)(C)CC(N)CN");
  }
  SECTION("force=true doesn't alter nitro groups") {
    auto m = "CCC[N+](=O)[O-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger(false, true);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "CCC[N+](=O)[O-]");
  }
  SECTION("force=true doesn't alter n-oxides") {
    auto m = "[O-][n+]1ccccc1"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger(false, true);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "[O-][n+]1ccccc1");
  }
  SECTION("tetramethylammonium acetate (force=false)") {
    auto m = "C[N+](C)(C)C.CC(=O)[O-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger(true, false);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "CC(=O)[O-].C[N+](C)(C)C");
  }
  SECTION("tetramethylammonium acetate (force=true)") {
    auto m = "C[N+](C)(C)C.CC(=O)[O-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger(true, true);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "CC(=O)O.C[N+](C)(C)C");
  }
  SECTION("tetramethylammonium nitrate (force=false)") {
    auto m = "C[N+](C)(C)C.O=[N+]([O-])[O-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger(true, false);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "C[N+](C)(C)C.O=[N+]([O-])[O-]");
  }
  SECTION("tetramethylammonium nitrate (force=true)") {
    auto m = "C[N+](C)(C)C.O=[N+]([O-])[O-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger(true, true);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "C[N+](C)(C)C.O=[N+]([O-])O");
  }
  SECTION("bookkeeping (force=false)") {
    auto m = "O=[N+]([O-])[O-].O=[N+]([O-])[O-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger(true, false);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "O=[N+]([O-])O.O=[N+]([O-])O");
  }
  SECTION("bookkeeping (force=true)") {
    auto m = "O=[N+]([O-])[O-].O=[N+]([O-])[O-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger(true, true);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "O=[N+]([O-])O.O=[N+]([O-])O");
  }
}

TEST_CASE("uncharger bug with duplicates", "[uncharger]") {
  SECTION("case 1") {
    auto m = "[NH3+]CC([O-])C[O-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger;
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "NCC(O)CO");
  }
  SECTION("case 2") {
    auto m = "CC([O-])C[O-].[Na+]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger;
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "CC([O-])CO.[Na+]");
  }
  SECTION("acids + others 1, github #2392") {
    auto m = "C[N+](C)(C)CC(C[O-])CC(=O)[O-]"_smiles;
    REQUIRE(m);
    bool doCanonical = false;
    MolStandardize::Uncharger uncharger(doCanonical);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "C[N+](C)(C)CC(CO)CC(=O)[O-]");
  }
  SECTION("acids + others 2, github #2392") {
    auto m = "C[N+](C)(C)CC(CC(=O)[O-])C[O-]"_smiles;
    REQUIRE(m);
    bool doCanonical = false;
    MolStandardize::Uncharger uncharger(doCanonical);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "C[N+](C)(C)CC(CO)CC(=O)[O-]");
  }
}

TEST_CASE(
    "github #2411: MolStandardize: FragmentRemover should not sanitize "
    "[fragments]") {
  SECTION("demo") {
    std::string smi = "CN(C)(C)C.Cl";
    bool debugParse = false;
    bool sanitize = false;
    std::unique_ptr<ROMol> m(SmilesToMol(smi, debugParse, sanitize));
    REQUIRE(m);

    MolStandardize::FragmentRemover fragRemover;
    std::unique_ptr<ROMol> outm(fragRemover.remove(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "CN(C)(C)C");
  }
}

TEST_CASE(
    "github #2452: incorrectly removing charge from boron anions",
    "[fragments][uncharger]") {
  SECTION("demo") {
    auto m = "C[B-](C)(C)C"_smiles;
    REQUIRE(m);
    bool canonicalOrdering = true;

    MolStandardize::Uncharger uncharger(canonicalOrdering);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == -1);
    CHECK(MolToSmiles(*outm) == "C[B-](C)(C)C");
  }
  SECTION("should be removed") {
    auto m = "C[BH-](C)(C)"_smiles;
    REQUIRE(m);
    bool canonicalOrdering = true;

    MolStandardize::Uncharger uncharger(canonicalOrdering);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == 0);
    CHECK(MolToSmiles(*outm) == "CB(C)C");
  }
}

TEST_CASE("github #2602: Uncharger ignores dications", "[uncharger]") {
  SECTION("demo") {
    auto m = "[O-]CCC[O-].[Ca+2]"_smiles;
    REQUIRE(m);
    bool canonicalOrdering = true;
    MolStandardize::Uncharger uncharger(canonicalOrdering);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(5)->getFormalCharge() == 2);
    CHECK(outm->getAtomWithIdx(0)->getFormalCharge() == -1);
    CHECK(outm->getAtomWithIdx(4)->getFormalCharge() == -1);
    CHECK(MolToSmiles(*outm) == "[Ca+2].[O-]CCC[O-]");
  }
}

TEST_CASE(
    "github #2605: Uncharger incorrectly neutralizes cations when "
    "non-neutralizable anions are present.",
    "[uncharger]") {
  SECTION("demo") {
    auto m = "F[B-](F)(F)F.[NH3+]CCC"_smiles;
    REQUIRE(m);
    bool canonicalOrdering = true;
    MolStandardize::Uncharger uncharger(canonicalOrdering);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == -1);
    CHECK(outm->getAtomWithIdx(5)->getFormalCharge() == 1);
    CHECK(MolToSmiles(*outm) == "CCC[NH3+].F[B-](F)(F)F");
  }
  SECTION("multiple positively charged sites") {
    auto m = "F[B-](F)(F)F.[NH3+]CC=C[NH3+]"_smiles;
    REQUIRE(m);
    bool canonicalOrdering = true;
    MolStandardize::Uncharger uncharger(canonicalOrdering);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == -1);
    CHECK(outm->getAtomWithIdx(5)->getFormalCharge() == 0);
    CHECK(outm->getAtomWithIdx(9)->getFormalCharge() == 1);
    CHECK(MolToSmiles(*outm) == "F[B-](F)(F)F.NCC=C[NH3+]");
  }
  SECTION("make sure we don't go too far") {
    v2::SmilesParse::SmilesParserParams ps;
    ps.sanitize = false;
    auto m = v2::SmilesParse::MolFromSmiles("F[B-](F)(F)F.[NH4+2]CCC",
                                            ps);  // totally bogus structure
    REQUIRE(m);
    bool canonicalOrdering = true;
    MolStandardize::Uncharger uncharger(canonicalOrdering);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == -1);
    CHECK(outm->getAtomWithIdx(5)->getFormalCharge() == 1);
    CHECK(MolToSmiles(*outm) == "CCC[NH3+].F[B-](F)(F)F");
  }
}

TEST_CASE("github #2610: Uncharger incorrectly modifying a zwitterion.",
          "[uncharger]") {
  SECTION("demo") {
    auto m = "C1=CC=CC[NH+]1-[O-]"_smiles;
    REQUIRE(m);
    bool canonicalOrdering = true;
    MolStandardize::Uncharger uncharger(canonicalOrdering);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(5)->getFormalCharge() == 1);
    CHECK(outm->getAtomWithIdx(6)->getFormalCharge() == -1);
    CHECK(MolToSmiles(*outm) == "[O-][NH+]1C=CC=CC1");
  }
  SECTION("zwitterion also including an N-oxide") {
    auto m = "C[N+](C)(C)C(C(=O)[O-])c1cc[n+]([O-])cc1"_smiles;
    REQUIRE(m);
    bool canonicalOrdering = true;
    MolStandardize::Uncharger uncharger(canonicalOrdering);
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "C[N+](C)(C)C(C(=O)[O-])c1cc[n+]([O-])cc1");
  }
}

TEST_CASE("problems with ringInfo initialization", "[normalizer]") {
  std::string tfs =
      R"TXT(Bad amide tautomer1	[C:1]([OH1;D1:2])=;!@[NH1:3]>>[C:1](=[OH0:2])-[NH2:3]
Bad amide tautomer2	[C:1]([OH1;D1:2])=;!@[NH0:3]>>[C:1](=[OH0:2])-[NH1:3])TXT";
  std::stringstream iss(tfs);
  MolStandardize::Normalizer nrml(iss, 20);
  SECTION("example1") {
    auto m = "Cl.Cl.OC(=N)NCCCCCCCCCCCCNC(O)=N"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> res(nrml.normalize(*m));
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "Cl.Cl.NC(=O)NCCCCCCCCCCCCNC(N)=O");
  }
}

TEST_CASE("segfault in normalizer", "[normalizer]") {
  std::string tfs =
      R"TXT(Bad amide tautomer1	[C:1]([OH1;D1:2])=;!@[NH1:3]>>[C:1](=[OH0:2])-[NH2:3]
Bad amide tautomer2	[C:1]([OH1;D1:2])=;!@[NH0:3]>>[C:1](=[OH0:2])-[NH1:3])TXT";
  std::stringstream iss(tfs);
  MolStandardize::Normalizer nrml(iss, 20);
  SECTION("example1") {
    std::string molblock = R"CTAB(molblock = """
  SciTegic12221702182D

 47 51  0  0  0  0            999 V2000
    0.2962    6.2611    0.0000 C   0  0
   -3.9004    4.4820    0.0000 C   0  0
    1.4195    5.2670    0.0000 C   0  0
   -3.8201   -7.4431    0.0000 C   0  0
   -4.9433   -6.4490    0.0000 C   0  0
   -2.3975   -6.9674    0.0000 C   0  0
    3.5921   -3.5947    0.0000 C   0  0
   -3.1475    2.3700    0.0000 C   0  0
    2.1695   -4.0705    0.0000 C   0  0
   -2.0242    1.3759    0.0000 C   0  0
   -4.6440   -4.9792    0.0000 C   0  0
    2.7681   -1.1308    0.0000 C   0  0
   -5.8626    1.1332    0.0000 C   0  0
    3.0674    0.3391    0.0000 C   0  0
    3.6660    3.2787    0.0000 C   0  0
    8.1591   -0.6978    0.0000 C   0  0
    7.3351    1.7662    0.0000 C   0  0
   -6.3876    3.5028    0.0000 C   0  0
   -0.6756   -5.0219    0.0000 C   0  0
    7.0358    0.2964    0.0000 C   0  0
    3.8914   -2.1249    0.0000 C   0  0
   -2.0982   -5.4976    0.0000 C   0  0
   -4.5701    1.8943    0.0000 C   0  0  1  0  0  0
   -6.9859    2.1273    0.0000 C   0  0  1  0  0  0
    4.4900    0.8148    0.0000 C   0  0
    1.3455   -1.6065    0.0000 C   0  0
    4.7893    2.2846    0.0000 C   0  0
    1.9442    1.3332    0.0000 C   0  0
    1.0462   -3.0763    0.0000 C   0  0
    2.2435    2.8030    0.0000 C   0  0
   -0.6017    1.8516    0.0000 C   0  0
    5.6132   -0.1794    0.0000 C   0  0
    0.2223   -0.6124    0.0000 Cl  0  0
    9.2823   -1.6919    0.0000 N   0  0
   -3.2215   -4.5035    0.0000 N   0  0
    6.2119    2.7603    0.0000 N   0  0
    5.3139   -1.6492    0.0000 N   0  0
    0.5216    0.8575    0.0000 N   0  0
   -4.8945    3.3588    0.0000 N   0  0
   -8.2913    2.8662    0.0000 O   0  0
   -0.3024    3.3214    0.0000 O   0  0
    1.1202    3.7971    0.0000 O   0  0
   -0.3763   -3.5520    0.0000 O   0  0
   -2.8482    3.8398    0.0000 H   0  0
   -2.3235   -0.0940    0.0000 H   0  0
   -3.9483    0.5292    0.0000 H   0  0
   -7.8572    0.9063    0.0000 H   0  0
  1  3  1  0
  2 39  1  0
  3 42  1  0
  4  5  2  0
  4  6  1  0
  5 11  1  0
  6 22  2  0
  7  9  2  0
  7 21  1  0
  8 44  1  0
  8 10  2  0
  8 23  1  0
  9 29  1  0
 10 45  1  0
 10 31  1  0
 11 35  2  0
 12 21  2  0
 12 26  1  0
 13 23  1  0
 13 24  1  0
 14 25  2  0
 14 28  1  0
 15 27  2  0
 15 30  1  0
 16 20  1  0
 16 34  3  0
 17 20  2  0
 17 36  1  0
 18 24  1  0
 18 39  1  0
 19 22  1  0
 19 43  1  0
 20 32  1  0
 21 37  1  0
 22 35  1  0
 23 46  1  6
 23 39  1  0
 24 47  1  1
 24 40  1  0
 25 27  1  0
 25 32  1  0
 26 29  2  0
 26 33  1  0
 27 36  1  0
 28 30  2  0
 28 38  1  0
 29 43  1  0
 30 42  1  0
 31 38  2  0
 31 41  1  0
 32 37  2  3
M  END
"""

)CTAB";
    std::unique_ptr<RWMol> m(MolBlockToMol(molblock, false, false));
    REQUIRE(m);
    m->updatePropertyCache();
    MolOps::fastFindRings(*m);
    MolOps::setBondStereoFromDirections(*m);
    MolOps::RemoveHsParameters rhp;
    bool sanitize = false;
    MolOps::removeHs(*m, rhp, sanitize);
    std::unique_ptr<RWMol> res((RWMol *)nrml.normalize(*m));
    REQUIRE(res);
    MolOps::sanitizeMol(*res);
    MolOps::assignStereochemistry(*res);
    CHECK(MolToSmiles(*res) ==
          "CCOc1cc2[nH]cc(C#N)c(=Nc3ccc(OCc4ccccn4)c(Cl)c3)c2cc1NC(=O)/C=C/"
          "[C@H]1C[C@H](O)CN1C");
  }
}
TEST_CASE("problems with uncharging HS- from mol file", "[normalizer]") {
  SECTION("example1") {
    std::string mb = R"CTAB(
  SciTegic12231509382D

  1  0  0  0  0  0            999 V2000
   13.0092   -4.9004    0.0000 S   0  5
M  CHG  1   1  -1
M  END)CTAB";
    std::unique_ptr<ROMol> m(MolBlockToMol(mb));
    REQUIRE(m);
    MolStandardize::Uncharger uncharger;
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    // guard against dereferencing a null result in the check below
    REQUIRE(outm);
    CHECK(MolToSmiles(*outm) == "S");
  }
}

TEST_CASE("explicit Hs and Ns when neutralizing", "[normalizer]") {
  SECTION("example1") {
    std::string molblock = R"CTAB(
  Mrv1810 10301909502D

  2  1  0  0  0  0            999 V2000
   -3.0000    0.6316    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1750    0.6316    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  1   2  -1
M  END
)CTAB";
    std::unique_ptr<RWMol> m(MolBlockToMol(molblock, false, false));
    REQUIRE(m);
    m->updatePropertyCache();
    MolStandardize::Uncharger uc;
    std::unique_ptr<ROMol> res((ROMol *)uc.uncharge(*m));
    REQUIRE(res);
    CHECK(res->getAtomWithIdx(1)->getFormalCharge() == 0);
    CHECK(res->getAtomWithIdx(1)->getTotalNumHs() == 2);
    auto mb = MolToMolBlock(*res);
    // should be no valence markers in the output mol block:
    CHECK(mb.find("0.0000 N   0  0  0  0  0  0") != std::string::npos);
  }
}

TEST_CASE("fragment remover not considering bond counts", "[fragments][bug]") {
  std::string salts = R"DATA(Benethamine	C(Cc1ccccc1)NCc2ccccc2
Chloride	Cl
)DATA";
  std::istringstream iss(salts);
  bool leave_last = false;
  MolStandardize::FragmentRemover rmv(iss, leave_last);

  SECTION("example that should not be removed") {
    std::string molblock = R"CTAB(
  SciTegic11261411092D

 17 18  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 Cl  0  0
    2.2393    0.5156    0.0000 N   0  0
    3.6682    0.5156    0.0000 C   0  0
    2.9538    0.1031    0.0000 C   0  0
    3.6682    1.3406    0.0000 C   0  0
    2.9538   -0.7219    0.0000 C   0  0
    4.3827    0.1031    0.0000 C   0  0
    2.9538    1.7531    0.0000 C   0  0
    4.3827    1.7531    0.0000 C   0  0
    2.2393    1.3406    0.0000 C   0  0
    3.6682   -1.1344    0.0000 C   0  0
    2.2393   -1.1344    0.0000 C   0  0
    5.0972    0.5156    0.0000 C   0  0
    5.0972    1.3406    0.0000 C   0  0
    3.6682   -1.9594    0.0000 C   0  0
    2.2393   -1.9594    0.0000 C   0  0
    2.9538   -2.3719    0.0000 C   0  0
  2  4  1  0
  2 10  1  0
  3  4  1  0
  3  5  1  0
  3  7  2  0
  4  6  1  0
  5  8  1  0
  5  9  2  0
  6 11  2  0
  6 12  1  0
  7 13  1  0
  8 10  1  0
  9 14  1  0
 11 15  1  0
 12 16  2  0
 13 14  2  0
 15 17  2  0
 16 17  1  0
M  END)CTAB";
    std::unique_ptr<RWMol> m(MolBlockToMol(molblock));
    REQUIRE(m);
    m->updatePropertyCache();

    std::unique_ptr<ROMol> sm(rmv.remove(*m));
    REQUIRE(sm);
    CHECK(sm->getNumAtoms() == 16);
  }

  SECTION("example that should be removed") {
    std::string molblock = R"CTAB(
  Mrv1810 11071914502D

 17 17  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    2.2393    0.5156    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.6682    0.5156    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9538    0.1031    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6682    1.3406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9538   -0.7219    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3827    0.1031    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9538    1.7531    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3827    1.7531    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2393    1.3406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6682   -1.1344    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2393   -1.1344    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0972    0.5156    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0972    1.3406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6682   -1.9594    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2393   -1.9594    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9538   -2.3719    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  4  1  0  0  0  0
  2 10  1  0  0  0  0
  3  5  1  0  0  0  0
  3  7  2  0  0  0  0
  4  6  1  0  0  0  0
  5  8  1  0  0  0  0
  5  9  2  0  0  0  0
  6 11  2  0  0  0  0
  6 12  1  0  0  0  0
  7 13  1  0  0  0  0
  8 10  1  0  0  0  0
  9 14  1  0  0  0  0
 11 15  1  0  0  0  0
 12 16  2  0  0  0  0
 13 14  2  0  0  0  0
 15 17  2  0  0  0  0
 16 17  1  0  0  0  0
M  END
)CTAB";
    std::unique_ptr<RWMol> m(MolBlockToMol(molblock));
    REQUIRE(m);
    m->updatePropertyCache();

    std::unique_ptr<ROMol> sm(rmv.remove(*m));
    REQUIRE(sm);
    CHECK(sm->getNumAtoms() == 0);
  }
}

TEST_CASE("github #2792: carbon in the uncharger", "[uncharger][bug]") {
  SECTION("carbocation 1") {
    auto m = "C[CH2+]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger;
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == 0);
    CHECK(outm->getAtomWithIdx(1)->getTotalNumHs() == 3);
  }
  SECTION("boron cation") {
    auto m = "C[BH+]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger;
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == 0);
    CHECK(outm->getAtomWithIdx(1)->getTotalNumHs() == 2);
  }
  SECTION("carbanion 1") {
    auto m = "C[CH2-]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger;
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == 0);
    CHECK(outm->getAtomWithIdx(1)->getTotalNumHs() == 3);
  }
  SECTION("carbocation 2") {
    auto m = "CN1C=CN[CH+]1"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uncharger;
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(5)->getFormalCharge() == 0);
    CHECK(outm->getAtomWithIdx(5)->getTotalNumHs() == 2);
  }
  SECTION("carbocation 2 without sanitization") {
    SmilesParserParams params;
    params.sanitize = false;
    std::unique_ptr<ROMol> m(SmilesToMol("CN1C=CN[CH+]1", params));
    REQUIRE(m);
    m->updatePropertyCache();
    MolStandardize::Uncharger uncharger;
    std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
    REQUIRE(outm);
    CHECK(outm->getAtomWithIdx(5)->getFormalCharge() == 0);
    CHECK(outm->getAtomWithIdx(5)->getTotalNumHs() == 2);
  }
}

TEST_CASE("github #2965: molecules properties not retained after cleanup",
          "[cleanup][bug]") {
  SECTION("example 1") {
    MolStandardize::CleanupParameters params;
    std::unique_ptr<RWMol> m(SmilesToMol("Cl.c1cnc(OCCCC2CCNCC2)cn1"));
    REQUIRE(m);
    m->setProp("testing_prop", "1234");
    std::unique_ptr<RWMol> res(MolStandardize::cleanup(*m, params));
    REQUIRE(res);
    auto x = res->getDict();
    CHECK(x.getVal<std::string>("testing_prop") == "1234");
  }
}

TEST_CASE(
    "github #2970: chargeParent() segmentation fault when standardization is "
    "skipped i.e. skip_standardize is set to true") {
  auto m = "COC=1C=CC(NC=2N=CN=C3NC=NC23)=CC1"_smiles;
  REQUIRE(m);
  MolStandardize::CleanupParameters params;
  std::unique_ptr<RWMol> res(MolStandardize::cleanup(*m, params));

  std::unique_ptr<ROMol> outm(MolStandardize::chargeParent(*res, params, true));

  REQUIRE(outm);
  CHECK(MolToSmiles(*outm) == "COc1ccc(Nc2ncnc3[nH]cnc23)cc1");
}

TEST_CASE("update parameters from JSON") {
  std::string rdbase = std::getenv("RDBASE");

  // a few tests to make sure the basics work
  MolStandardize::CleanupParameters params;
  CHECK(params.maxRestarts == 200);
  CHECK(params.tautomerReassignStereo == true);

  MolStandardize::updateCleanupParamsFromJSON(params,
                                              R"JSON({"maxRestarts":12,
  "tautomerReassignStereo":false,
  "fragmentFile":"foo.txt"})JSON");
  CHECK(params.maxRestarts == 12);
  CHECK(params.tautomerReassignStereo == false);
  CHECK(params.fragmentFile == "foo.txt");
}

TEST_CASE("provide normalizer parameters as data") {
  std::vector<std::pair<std::string, std::string>> tfs{
      {"Bad amide tautomer1",
       "[C:1]([OH1;D1:2])=;!@[NH1:3]>>[C:1](=[OH0:2])-[NH2:3]"},
      {"Bad amide tautomer2",
       "[C:1]([OH1;D1:2])=;!@[NH0:3]>>[C:1](=[OH0:2])-[NH1:3]"}};
  SECTION("example1") {
    MolStandardize::Normalizer nrml(tfs, 20);
    auto m = "Cl.Cl.OC(=N)NCCCCCCCCCCCCNC(O)=N"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> res(nrml.normalize(*m));
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "Cl.Cl.NC(=O)NCCCCCCCCCCCCNC(N)=O");
  }
  SECTION("example2") {
    MolStandardize::Normalizer nrml(tfs, 20);
    auto m = "OC(=N)NCCCCCCCCCCCCNC(O)=N"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> res(nrml.normalize(*m));
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "NC(=O)NCCCCCCCCCCCCNC(N)=O");
  }
}

TEST_CASE("provide normalizer parameters as JSON") {
  SECTION("example1") {
    std::string json = R"JSON({"normalizationData":[
      {"name":"silly 1","smarts":"[Cl:1]>>[F:1]"},
      {"name":"silly 2","smarts":"[Br:1]>>[F:1]"}
    ]})JSON";
    MolStandardize::CleanupParameters params;
    MolStandardize::updateCleanupParamsFromJSON(params, json);
    CHECK(params.normalizationData.size() == 2);

    MolStandardize::Normalizer nrml(params.normalizationData, 20);
    auto m = "ClCCCBr"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> res(nrml.normalize(*m));
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "FCCCF");
  }
}

TEST_CASE("provide charge parameters as data") {
  std::vector<std::tuple<std::string, std::string, std::string>> params{
      {"-CO2H", "C(=O)[OH]", "C(=O)[O-]"}, {"phenol", "c[OH]", "c[O-]"}};
  SECTION("example1") {
    MolStandardize::Reionizer reion(params);
    auto m = "c1cc([O-])cc(C(=O)O)c1"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> res(reion.reionize(*m));
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "O=C([O-])c1cccc(O)c1");
  }
  SECTION("example2") {
    MolStandardize::Reionizer reion(params);
    auto m = "C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> res(reion.reionize(*m));
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "O=S([O-])c1ccc(S(=O)(=O)O)cc1");
  }
}

TEST_CASE("provide charge parameters as JSON") {
  SECTION("example1") {
    std::string json = R"JSON({"acidbaseData":[
      {"name":"-CO2H","acid":"C(=O)[OH]","base":"C(=O)[O-]"},
      {"name":"phenol","acid":"c[OH]","base":"c[O-]"}
    ]})JSON";
    MolStandardize::CleanupParameters params;
    MolStandardize::updateCleanupParamsFromJSON(params, json);
    CHECK(params.acidbaseData.size() == 2);

    MolStandardize::Reionizer reion(params.acidbaseData);
    auto m = "c1cc([O-])cc(C(=O)O)c1"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> res(reion.reionize(*m));
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "O=C([O-])c1cccc(O)c1");
    m = "C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O"_smiles;
    REQUIRE(m);
    res.reset(reion.reionize(*m));
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "O=S([O-])c1ccc(S(=O)(=O)O)cc1");
  }
}

TEST_CASE("provide tautomer parameters as JSON") {
  SECTION("example1") {
    std::string json = R"JSON({"tautomerTransformData":[
      {"name":"1,3 (thio)keto/enol f","smarts":"[CX4!H0]-[C]=[O,S,Se,Te;X1]","bonds":"","charges":""},
      {"name":"1,3 (thio)keto/enol r","smarts":"[O,S,Se,Te;X2!H0]-[C]=[C]"}
    ]})JSON";
    MolStandardize::CleanupParameters params;
    MolStandardize::updateCleanupParamsFromJSON(params, json);
    CHECK(params.tautomerTransformData.size() == 2);
    MolStandardize::TautomerEnumerator te(params);
    auto m = "CCC=O"_smiles;
    REQUIRE(m);
    auto tauts = te.enumerate(*m);
    CHECK(tauts.size() == 2);
    CHECK(MolToSmiles(*tauts[0]) == "CC=CO");
    CHECK(MolToSmiles(*tauts[1]) == "CCC=O");
  }
  SECTION("example 2") {
    std::string json = R"JSON({"tautomerTransformData":[
      {"name":"isocyanide f", "smarts":"[C-0!H0]#[N+0]", "bonds":"#", "charges":"-+"},
      {"name":"isocyanide r", "smarts":"[N+!H0]#[C-]", "bonds":"#", "charges":"-+"}
    ]})JSON";
    MolStandardize::CleanupParameters params;
    MolStandardize::updateCleanupParamsFromJSON(params, json);
    CHECK(params.tautomerTransformData.size() == 2);
    MolStandardize::TautomerEnumerator te(params);
    auto m = "C#N"_smiles;
    REQUIRE(m);
    auto tauts = te.enumerate(*m);
    CHECK(tauts.size() == 2);
    CHECK(MolToSmiles(*tauts[0]) == "C#N");
    CHECK(MolToSmiles(*tauts[1]) == "[C-]#[NH+]");
  }
  SECTION("example3") {
    std::string json = R"JSON({"tautomerTransformData":[
      {"name":"1,3 (thio)keto/enol f","smarts":"[CX4!H0]-[C]=[O,S,Se,Te;X1]","bonds":"","charges":""},
      {"name":"1,3 (thio)keto/enol r","smarts":"[O,S,Se,Te;X2!H0]-[C]=[C]"}
    ]})JSON";
    MolStandardize::CleanupParameters params;
    MolStandardize::updateCleanupParamsFromJSON(params, json);
    CHECK(params.tautomerTransformData.size() == 2);
    auto m = "CCC=O"_smiles;
    REQUIRE(m);
    std::unique_ptr<RWMol> nm{MolStandardize::canonicalTautomer(m.get())};
    CHECK(MolToSmiles(*nm) == "CCC=O");
  }
}

TEST_CASE("provide fragment parameters as JSON") {
  SECTION("example1") {
    std::string json = R"JSON({"fragmentData":[
      {"name":"hydrogen", "smarts":"[H]"},
      {"name":"fluorine", "smarts":"[F]"},
      {"name":"chlorine", "smarts":"[Cl]"}
    ]})JSON";
    MolStandardize::CleanupParameters params;
    MolStandardize::updateCleanupParamsFromJSON(params, json);
    CHECK(params.fragmentData.size() == 3);
    std::unique_ptr<MolStandardize::FragmentRemover> fm{
        MolStandardize::fragmentRemoverFromParams(params, true)};
    auto m = "[F-].[Cl-].[Br-].CC"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> nm{fm->remove(*m)};
    CHECK(MolToSmiles(*nm) == "CC.[Br-]");
  }
}

TEST_CASE("tautomer parent") {
  SECTION("example1") {
    auto m = "[O-]c1ccc(C(=O)O)cc1CC=CO"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> nm{MolStandardize::tautomerParent(*m)};
    CHECK(MolToSmiles(*nm) == "O=CCCc1cc(C(=O)[O-])ccc1O");
    MolStandardize::tautomerParentInPlace(*m);
    CHECK(MolToSmiles(*m) == "O=CCCc1cc(C(=O)[O-])ccc1O");
  }
}

TEST_CASE("stereo parent") {
  SECTION("example1") {
    auto m = "C[C@](F)(Cl)C/C=C/[C@H](F)Cl"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> nm{MolStandardize::stereoParent(*m)};
    CHECK(MolToSmiles(*nm) == "CC(F)(Cl)CC=CC(F)Cl");
  }
}

TEST_CASE("isotope parent") {
  SECTION("example1") {
    auto m = "[12CH3][13CH3]"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> nm{MolStandardize::isotopeParent(*m)};
    CHECK(MolToSmiles(*nm) == "CC");
  }
  SECTION("attached D") {
    // this behavior - leaving H atoms with no isotope info - is intentional
    // It may be that we're working with molecules which include Hs and we don't
    // want to just automatically remove them.
    auto m = "O[2H]"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> nm{MolStandardize::isotopeParent(*m)};
    CHECK(MolToSmiles(*nm) == "[H]O");
  }
}

TEST_CASE("super parent") {
  SECTION("example1") {
    auto m = "[O-]c1c([12C@H](F)Cl)c(O[2H])c(C(=O)O)cc1CC=CO.[Na+]"_smiles;
    REQUIRE(m);
    std::unique_ptr<ROMol> nm{MolStandardize::superParent(*m)};
    CHECK(MolToSmiles(*nm) == "O=CCCc1cc(C(=O)O)c(O)c(C(F)Cl)c1O");
    MolStandardize::superParentInPlace(*m);
    CHECK(MolToSmiles(*m) == "O=CCCc1cc(C(=O)O)c(O)c(C(F)Cl)c1O");
  }
}

TEST_CASE(
    "Github #4260: Exception thrown by reionizer when dealing with Mg+2") {
  SECTION("reported") {
    auto m = "[Mg].OC(=O)c1ccccc1C"_smiles;
    REQUIRE(m);
    std::unique_ptr<RWMol> m2(MolStandardize::reionize(m.get()));
    REQUIRE(m2);
    CHECK(m2->getAtomWithIdx(0)->getFormalCharge() == 2);
    CHECK(m2->getAtomWithIdx(1)->getFormalCharge() == -1);
  }
}

TEST_CASE("Github #5008: bad tautomers for phosphorous compounds") {
  SECTION("as reported") {
    auto m = "NP(=O)(O)N(CCCl)CCCl"_smiles;
    REQUIRE(m);
    MolStandardize::TautomerEnumerator tenum;
    auto tauts = tenum.enumerate(*m);
    CHECK(tauts.size() == 1);
  }
  SECTION("P which should tautomerize") {
    auto m = "CP(O)C"_smiles;
    REQUIRE(m);
    MolStandardize::TautomerEnumerator tenum;
    auto tauts = tenum.enumerate(*m);
    CHECK(tauts.size() == 2);
  }
  SECTION("Canonical version") {
    auto m = "CP(O)C"_smiles;
    REQUIRE(m);
    std::unique_ptr<RWMol> ct(MolStandardize::canonicalTautomer(m.get()));
    REQUIRE(ct);
    CHECK(MolToSmiles(*ct) == "C[PH](C)=O");
  }
}

TEST_CASE("Github #5169: Standardization via RDKit breaks molecules",
          "[uncharger]") {
  SECTION("basics") {
    SmilesParserParams ps;
    ps.sanitize = false;
    std::vector<std::string> smis = {"C[O+](C)C", "[H]/[O+]=C/Cl"};
    for (const auto &smi : smis) {
      std::unique_ptr<RWMol> m{SmilesToMol(smi, ps)};
      REQUIRE(m);
      m->updatePropertyCache(false);
      MolStandardize::Uncharger uncharger;
      std::unique_ptr<ROMol> outm(uncharger.uncharge(*m));
      REQUIRE(outm);
      INFO("failing for smiles " << smi);
      CHECK(outm->getAtomWithIdx(1)->getFormalCharge() == 1);
    }
  }
}

TEST_CASE("asymmetric imine tautomer generation", "[tautomers]") {
  SECTION("basics") {
    MolStandardize::TautomerEnumerator tenum;
    // clang-format off
    std::vector<std::pair<std::string, unsigned>> data = {
      {"C=C1NNC(=O)N1*", 2},
      {"CC1=NN=C(O)N1*", 2},
      {"C-C=NC", 1},
      {"C-C=N", 2},
      {"C-C=Nc1ccccc1", 2},
    };
    // clang-format on
    for (const auto &pr : data) {
      INFO(pr.first);
      std::unique_ptr<RWMol> m(SmilesToMol(pr.first));
      // guard against a failed parse before dereferencing
      REQUIRE(m);
      auto res = tenum.enumerate(*m);
      CHECK(res.size() == pr.second);
    }
  }
}

TEST_CASE("Github 5317: standardization failing with zwitterionic sulfone") {
  SECTION("basics") {
    auto m = "C[S+2]([O-])([O-])C([O-])C(=O)O"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uc;
    std::unique_ptr<ROMol> res{uc.uncharge(*m)};
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "C[S+2]([O-])([O-])C(O)C(=O)O");
  }
  SECTION("don't overdo it") {
    auto m = "C[S+2]([O-])([O-])C([O-])C(=O)O.[Na+]"_smiles;
    REQUIRE(m);
    MolStandardize::Uncharger uc;
    std::unique_ptr<ROMol> res{uc.uncharge(*m)};
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "C[S+2]([O-])([O-])C([O-])C(=O)O.[Na+]");
  }
}

TEST_CASE("Github 5318: standardizing unsanitized molecules should work") {
  SmilesParserParams ps;
  ps.sanitize = false;
  ps.removeHs = false;
  std::unique_ptr<RWMol> m{SmilesToMol("C[S+2]([O-])([O-])C([O-])C(=O)O", ps)};
  REQUIRE(m);
  std::unique_ptr<RWMol> m2{SmilesToMol("Cc1[nH]ncc1.[Cl]", ps)};
  REQUIRE(m2);
  SECTION("reionizer") {
    MolStandardize::Reionizer reion;
    std::unique_ptr<ROMol> res{reion.reionize(*m)};
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "C[S+2]([O-])([O-])C(O)C(=O)[O-]");
  }
  SECTION("uncharger") {
    MolStandardize::Uncharger uc;
    std::unique_ptr<ROMol> res{uc.uncharge(*m)};
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "C[S+2]([O-])([O-])C(O)C(=O)O");
  }
  SECTION("normalizer") {
    std::unique_ptr<ROMol> res{MolStandardize::normalize(m.get())};
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "CS(=O)(=O)C([O-])C(=O)O");
  }
  SECTION("tautomer") {
    std::unique_ptr<ROMol> res{MolStandardize::canonicalTautomer(m2.get())};
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "Cc1cc[nH]n1.Cl");
    RWMol cp(*m2);
    MolStandardize::canonicalTautomerInPlace(cp);
    CHECK(MolToSmiles(cp) == "Cc1cc[nH]n1.Cl");
  }
  SECTION("fragments") {
    std::unique_ptr<ROMol> res{MolStandardize::removeFragments(m2.get())};
    REQUIRE(res);
    CHECK(MolToSmiles(*res) == "Cc1ccn[nH]1");
  }
}

TEST_CASE("Github #5320: cleanup() and stereochemistry") {
  SECTION("basics") {
    auto m = "Cl[C@](O)([O-])C(=O)O"_smiles;
    REQUIRE(m);
    CHECK(m->getAtomWithIdx(1)->getChiralTag() !=
          Atom::ChiralType::CHI_UNSPECIFIED);
    std::unique_ptr<RWMol> m2{MolStandardize::cleanup(m.get())};
    REQUIRE(m2);
    CHECK(m2->getAtomWithIdx(1)->getChiralTag() ==
          Atom::ChiralType::CHI_UNSPECIFIED);
    CHECK(MolToSmiles(*m2) == "O=C([O-])C(O)(O)Cl");
  }
}

TEST_CASE("Github #5402: order dependence of tautomer transforms") {
  SECTION("as-reported") {
    MolStandardize::TautomerEnumerator te;

    auto m1 = "c1ccc([C@@H](CC2=NCCN2)c2ccccn2)cc1"_smiles;
    REQUIRE(m1);
    auto m2 = "C([C@H](C1=CC=CC=C1)C2=NC=CC=C2)C3=NCCN3"_smiles;
    REQUIRE(m2);
    std::cerr << " * - * - * - * m1" << std::endl;
    std::unique_ptr<ROMol> res1{te.canonicalize(*m1)};
    REQUIRE(res1);
    std::cerr << " * - * - * - * m2" << std::endl;
    std::unique_ptr<ROMol> res2{te.canonicalize(*m2)};
    REQUIRE(res2);
    CHECK(MolToSmiles(*res1) == MolToSmiles(*res2));
  }
  SECTION("zoom") {
    MolStandardize::CleanupParameters params;
    const std::vector<
        std::tuple<std::string, std::string, std::string, std::string>>
        tTransforms{
            std::make_tuple(std::string("special imine r1"),
                            std::string("[Cz0R0X4!H0]-[c]=[nz0]"),
                            std::string(""), std::string("")),
            std::make_tuple(std::string("special imine r2"),
                            std::string("[Cz0R0X4!H0]-[c](=c)-[nz0]"),
                            std::string("==-"), std::string("")),
        };

    params.tautomerTransformData = tTransforms;
    MolStandardize::TautomerEnumerator te(params);

    auto m1 = "c1ccc([C@@H](CC2=NCCN2)c2ccccn2)cc1"_smiles;
    REQUIRE(m1);
    auto m2 = "C([C@H](C1=CC=CC=C1)C2=NC=CC=C2)C3=NCCN3"_smiles;
    REQUIRE(m2);

    std::cerr << " * - * - * - * m1" << std::endl;
    std::unique_ptr<ROMol> res1{te.canonicalize(*m1)};
    REQUIRE(res1);
    std::cerr << " * - * - * - * m2" << std::endl;
    std::unique_ptr<ROMol> res2{te.canonicalize(*m2)};
    REQUIRE(res2);
    CHECK(MolToSmiles(*res1) == MolToSmiles(*res2));
  }
}

TEST_CASE("Github 5784: kekulization error when enumerating tautomers") {
  std::vector<std::string> smis{"NC1=NC=NC(C)=C1", "CC1N=CN(C)C(=O)C=1",
                                "CC1=CC=CC(=O)N1C"};
  for (const auto &smi : smis) {
    INFO(smi);
    std::unique_ptr<ROMol> m{SmilesToMol(smi)};
    REQUIRE(m);
    MolStandardize::TautomerEnumerator te;
    std::unique_ptr<ROMol> res(te.canonicalize(*m));
    REQUIRE(res);
  }
}

TEST_CASE("in place operations") {
  SECTION("reionizer") {
    MolStandardize::Reionizer reion;
    auto m = "c1cc([O-])cc(C(=O)O)c1"_smiles;
    REQUIRE(m);
    reion.reionizeInPlace(*m);
    CHECK(MolToSmiles(*m) == "O=C([O-])c1cccc(O)c1");
  }
  SECTION("reionize") {
    auto m = "c1cc([O-])cc(C(=O)O)c1"_smiles;
    REQUIRE(m);
    MolStandardize::reionizeInPlace(*m);
    CHECK(MolToSmiles(*m) == "O=C([O-])c1cccc(O)c1");
  }
  SECTION("uncharge") {
    MolStandardize::Uncharger unchg;
    auto m = "c1cc([O-])cc(C(=O)O)c1"_smiles;
    REQUIRE(m);
    unchg.unchargeInPlace(*m);
    CHECK(MolToSmiles(*m) == "O=C(O)c1cccc(O)c1");
  }
  SECTION("normalizer") {
    MolStandardize::Normalizer nrml;
    SmilesParserParams ps;
    ps.sanitize = false;
    std::unique_ptr<RWMol> m{SmilesToMol("O=N(=O)-CC-N(=O)=O", ps)};
    REQUIRE(m);
    nrml.normalizeInPlace(*m);
    CHECK(MolToSmiles(*m) == "O=[N+]([O-])CC[N+](=O)[O-]");
    m.reset(SmilesToMol("OCCN", ps));
    REQUIRE(m);
    nrml.normalizeInPlace(*m);
    CHECK(MolToSmiles(*m) == "NCCO");
  }
  SECTION("normalize") {
    SmilesParserParams ps;
    ps.sanitize = false;
    std::unique_ptr<RWMol> m{SmilesToMol("O=N(=O)-CC-N(=O)=O", ps)};
    REQUIRE(m);
    MolStandardize::normalizeInPlace(*m);
    CHECK(MolToSmiles(*m) == "O=[N+]([O-])CC[N+](=O)[O-]");
    m.reset(SmilesToMol("OCCN", ps));
    REQUIRE(m);
    MolStandardize::normalizeInPlace(*m);
    CHECK(MolToSmiles(*m) == "NCCO");
  }
  SECTION("FragmentRemover") {
    auto m = "CCCC.Cl.[Na]"_smiles;
    REQUIRE(m);
    MolStandardize::FragmentRemover fragremover;
    RWMol cp1(*m);
    fragremover.removeInPlace(cp1);
    CHECK(MolToSmiles(cp1) == "CCCC");
    RWMol cp2(*m);
    MolStandardize::removeFragmentsInPlace(cp2);
    CHECK(MolToSmiles(cp2) == "CCCC");
  }
  SECTION("FragmentParent") {
    auto m = "CCCC.Cl.[Na]"_smiles;
    REQUIRE(m);
    RWMol cp1(*m);
    MolStandardize::fragmentParentInPlace(cp1);
    // note: this isn't a nice answer, and it should be
    // fixed, but it is what the code currently generates
    CHECK(MolToSmiles(cp1) == "[CH2-]CCC");
  }
  SECTION("ChargeParent") {
    auto m = "[O-]C(=O)CCC.[Na+]"_smiles;
    REQUIRE(m);
    RWMol cp1(*m);
    MolStandardize::chargeParentInPlace(cp1);
    CHECK(MolToSmiles(cp1) == "CCCC(=O)O");
  }
  SECTION("IsotopeParent") {
    auto m = "[13CH3]C"_smiles;
    REQUIRE(m);
    RWMol cp1(*m);
    MolStandardize::isotopeParentInPlace(cp1);
    CHECK(MolToSmiles(cp1) == "CC");
  }
  SECTION("StereoParent") {
    auto m = "F[C@H](O)Cl"_smiles;
    REQUIRE(m);
    RWMol cp1(*m);
    MolStandardize::stereoParentInPlace(cp1);
    CHECK(MolToSmiles(cp1) == "OC(F)Cl");
  }
  SECTION("cleanup") {
    SmilesParserParams ps;
    ps.sanitize = false;
    // silly ugly example which ensures disconnection, normalization, and
    // reionization
    std::unique_ptr<RWMol> m{
        SmilesToMol("O=N(=O)-C(O[Fe])C(C(=O)O)C-N(=O)=O", ps)};
    REQUIRE(m);
    MolStandardize::cleanupInPlace(*m);
    CHECK(MolToSmiles(*m) == "O=C([O-])C(C[N+](=O)[O-])C(O)[N+](=O)[O-].[Fe+]");
  }
  SECTION("disconnect organometallics") {
    auto m("[CH2-](->[K+])c1ccccc1"_smiles);
    REQUIRE(m);
    MolStandardize::disconnectOrganometallicsInPlace(*m);
    CHECK(MolToSmiles(*m) == "[CH2-]c1ccccc1.[K+]");
  }
}

TEST_CASE("cleanup with multiple mols") {
  SmilesParserParams ps;
  ps.sanitize = false;
  // silly ugly examples which ensure disconnection, normalization, and
  // reionization
  std::vector<std::pair<std::string, std::string>> data = {
      {"O=N(=O)-C(O[Fe])C(C(=O)O)C-N(=O)=O",
       "O=C([O-])C(C[N+](=O)[O-])C(O)[N+](=O)[O-].[Fe+]"},
      {"O=N(=O)-CC(O[Fe])C(C(=O)O)C-N(=O)=O",
       "O=C([O-])C(C[N+](=O)[O-])C(O)C[N+](=O)[O-].[Fe+]"},
      {"O=N(=O)-CCC(O[Fe])C(C(=O)O)C-N(=O)=O",
       "O=C([O-])C(C[N+](=O)[O-])C(O)CC[N+](=O)[O-].[Fe+]"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 8; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &pr : data) {
    mols.emplace_back(SmilesToMol(pr.first, ps));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    MolStandardize::cleanupInPlace(molPtrs);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::cleanupInPlace(molPtrs, numThreads);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("normalize with multiple mols") {
  SmilesParserParams ps;
  ps.sanitize = false;
  std::vector<std::pair<std::string, std::string>> data = {
      {"O=N(=O)-CC-N(=O)=O", "O=[N+]([O-])CC[N+](=O)[O-]"},
      {"O=N(=O)-CCC-N(=O)=O", "O=[N+]([O-])CCC[N+](=O)[O-]"},
      {"O=N(=O)-CCCC-N(=O)=O", "O=[N+]([O-])CCCC[N+](=O)[O-]"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 8; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &pr : data) {
    mols.emplace_back(SmilesToMol(pr.first, ps));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    MolStandardize::normalizeInPlace(molPtrs);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::normalizeInPlace(molPtrs, numThreads);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("Reionize with multiple mols") {
  SmilesParserParams ps;
  ps.sanitize = false;
  std::vector<std::pair<std::string, std::string>> data = {
      {"c1cc([O-])cc(C(=O)O)c1", "O=C([O-])c1cccc(O)c1"},
      {"c1cc(C[O-])cc(C(=O)O)c1", "O=C([O-])c1cccc(CO)c1"},
      {"c1cc(CC[O-])cc(C(=O)O)c1", "O=C([O-])c1cccc(CCO)c1"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 8; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &pr : data) {
    mols.emplace_back(SmilesToMol(pr.first, ps));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    MolStandardize::reionizeInPlace(molPtrs);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::reionizeInPlace(molPtrs, numThreads);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("RemoveFragments with multiple mols") {
  SmilesParserParams ps;
  ps.sanitize = false;
  std::vector<std::pair<std::string, std::string>> data = {
      {"CCCC.Cl.[Na]", "CCCC"},
      {"CCCCO.Cl.[Na]", "CCCCO"},
      {"CCOC.Cl.[Na]", "CCOC"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 8; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &pr : data) {
    mols.emplace_back(SmilesToMol(pr.first, ps));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    MolStandardize::removeFragmentsInPlace(molPtrs);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::removeFragmentsInPlace(molPtrs, numThreads);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("charge with multiple mols") {
  auto params = MolStandardize::defaultCleanupParameters;
  params.preferOrganic = true;

  std::vector<std::pair<std::string, std::string>> data = {
      {"O=C([O-])c1ccccc1", "O=C(O)c1ccccc1"},
      {"C[NH+](C)(C).[Cl-]", "CN(C)C"},
      {"[N+](=O)([O-])[O-].[CH2]", "[CH2]"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 8; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &[insmi, outsmi] : data) {
    mols.emplace_back(SmilesToMol(insmi));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    int numThreads = 1;
    MolStandardize::chargeParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::chargeParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("isotope with multiple mols") {
  auto params = MolStandardize::defaultCleanupParameters;

  std::vector<std::pair<std::string, std::string>> data = {
      {"[13CH3]C", "CC"},
      {"[13CH3]C.C", "C.CC"},
      {"[13CH3][12CH3]", "CC"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 8; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &[insmi, outsmi] : data) {
    mols.emplace_back(SmilesToMol(insmi));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    int numThreads = 1;
    MolStandardize::isotopeParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::isotopeParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("fragments with multiple mols") {
  auto params = MolStandardize::defaultCleanupParameters;
  params.preferOrganic = true;

  std::vector<std::pair<std::string, std::string>> data = {
      {"O=C([O-])c1ccccc1", "O=C([O-])c1ccccc1"},
      {"C[NH+](C)(C).[Cl-]", "C[NH+](C)C"},
      {"[N+](=O)([O-])[O-].CC", "CC"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 8; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &[insmi, outsmi] : data) {
    mols.emplace_back(SmilesToMol(insmi));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    int numThreads = 1;
    MolStandardize::fragmentParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::fragmentParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("stereo with multiple mols") {
  auto params = MolStandardize::defaultCleanupParameters;

  std::vector<std::pair<std::string, std::string>> data = {
      {"F[C@H](O)Cl", "OC(F)Cl"},
      {"F[C@H](CCO)Cl", "OCCC(F)Cl"},
      {"F[C@H](CCO)Cl.F[C@H](O)Cl", "OC(F)Cl.OCCC(F)Cl"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 8; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &[insmi, outsmi] : data) {
    mols.emplace_back(SmilesToMol(insmi));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    int numThreads = 1;
    MolStandardize::stereoParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::stereoParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("tautomerParent with multiple mols") {
  auto params = MolStandardize::defaultCleanupParameters;

  std::vector<std::pair<std::string, std::string>> data = {
      {"[O-]c1ccc(C(=O)O)cc1CC=CO", "O=CCCc1cc(C(=O)[O-])ccc1O"},
      {"[O-]c1ccc(C(=O)O)cc1CC=CO.[Na+]", "O=CCCc1cc(C(=O)[O-])ccc1O.[Na+]"},
      {"[O-]c1ccc(C(=O)O)cc1C[13CH]=CO", "O=C[13CH2]Cc1cc(C(=O)[O-])ccc1O"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 5; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &[insmi, outsmi] : data) {
    mols.emplace_back(SmilesToMol(insmi));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    int numThreads = 1;
    MolStandardize::tautomerParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::tautomerParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE("superParent with multiple mols") {
  auto params = MolStandardize::defaultCleanupParameters;

  std::vector<std::pair<std::string, std::string>> data = {
      {"[O-]c1ccc(C(=O)O)cc1CC=CO", "O=CCCc1cc(C(=O)O)ccc1O"},
      {"[O-]c1ccc(C(=O)O)cc1CC=CO.[Na+]", "O=CCCc1cc(C(=O)O)ccc1O"},
      {"[O-]c1ccc(C(=O)O)cc1C[13CH]=CO", "O=CCCc1cc(C(=O)O)ccc1O"},
  };
  // bulk that up a bit
  for (auto iter = 0u; iter < 5; ++iter) {
    auto sz = data.size();
    for (auto i = 0u; i < sz; ++i) {
      data.push_back(data[i]);
    }
  }
  std::vector<std::unique_ptr<RWMol>> mols;
  std::vector<RWMol *> molPtrs;
  for (const auto &[insmi, outsmi] : data) {
    mols.emplace_back(SmilesToMol(insmi));
    REQUIRE(mols.back());
    molPtrs.push_back(mols.back().get());
  }
  SECTION("basics") {
    int numThreads = 1;
    MolStandardize::superParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  SECTION("multithreaded") {
    int numThreads = 4;
    MolStandardize::superParentInPlace(molPtrs, numThreads, params);
    for (auto i = 0u; i < mols.size(); ++i) {
      REQUIRE(mols[i]);
      CHECK(MolToSmiles(*mols[i]) == data[i].second);
    }
  }
#endif
}

TEST_CASE(
    "github #7642: Multithreaded InPlace standardization functions seg fault if there's a duplicate molecule") {
  auto mol = "CC"_smiles;
  REQUIRE(mol);
  std::vector<RWMol *> mols{mol.get(), mol.get()};
  int numThreads = 1;
  CHECK_THROWS_AS(MolStandardize::cleanupInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::normalizeInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::reionizeInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::removeFragmentsInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::tautomerParentInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::fragmentParentInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::stereoParentInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::isotopeParentInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::chargeParentInPlace(mols, numThreads),
                  ValueErrorException);
  CHECK_THROWS_AS(MolStandardize::superParentInPlace(mols, numThreads),
                  ValueErrorException);
}

TEST_CASE("github #7689 RDKitValidation does not catch some valence issues") {
  SECTION("basics") {
    std::string mb = R"CTAB(foo
  MJ240300

  2  1  0  0  0  0  0  0  0  0999 V2000
   -4.8993    1.8410    0.0000 Br  0  5  0  0  0  0  0  0  0  0  0  0
   -5.6138    1.4285    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
M  END)CTAB";
    v2::FileParsers::MolFileParserParams ps;
    ps.sanitize = false;
    auto mol = v2::FileParsers::MolFromMolBlock(mb, ps);
    REQUIRE(mol);
    MolStandardize::RDKitValidation validator;
    auto res = validator.validate(*mol, true);
    REQUIRE(res.size() == 1);
    CHECK(res[0].find(
              "INFO: [ValenceValidation] Explicit valence for atom # 0 Br") ==
          0);
  }
}

TEST_CASE("Custom Scoring Functions") {
  SECTION("basics") {
    auto mol = "CC\\C=C(/O)[C@@H](C)C(C)=O"_smiles;
    REQUIRE(MolStandardize::TautomerScoringFunctions::scoreRings(*mol) == 0);
    REQUIRE(MolStandardize::TautomerScoringFunctions::scoreHeteroHs(*mol) == 0);
    REQUIRE(MolStandardize::TautomerScoringFunctions::scoreSubstructs(*mol) ==
            6);

    auto terms = MolStandardize::TautomerScoringFunctions::
        getDefaultTautomerScoreSubstructs();
    REQUIRE(terms.size() == 12);
  }

  SECTION("Override default tautomer scoring functions") {
    auto mol = "CC\\C=C(/O)[C@@H](C)C(C)=O"_smiles;
    std::vector<MolStandardize::TautomerScoringFunctions::SubstructTerm> terms =
        {{"C=O", "[#6]=,:[#8]", 1000}};
    REQUIRE(MolStandardize::TautomerScoringFunctions::scoreSubstructs(
                *mol, terms) == 1000);
  }
}

TEST_CASE("tautomer canonicalize preserves conformers") {
  // Regression test: quickCopy during tautomer enumeration drops
  // conformers. canonicalize() must restore them from the original
  // molecule so that downstream code (e.g. InChI generation) that
  // relies on 2D/3D coordinates works correctly.
  std::string molblock = R"CTAB(
  ChemDraw02102613032D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 17 18 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C -1.382449 2.541459 0.000000 0
M  V30 2 N -1.391616 1.716459 0.000000 0
M  V30 3 C -2.139845 1.368125 0.000000 0
M  V30 4 C -2.333490 0.560313 0.000000 0
M  V30 5 C -1.822449 -0.093386 0.000000 0
M  V30 6 C -0.992865 -0.099688 0.000000 0
M  V30 7 C -0.468073 0.540833 0.000000 0
M  V30 8 C -0.646824 1.348646 0.000000 0
M  V30 9 O 0.002291 1.857397 0.000000 0
M  V30 10 N 0.334583 0.349479 0.000000 0
M  V30 11 C 0.570052 -0.441719 0.000000 0
M  V30 12 O 0.003437 -1.040990 0.000000 0
M  V30 13 C 1.372708 -0.633073 0.000000 0
M  V30 14 C 1.608177 -1.423699 0.000000 0
M  V30 15 C 2.333490 -1.816147 0.000000 0
M  V30 16 C 1.941043 -2.541459 0.000000 0
M  V30 17 C 1.215730 -2.149012 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 8
M  V30 3 1 2 3
M  V30 4 1 3 4
M  V30 5 1 4 5
M  V30 6 1 5 6
M  V30 7 1 6 7
M  V30 8 1 7 8
M  V30 9 2 8 9
M  V30 10 1 7 10
M  V30 11 1 10 11
M  V30 12 2 11 12
M  V30 13 1 11 13
M  V30 14 2 13 14
M  V30 15 1 14 17
M  V30 16 1 14 15
M  V30 17 1 15 16
M  V30 18 1 16 17
M  V30 END BOND
M  V30 END CTAB
M  END
)CTAB";
  std::unique_ptr<RWMol> mol(MolBlockToMol(molblock));
  REQUIRE(mol);
  REQUIRE(mol->getNumConformers() == 1);

  MolStandardize::CleanupParameters params;
  params.tautomerRemoveBondStereo = false;
  params.tautomerRemoveSp3Stereo = false;
  MolStandardize::TautomerEnumerator te(params);
  std::unique_ptr<ROMol> canon{te.canonicalize(*mol)};
  REQUIRE(canon);

  // Conformer must be preserved (quickCopy regression)
  CHECK(canon->getNumConformers() == 1);
  if (canon->getNumConformers() > 0) {
    const auto &origConf = mol->getConformer(0);
    const auto &canonConf = canon->getConformer(0);
    CHECK(origConf.getNumAtoms() == canonConf.getNumAtoms());
    for (unsigned int i = 0; i < origConf.getNumAtoms(); ++i) {
      auto origPos = origConf.getAtomPos(i);
      auto canonPos = canonConf.getAtomPos(i);
      CHECK(origPos.x == Catch::Approx(canonPos.x).epsilon(1e-4));
      CHECK(origPos.y == Catch::Approx(canonPos.y).epsilon(1e-4));
    }
  }
}
|