rdkit/Code/GraphMol/Substruct/testSubstructMatch.cpp
Yakov Pechersky 0d886b9d08 Speed-up tautomer canonicalization, no API changes (#9134)
* Speed up tautomer canonicalization by deferring the SSSR calculation

* Lazy kekulization for tautomer enumeration

Defer kekulization of tautomers until they are actually needed for
transform matching. This avoids creating kekulized copies for:
1. The initial tautomer (until first iteration)
2. New tautomers that may never be processed (if enumeration ends early)

The Tautomer class now supports lazy initialization of the kekulized
form via getKekulized() method.

Performance improvement: ~7% additional speedup (total ~22-24% from baseline)
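
The lazy initialization described above can be sketched in isolation. This is a minimal illustration and not the code from the commit: `LazyTautomer` and its string-based kekulization callback are hypothetical stand-ins, and only the `getKekulized()` name is taken from the description. The idea is that the expensive derived form is computed on first request and cached afterwards.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a tautomer (not an RDKit type): the kekulized
// form is an expensive derived value that is computed only when first needed.
class LazyTautomer {
 public:
  using KekulizeFn = std::function<std::string(const std::string &)>;

  LazyTautomer(std::string smiles, KekulizeFn kekulize)
      : d_smiles(std::move(smiles)), d_kekulize(std::move(kekulize)) {
    assert(d_kekulize && "a kekulization callback is required");
  }

  // Returns the cached kekulized form, computing it on the first call only.
  const std::string &getKekulized() {
    if (!d_kekulized) {
      d_kekulized = d_kekulize(d_smiles);
    }
    return *d_kekulized;
  }

  // True once the kekulized form has been computed.
  bool hasKekulized() const { return d_kekulized.has_value(); }

 private:
  std::string d_smiles;
  KekulizeFn d_kekulize;
  std::optional<std::string> d_kekulized;
};
```

A tautomer that is created but never processed never pays for the callback, which is the saving the description refers to for enumerations that end early.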

* Use count-only substructure matching in tautomer scoring

* Add SubstructMatchCount regression test

* MolStandardize: reduce enumerate overhead

* MolStandardize: avoid per-tautomer ring recomputation

* Atom: cache PeriodicTable pointer in valence calcs

* Atom: reuse PeriodicTable in getEffectiveAtomicNum

* PeriodicTable: add atomic fast path for getTable

* GraphMol: reduce ROMol copy reallocations

* MolStandardize: use quickCopy for per-match product copies

Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.

* MolStandardize: pre-filter scoring patterns by element/connectivity

For tautomer scoring, pre-compute which SubstructTerms are relevant for
a given input molecule. Since tautomerization only moves H atoms and
changes bond orders (never creates/destroys heavy-atom bonds), patterns
requiring missing elements or connectivity can be skipped for all
tautomers of that molecule.

Two-stage filtering:
1. Element check: skip patterns requiring atoms not in the molecule
2. Connectivity check: skip patterns whose bond-order-agnostic structure
   doesn't match the input molecule's connectivity

This reduces the number of VF2 substructure calls per tautomer from 12
to typically 3-5, depending on the molecule's composition.
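
The first filtering stage (the element check) can be sketched as follows. This is an illustration under stated assumptions rather than the code from the commit: `PatternInfo` and `relevantPatterns` are hypothetical names, elements are represented by plain atomic numbers, and the connectivity stage is not shown.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a scoring pattern: its name and the atomic
// numbers that must be present in a molecule for the pattern to match at all.
struct PatternInfo {
  std::string name;
  std::set<int> requiredElements;
};

// Returns the indices of the patterns whose required elements all occur in
// the molecule. Because tautomerization does not add or remove heavy atoms,
// the result can be computed once and reused for every tautomer.
std::vector<std::size_t> relevantPatterns(
    const std::set<int> &molElements,
    const std::vector<PatternInfo> &patterns) {
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < patterns.size(); ++i) {
    bool allPresent = true;
    for (int z : patterns[i].requiredElements) {
      if (!molElements.count(z)) {
        allPresent = false;
        break;
      }
    }
    if (allPresent) {
      keep.push_back(i);
    }
  }
  return keep;
}
```

For a molecule containing only carbon and oxygen, a pattern that needs phosphorus or nitrogen is dropped before any graph matching is attempted.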

* MolStandardize: preserve molecule properties for canonical tautomer

Copy molecule properties from the original input to the canonical tautomer
result. Since quickCopy during enumeration skips d_props to avoid overhead,
extended SMILES data like link nodes (LN) was lost. This restores them
on the final result.

* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers

TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses
quickCopy for performance. This doesn't copy molecule properties like
_molLinkNodes. Without this fix, XQMol output would lose link node
extensions in the SMILES.

Copy properties from the original query molecule to all enumerated
tautomers before constructing the TautomerQuery. This preserves extended
SMILES data without impacting enumeration performance.

* MolStandardize: use parallel iteration and cache bond lookups

Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration
over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond
lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.

* perf: add specialized matchers for simple tautomer scoring patterns

Replace VF2 graph matching with O(n) loops for 6 simple patterns:
- countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
- countMethyls: [CX4H3] methyl groups
- countCarbonDoubleHetero: aliphatic C=hetero (carbon double-bonded to a heteroatom)
- countAromaticCarbonExocyclicN: aromatic C=exocyclic N
Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.
Combined with the pre-filtering optimization, this achieves ~3.7x speedup
(~2500ms vs ~9300ms original) for tautomer canonicalization.
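
A single-pass counter of the kind described above might look like the sketch below. It is a simplified illustration, not the implementation from the commit: `SimpleBond` is a hypothetical record rather than an RDKit type, and although the function name `countDoubleOrAromaticBonds` appears in the description, the signature used here is an assumption.

```cpp
#include <vector>

// Hypothetical minimal bond record (not an RDKit type).
struct SimpleBond {
  int beginAtomicNum;
  int endAtomicNum;
  bool isDoubleOrAromatic;
};

// Counts double or aromatic bonds joining elements z1 and z2 in one pass over
// the bond list, in either direction, with no graph matching involved.
unsigned int countDoubleOrAromaticBonds(const std::vector<SimpleBond> &bonds,
                                        int z1, int z2) {
  unsigned int count = 0;
  for (const auto &b : bonds) {
    if (!b.isDoubleOrAromatic) {
      continue;
    }
    const bool forward = b.beginAtomicNum == z1 && b.endAtomicNum == z2;
    const bool reverse = b.beginAtomicNum == z2 && b.endAtomicNum == z1;
    if (forward || reverse) {
      ++count;
    }
  }
  return count;
}
```

Calling it with atomic numbers (6, 8), (7, 8) and (15, 8) would cover the C=O, N=O and P=O cases with the same loop.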

* Fix tautomer canonicalize dropping conformers from quickCopy

quickCopy (RWMol(*mol, true)) skips conformers, so tautomer
enumeration products lose 2D/3D coordinates. This causes InChI
generation to omit the /b (double bond E/Z stereo) layer, since
E/Z is derived from atomic coordinates.

Fix: copy conformers from the original molecule onto the canonical
tautomer after pickCanonical in TautomerEnumerator::canonicalize().

Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based
conformer preservation check in catch_tests.cpp.

* add test on canonicalize losing stereo

* add regression test for exocyclic C=C tautomer canonicalization

The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely
deduplicate distinct tautomers when their atom-index-ordered state
patterns happen to match, leading canonicalize() to pick the wrong
canonical form for molecules with STEREOTRANS-pinned exocyclic C=C
bonds after RemoveHs.

Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the
exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form
O=C1C=C(C=C2CC=COC2)C(=O)N1.

Currently expected to FAIL until the state key dedup bug is fixed.

* MolStandardize: expand tautomer connectivity SMARTS

* MolStandardize: scope tautomer pattern enum

* MolStandardize: trim tautomer pattern enum

* MolStandardize: use symmetric ring scoring
2026-04-30 14:17:18 +02:00


//
// Copyright (C) 2001-2025 Greg Landrum and other RDKit contributors
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
//
#include <catch2/catch_all.hpp>
// RD bits
#include <GraphMol/RDKitBase.h>
#include <GraphMol/RDKitQueries.h>
#include <GraphMol/Chirality.h>
#include "SubstructMatch.h"
#include "SubstructUtils.h"
#include <GraphMol/SmilesParse/SmilesParse.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <GraphMol/SmilesParse/SmartsWrite.h>
#include <GraphMol/FileParsers/FileParsers.h>
#include <GraphMol/FileParsers/MolSupplier.h>
#include <GraphMol/test_fixtures.h>
#include "vf2.hpp"
using namespace RDKit;
TEST_CASE("test1", "[substruct]") {
bool updateLabel = true;
bool takeOwnership = true;
std::unique_ptr<RWMol> m = std::make_unique<RWMol>();
m->addAtom(new Atom(8), updateLabel, takeOwnership);
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
std::unique_ptr<RWMol> q1 = std::make_unique<RWMol>();
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::SINGLE);
std::vector<MatchVectType> matches;
auto n = SubstructMatch(*m, *q1, matches, false);
REQUIRE(n == 2);
REQUIRE(matches.size() == n);
REQUIRE(matches[0].size() == 2);
CHECK(matches[0][0].first == 0);
CHECK((matches[0][0].second == 1 || matches[0][0].second == 2));
CHECK(matches[0][1].first == 1);
CHECK(matches[0][1].second != matches[0][0].second);
CHECK((matches[0][1].second == 1 || matches[0][1].second == 2));
CHECK(matches[1][0].first == 0);
CHECK((matches[1][0].second == 1 || matches[1][0].second == 2));
CHECK(matches[1][0].second != matches[0][0].second);
CHECK(matches[1][0].second == matches[0][1].second);
CHECK(matches[1][1].first == 1);
CHECK(matches[1][1].second != matches[1][0].second);
CHECK(matches[1][1].second == matches[0][0].second);
n = SubstructMatch(*m, *q1, matches, true);
REQUIRE(n == 1);
REQUIRE(matches.size() == n);
REQUIRE(matches[0].size() == 2);
CHECK(matches[0][0].first == 0);
CHECK((matches[0][0].second == 1 || matches[0][0].second == 2));
CHECK(matches[0][1].first == 1);
CHECK(matches[0][1].second != matches[0][0].second);
CHECK((matches[0][1].second == 1 || matches[0][1].second == 2));
MatchVectType matchV;
REQUIRE(SubstructMatch(*m, *q1, matchV));
REQUIRE(matchV.size() == 2);
// make sure we reset the match vectors.
// build a query we won't match:
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addBond(1, 2, Bond::SINGLE);
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addBond(2, 3, Bond::SINGLE);
CHECK(!SubstructMatch(*m, *q1, matchV));
CHECK(matchV.size() == 0);
n = SubstructMatch(*m, *q1, matches, false);
CHECK(n == 0);
CHECK(matches.size() == 0);
}
TEST_CASE("test2", "[substruct]") {
std::unique_ptr<RWMol> m = std::make_unique<RWMol>();
bool updateLabel = true;
bool takeOwnership = true;
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(8), updateLabel, takeOwnership);
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
std::unique_ptr<RWMol> q1 = std::make_unique<RWMol>();
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(8), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::SINGLE);
MatchVectType matchV;
auto n = SubstructMatch(*m, *q1, matchV);
CHECK(n);
CHECK(matchV.size() == 2);
CHECK(matchV[0].first == 0);
CHECK(matchV[0].second == 1);
CHECK(matchV[1].first == 1);
CHECK(matchV[1].second == 2);
std::vector<MatchVectType> matches;
n = SubstructMatch(*m, *q1, matches, false);
REQUIRE(n == 1);
REQUIRE(matches.size() == n);
REQUIRE(matches[0].size() == 2);
n = SubstructMatch(*m, *q1, matches, true);
REQUIRE(n == 1);
REQUIRE(matches.size() == n);
REQUIRE(matches[0].size() == 2);
REQUIRE(SubstructMatch(*m, *q1, matchV));
REQUIRE(matchV.size() == 2);
m = std::make_unique<RWMol>();
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(8), updateLabel, takeOwnership);
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::DOUBLE);
matches.clear();
n = SubstructMatch(*m, *q1, matches, false);
REQUIRE(n == 0);
REQUIRE(matches.size() == n);
n = SubstructMatch(*m, *q1, matches, true);
REQUIRE(n == 0);
REQUIRE(matches.size() == n);
REQUIRE(!SubstructMatch(*m, *q1, matchV));
}
TEST_CASE("test3", "[substruct]") {
std::unique_ptr<RWMol> m = std::make_unique<RWMol>();
bool updateLabel = true;
bool takeOwnership = true;
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(8), updateLabel, takeOwnership);
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
std::unique_ptr<RWMol> q1 = std::make_unique<RWMol>();
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(8), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::UNSPECIFIED);
std::vector<MatchVectType> matches;
auto n = SubstructMatch(*m, *q1, matches, false);
REQUIRE(n == 1);
REQUIRE(matches.size() == n);
REQUIRE(matches[0].size() == 2);
n = SubstructMatch(*m, *q1, matches, true);
REQUIRE(n == 1);
REQUIRE(matches.size() == n);
REQUIRE(matches[0].size() == 2);
MatchVectType matchV;
REQUIRE(SubstructMatch(*m, *q1, matchV));
REQUIRE(matchV.size() == 2);
m = std::make_unique<RWMol>();
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(8), updateLabel, takeOwnership);
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::DOUBLE);
matches.clear();
n = SubstructMatch(*m, *q1, matches, false);
REQUIRE(n == 1);
REQUIRE(matches.size() == n);
REQUIRE(matches[0].size() == 2);
n = SubstructMatch(*m, *q1, matches, true);
REQUIRE(n == 1);
REQUIRE(matches.size() == n);
REQUIRE(matches[0].size() == 2);
REQUIRE(SubstructMatch(*m, *q1, matchV));
REQUIRE(matchV.size() == 2);
q1 = std::make_unique<RWMol>();
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::UNSPECIFIED);
n = SubstructMatch(*m, *q1, matches, false);
CHECK(n == 2);
CHECK(matches.size() == n);
CHECK(matches[0].size() == 2);
CHECK(matches[1].size() == 2);
CHECK(matches[0][0].second != matches[1][0].second);
CHECK(matches[0][1].second != matches[1][1].second);
n = SubstructMatch(*m, *q1, matches, true);
CHECK(n == 1);
CHECK(matches.size() == n);
}
TEST_CASE("test4", "[substruct]") {
bool updateLabel = true;
bool takeOwnership = true;
std::unique_ptr<Atom> a6 = std::make_unique<Atom>(6);
std::unique_ptr<Atom> a8 = std::make_unique<Atom>(8);
std::unique_ptr<RWMol> m = std::make_unique<RWMol>();
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addAtom(a8.get());
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addBond(1, 0, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
m->addBond(1, 3, Bond::SINGLE);
m->addBond(2, 4, Bond::SINGLE);
// this will be the recursive query
std::unique_ptr<RWMol> q1 = std::make_unique<RWMol>();
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(8), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::UNSPECIFIED);
// here's the main query
std::unique_ptr<RWMol> q2 = std::make_unique<RWMol>();
auto *rsq = new RecursiveStructureQuery(q1.release());
auto *qA = new QueryAtom(6);
qA->expandQuery(rsq, Queries::COMPOSITE_AND);
// std::cout << "post expand: " << qA->getQuery() << std::endl;
q2->addAtom(qA, true, true);
// std::cout << "mol: " << q2->getAtomWithIdx(0)->getQuery() << std::endl;
q2->addAtom(new QueryAtom(6), true, true);
q2->addBond(0, 1, Bond::UNSPECIFIED);
MatchVectType matchV;
bool found = SubstructMatch(*m, *q2, matchV);
REQUIRE(found);
REQUIRE(matchV.size() == 2);
CHECK(matchV[0].first == 0);
CHECK(matchV[0].second == 1);
CHECK(matchV[1].first == 1);
CHECK((matchV[1].second == 0 || matchV[1].second == 3));
std::vector<MatchVectType> matches;
auto n = SubstructMatch(*m, *q2, matches, true);
CHECK(n == 2);
CHECK(matches.size() == n);
CHECK(matches[0].size() == 2);
CHECK(matches[1].size() == 2);
CHECK(matches[0][0].second == matches[1][0].second);
CHECK(matches[0][1].second != matches[1][1].second);
}
TEST_CASE("test5", "[substruct]") {
auto a6 = std::make_unique<Atom>(6);
auto a8 = std::make_unique<Atom>(8);
// CC(OC)C
auto m = std::make_unique<RWMol>();
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addAtom(a8.get());
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
m->addBond(1, 4, Bond::SINGLE);
m->addBond(2, 3, Bond::SINGLE);
// this will be the recursive query
bool updateLabel = true;
bool takeOwnership = true;
auto q1 = std::make_unique<RWMol>();
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(8), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::UNSPECIFIED);
// here's the main query
auto q2 = std::make_unique<RWMol>();
auto qA = std::make_unique<QueryAtom>();
auto *rsq = new RecursiveStructureQuery(q1.release());
qA->setQuery(rsq);
q2->addAtom(qA.release(), true, true);
q2->addAtom(new QueryAtom(6), true, true);
q2->addBond(0, 1, Bond::UNSPECIFIED);
MatchVectType matchV;
bool found = SubstructMatch(*m, *q2, matchV);
REQUIRE(found);
REQUIRE(matchV.size() == 2);
std::vector<MatchVectType> matches;
auto n = SubstructMatch(*m, *q2, matches, true);
REQUIRE(n == 2);
REQUIRE(matches[0].size() == 2);
}
TEST_CASE("test5QueryRoot", "[substruct]") {
auto a6 = std::make_unique<Atom>(6);
auto a8 = std::make_unique<Atom>(8);
// CC(OC)C
auto m = std::make_unique<RWMol>();
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addAtom(a8.get());
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
m->addBond(1, 4, Bond::SINGLE);
m->addBond(2, 3, Bond::SINGLE);
// this will be the recursive query
bool updateLabel = true;
bool takeOwnership = true;
auto q1 = std::make_unique<RWMol>();
q1->addAtom(new QueryAtom(8), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::UNSPECIFIED);
q1->setProp(common_properties::_queryRootAtom, 1);
// here's the main query
auto q2 = std::make_unique<RWMol>();
auto qA = std::make_unique<QueryAtom>();
auto *rsq = new RecursiveStructureQuery(q1.release());
qA->setQuery(rsq);
q2->addAtom(qA.release(), true, true);
q2->addAtom(new QueryAtom(6), true, true);
q2->addBond(0, 1, Bond::UNSPECIFIED);
MatchVectType matchV;
bool found = SubstructMatch(*m, *q2, matchV);
REQUIRE(found);
REQUIRE(matchV.size() == 2);
std::vector<MatchVectType> matches;
auto n = SubstructMatch(*m, *q2, matches, true);
REQUIRE(n == 2);
REQUIRE(matches[0].size() == 2);
}
TEST_CASE("test6", "[substruct][Issue71]") {
auto a6 = std::make_unique<Atom>(6);
auto m = std::make_unique<RWMol>();
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
m->addBond(0, 2, Bond::SINGLE);
auto q1 = std::make_unique<RWMol>();
bool updateLabel = true;
bool takeOwnership = true;
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::UNSPECIFIED);
q1->addBond(1, 2, Bond::UNSPECIFIED);
MatchVectType matchV;
bool found = SubstructMatch(*m, *q1, matchV);
REQUIRE(found);
REQUIRE(matchV.size() == 3);
std::vector<MatchVectType> matches;
auto n = SubstructMatch(*m, *q1, matches, true);
REQUIRE(n == 1);
REQUIRE(matches[0].size() == 3);
// close the loop and try again (we should still match)
q1->addBond(0, 2, Bond::UNSPECIFIED);
found = SubstructMatch(*m, *q1, matchV);
REQUIRE(found);
REQUIRE(matchV.size() == 3);
n = SubstructMatch(*m, *q1, matches, true);
REQUIRE(n == 1);
REQUIRE(matches[0].size() == 3);
}
TEST_CASE("test7", "[substruct][leak]") {
auto a6 = std::make_unique<Atom>(6);
std::unique_ptr<RWMol> m = std::make_unique<RWMol>();
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addBond(0, 1, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
m->addBond(0, 2, Bond::SINGLE);
std::unique_ptr<RWMol> q1 = std::make_unique<RWMol>();
bool updateLabel = true;
bool takeOwnership = true;
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::UNSPECIFIED);
q1->addBond(1, 2, Bond::UNSPECIFIED);
MatchVectType matchV;
bool found = SubstructMatch(*m, *q1, matchV);
REQUIRE(found);
REQUIRE(matchV.size() == 3);
std::vector<MatchVectType> matches;
for (int i = 0; i < 30000; i++) {
auto n = SubstructMatch(*m, *q1, matches, true, true);
REQUIRE(n == 1);
REQUIRE(matches[0].size() == 3);
}
}
TEST_CASE("test9", "[substruct][chiral]") {
auto a6 = std::make_unique<Atom>(6);
auto m = std::make_unique<RWMol>();
m->addAtom(a6.get());
bool updateLabel = true;
bool takeOwnership = true;
m->addAtom(new Atom(6), updateLabel, takeOwnership);
m->addAtom(new Atom(7), updateLabel, takeOwnership);
m->addAtom(new Atom(8), updateLabel, takeOwnership);
m->addAtom(new Atom(9), updateLabel, takeOwnership);
m->addBond(0, 1, Bond::SINGLE);
m->addBond(0, 2, Bond::SINGLE);
m->addBond(0, 3, Bond::SINGLE);
m->addBond(0, 4, Bond::SINGLE);
m->getAtomWithIdx(0)->setChiralTag(Atom::CHI_TETRAHEDRAL_CW);
auto q1 = std::make_unique<RWMol>();
q1->addAtom(a6.get());
q1->addAtom(new Atom(6), updateLabel, takeOwnership);
q1->addAtom(new Atom(7), updateLabel, takeOwnership);
q1->addAtom(new Atom(8), updateLabel, takeOwnership);
q1->addAtom(new Atom(9), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::SINGLE);
q1->addBond(0, 2, Bond::SINGLE);
q1->addBond(0, 3, Bond::SINGLE);
q1->addBond(0, 4, Bond::SINGLE);
q1->getAtomWithIdx(0)->setChiralTag(Atom::CHI_TETRAHEDRAL_CCW);
MolOps::sanitizeMol(*m);
MolOps::assignStereochemistry(*m);
MolOps::sanitizeMol(*q1);
MolOps::assignStereochemistry(*q1);
// test with default options (no chirality):
MatchVectType matchV;
auto found = SubstructMatch(*m, *q1, matchV);
CHECK(found);
std::vector<MatchVectType> matches;
auto n = SubstructMatch(*m, *q1, matches, true);
CHECK(n == 1);
// test with chirality
found = SubstructMatch(*m, *q1, matchV, true, true);
CHECK(!found);
n = SubstructMatch(*m, *q1, matches, true, true, true);
CHECK(n == 0);
// self matches:
found = SubstructMatch(*m, *m, matchV, true, true);
CHECK(found);
n = SubstructMatch(*m, *m, matches, true, true, true);
CHECK(n == 1);
found = SubstructMatch(*q1, *q1, matchV, true, true);
CHECK(found);
n = SubstructMatch(*q1, *q1, matches, true, true, true);
CHECK(n == 1);
}
TEST_CASE("testRecursiveSerialNumbers", "[substruct][recursive]") {
auto a6 = std::make_unique<Atom>(6);
auto a8 = std::make_unique<Atom>(8);
auto m = std::make_unique<RWMol>();
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addAtom(a8.get());
m->addAtom(a6.get());
m->addAtom(a6.get());
m->addBond(1, 0, Bond::SINGLE);
m->addBond(1, 2, Bond::SINGLE);
m->addBond(1, 3, Bond::SINGLE);
m->addBond(2, 4, Bond::SINGLE);
{
// this will be the recursive query
auto q1 = std::make_unique<RWMol>();
bool updateLabel = true;
bool takeOwnership = true;
q1->addAtom(new QueryAtom(6), updateLabel, takeOwnership);
q1->addAtom(new QueryAtom(8), updateLabel, takeOwnership);
q1->addBond(0, 1, Bond::UNSPECIFIED);
// here's the main query
auto q2 = std::make_unique<RWMol>();
auto qA = std::make_unique<QueryAtom>(6);
auto rsq = std::make_unique<RecursiveStructureQuery>(new RWMol(*q1), 1);
qA->expandQuery(rsq.release(), Queries::COMPOSITE_AND);
// std::cout << "post expand: " << qA->getQuery() << std::endl;
q2->addAtom(qA.release(), true, true);
// std::cout << "mol: " << q2->getAtomWithIdx(0)->getQuery() << std::endl;
q2->addAtom(new QueryAtom(8), true, true);
q2->addBond(0, 1, Bond::UNSPECIFIED);
qA.reset(new QueryAtom(6));
rsq.reset(new RecursiveStructureQuery(new RWMol(*q1), 1));
qA->expandQuery(rsq.release(), Queries::COMPOSITE_AND);
q2->addAtom(qA.release(), true, true);
q2->addBond(1, 2, Bond::UNSPECIFIED);
MatchVectType matchV;
bool found = SubstructMatch(*m, *q2, matchV);
REQUIRE(found);
REQUIRE(matchV.size() == 3);
std::vector<MatchVectType> matches;
auto n = SubstructMatch(*m, *q2, matches, true);
CHECK(n == 1);
CHECK(matches.size() == 1);
CHECK(matches[0].size() == 3);
}
}
#ifdef RDK_TEST_MULTITHREADED
#include <RDGeneral/BoostStartInclude.h>
#include <thread>
#include <future>
#include <boost/dynamic_bitset.hpp>
#include <RDGeneral/BoostEndInclude.h>
namespace {
// Each worker handles the molecules whose index is congruent to idx modulo
// count and checks that the match result agrees with the precomputed hits.
void runblock(const std::vector<std::unique_ptr<ROMol>> *mols,
              const ROMol *query, const boost::dynamic_bitset<> &hits,
              unsigned int count, unsigned int idx) {
  for (unsigned int j = 0; j < 100; j++) {
    for (unsigned int i = 0; i < mols->size(); ++i) {
      if (i % count != idx) {
        continue;
      }
      const auto *mol = mols->at(i).get();
      MatchVectType matchV;
      bool found = SubstructMatch(*mol, *query, matchV);
      CHECK(found == hits[i]);
    }
  }
}
} // namespace
TEST_CASE("testMultiThread", "[substruct][multithread]") {
  // RDBASE must be set so that the test data can be located
  const char *rdbase = getenv("RDBASE");
  REQUIRE(rdbase != nullptr);
  std::string fName = rdbase;
  fName += "/Data/NCI/first_200.props.sdf";
  SDMolSupplier suppl(fName);
  std::vector<std::unique_ptr<ROMol>> mols;
  while (!suppl.atEnd() && mols.size() < 100) {
    ROMol *mol = nullptr;
    try {
      mol = suppl.next();
    } catch (...) {
      continue;
    }
    if (!mol) {
      continue;
    }
    mols.emplace_back(mol);
  }
  std::vector<std::future<void>> tg;
  auto query = v2::SmilesParse::MolFromSmarts("[#6;$([#6]([#6])[!#6])]");
  boost::dynamic_bitset<> hits(mols.size());
  // compute the reference results single-threaded
  for (unsigned int i = 0; i < mols.size(); ++i) {
    MatchVectType matchV;
    hits[i] = SubstructMatch(*mols[i], *query, matchV);
  }
  unsigned int count = 4;
  for (unsigned int i = 0; i < count; ++i) {
    tg.emplace_back(std::async(std::launch::async, runblock, &mols,
                               query.get(), hits, count, i));
  }
  for (auto &fut : tg) {
    fut.get();
  }
  tg.clear();
  query = v2::SmilesParse::MolFromSmarts("[#6]([#6])[!#6]");
  for (unsigned int i = 0; i < mols.size(); ++i) {
    MatchVectType matchV;
    hits[i] = SubstructMatch(*mols[i], *query, matchV);
  }
  for (unsigned int i = 0; i < count; ++i) {
    tg.emplace_back(std::async(std::launch::async, runblock, &mols,
                               query.get(), hits, count, i));
  }
  for (auto &fut : tg) {
    fut.get();
  }
  tg.clear();
  query = v2::SmilesParse::MolFromSmarts("[$([O,S]-[!$(*=O)])]");
  for (unsigned int i = 0; i < mols.size(); ++i) {
    MatchVectType matchV;
    hits[i] = SubstructMatch(*mols[i], *query, matchV);
  }
  for (unsigned int i = 0; i < count; ++i) {
    tg.emplace_back(std::async(std::launch::async, runblock, &mols,
                               query.get(), hits, count, i));
  }
  for (auto &fut : tg) {
    fut.get();
  }
}
#else
TEST_CASE("testMultiThread", "[substruct][multithread]") {}
#endif
TEST_CASE("testChiralMatch", "[substruct][chiral]") {
{
std::string qSmi = "Cl[C@](C)(F)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "Cl[C@](C)(F)Br";
std::string mSmi = "Cl[C@@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "Cl[C@](C)(F)Br";
std::string mSmi = "Cl[C@@](F)(C)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "Cl[C@](C)(F)Br";
std::string mSmi = "Cl[C@](F)(C)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "Cl[C@](C)(F)Br";
std::string mSmi = "Cl[C@@](Br)(C)F";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "Cl[C@](C)(F)Br";
std::string mSmi = "Cl[C@](Br)(C)F";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "C[C@](O)(F)Br";
std::string mSmi = "O[C@](F)(Br)CC[C@](O)(F)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "C[C@](O)(F)Br";
std::string mSmi = "O[C@](F)(Br)CC[C@@](O)(F)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "C[C@](O)(F)Br";
std::string mSmi = "O[C@](F)(Br)CC(C[C@](O)(F)Br)C[C@](O)(F)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
std::vector<MatchVectType> matches;
int count = SubstructMatch(*mol, *query, matches, true, true, true);
CHECK(count == 2);
}
{
std::string qSmi = "C[C@](O)(F)Br";
std::string mSmi = "O[C@@](F)(Br)CC(C[C@](O)(F)Br)C[C@](O)(F)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
std::vector<MatchVectType> matches;
int count = SubstructMatch(*mol, *query, matches, true, true, true);
CHECK(count == 3);
}
{
std::string qSmi = "C[C@](O)(F)Br";
std::string mSmi = "O[C@](F)(Br)CC(C[C@@](O)(F)Br)C[C@](O)(F)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
std::vector<MatchVectType> matches;
int count = SubstructMatch(*mol, *query, matches, true, true, true);
CHECK(count == 1);
}
{
std::string qSmi = "C[C@](O)(F)Br";
std::string mSmi = "O[C@](F)(Br)CC(C[C@@](O)(F)Br)C[C@@](O)(F)Br";
auto query = v2::SmilesParse::MolFromSmiles(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
std::vector<MatchVectType> matches;
// std::cerr<<"\n\n------------------------------------------\n"<<qSmi<<"
// "<<mSmi<<"\n"<<std::endl;
int count = SubstructMatch(*mol, *query, matches, true, true, true);
// std::cerr<<"res: "<<count<<std::endl;
CHECK(count == 0);
}
{
std::string qSmi = "Cl[C@](*)(F)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "Cl[C@@](*)(F)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "Cl[C@](*)(*)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "Cl[C@@](*)(*)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "[C@](C)(F)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "[C@@](C)(F)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "C[C@](F)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "C[C@@](F)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "Cl[C@](F)Br";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "Cl[C@](C)F";
std::string mSmi = "Cl[C@](C)(F)Br";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
}
TEST_CASE("testCisTransMatch", "[substruct][stereochemistry]") {
{
std::string qSmi = "CC=CC";
std::string mSmi = "CC=CC";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "CC=CC";
std::string mSmi = "C/C=C/C";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "C/C=C/C";
std::string mSmi = "CC=CC";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "C/C=C/C";
std::string mSmi = "C/C=C\\C";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "C/C=C/C";
std::string mSmi = "C/C=C/C";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "C/C=C/C";
std::string mSmi = "C/C=C(/F)C";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
{
std::string qSmi = "C/C=C/C";
std::string mSmi = "C/C=C(\\F)C";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "C/C=C/C";
std::string mSmi = "C/C(F)=C(\\F)C";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "C/C=C/C";
std::string mSmi = "CC(/F)=C(\\F)C";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(matched);
}
{
std::string qSmi = "C/C=C/C";
std::string mSmi = "CC(\\F)=C(\\F)C";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true);
CHECK(!matched);
}
}
TEST_CASE("testCisTransMatch2", "[substruct][stereochemistry]") {
{
std::string qSmi = "CC=CC";
std::string mSmi = "CCC=CC";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
CHECK(query->getBondWithIdx(1)->getBondType() == Bond::DOUBLE);
query->getBondWithIdx(1)->getStereoAtoms().push_back(0);
query->getBondWithIdx(1)->getStereoAtoms().push_back(3);
query->getBondWithIdx(1)->setStereo(Bond::STEREOCIS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
CHECK(mol->getBondWithIdx(2)->getBondType() == Bond::DOUBLE);
mol->getBondWithIdx(2)->getStereoAtoms().push_back(1);
mol->getBondWithIdx(2)->getStereoAtoms().push_back(4);
mol->getBondWithIdx(2)->setStereo(Bond::STEREOCIS);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
mol->getBondWithIdx(2)->setStereo(Bond::STEREOTRANS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
query->getBondWithIdx(1)->setStereo(Bond::STEREONONE);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
query->getBondWithIdx(1)->setStereo(Bond::STEREOANY);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
}
{
std::string qSmi = "CC=CC";
std::string mSmi = "CC=C(C)F";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
CHECK(query->getBondWithIdx(1)->getBondType() == Bond::DOUBLE);
query->getBondWithIdx(1)->getStereoAtoms().push_back(0);
query->getBondWithIdx(1)->getStereoAtoms().push_back(3);
query->getBondWithIdx(1)->setStereo(Bond::STEREOCIS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
CHECK(mol->getBondWithIdx(1)->getBondType() == Bond::DOUBLE);
mol->getBondWithIdx(1)->getStereoAtoms().push_back(0);
mol->getBondWithIdx(1)->getStereoAtoms().push_back(3);
mol->getBondWithIdx(1)->setStereo(Bond::STEREOCIS);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
mol->getBondWithIdx(1)->setStereo(Bond::STEREOTRANS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
query->getBondWithIdx(1)->setStereo(Bond::STEREONONE);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
query->getBondWithIdx(1)->setStereo(Bond::STEREOANY);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
}
{
std::string qSmi = "CC=CC";
std::string mSmi = "CCC=C(C)F";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
CHECK(query->getBondWithIdx(1)->getBondType() == Bond::DOUBLE);
query->getBondWithIdx(1)->getStereoAtoms().push_back(0);
query->getBondWithIdx(1)->getStereoAtoms().push_back(3);
query->getBondWithIdx(1)->setStereo(Bond::STEREOCIS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
CHECK(mol->getBondWithIdx(2)->getBondType() == Bond::DOUBLE);
mol->getBondWithIdx(2)->getStereoAtoms().push_back(1);
mol->getBondWithIdx(2)->getStereoAtoms().push_back(4);
mol->getBondWithIdx(2)->setStereo(Bond::STEREOCIS);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
mol->getBondWithIdx(2)->setStereo(Bond::STEREOTRANS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
query->getBondWithIdx(1)->setStereo(Bond::STEREONONE);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
query->getBondWithIdx(1)->setStereo(Bond::STEREOANY);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
}
{ // now make it harder: the stereoatoms don't match, but the stereochemistry
// does
std::string qSmi = "CC=CC";
std::string mSmi = "CCC=C(C)F";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
CHECK(query->getBondWithIdx(1)->getBondType() == Bond::DOUBLE);
query->getBondWithIdx(1)->getStereoAtoms().push_back(0);
query->getBondWithIdx(1)->getStereoAtoms().push_back(3);
query->getBondWithIdx(1)->setStereo(Bond::STEREOCIS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
CHECK(mol->getBondWithIdx(2)->getBondType() == Bond::DOUBLE);
mol->getBondWithIdx(2)->getStereoAtoms().push_back(1);
mol->getBondWithIdx(2)->getStereoAtoms().push_back(5);
mol->getBondWithIdx(2)->setStereo(Bond::STEREOTRANS);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
mol->getBondWithIdx(2)->setStereo(Bond::STEREOCIS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
query->getBondWithIdx(1)->setStereo(Bond::STEREONONE);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
query->getBondWithIdx(1)->setStereo(Bond::STEREOANY);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
}
{ // now make it harder: the stereoatoms don't match on either end, but the
// stereochemistry does
std::string qSmi = "CC=CC";
std::string mSmi = "CCC(F)=C(C)F";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
CHECK(query->getBondWithIdx(1)->getBondType() == Bond::DOUBLE);
query->getBondWithIdx(1)->getStereoAtoms().push_back(0);
query->getBondWithIdx(1)->getStereoAtoms().push_back(3);
query->getBondWithIdx(1)->setStereo(Bond::STEREOCIS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
CHECK(mol->getBondWithIdx(3)->getBondType() == Bond::DOUBLE);
mol->getBondWithIdx(3)->getStereoAtoms().push_back(3);
mol->getBondWithIdx(3)->getStereoAtoms().push_back(6);
mol->getBondWithIdx(3)->setStereo(Bond::STEREOCIS);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
mol->getBondWithIdx(3)->setStereo(Bond::STEREOTRANS);
CHECK(!SubstructMatch(*mol, *query, matchV, true, true));
CHECK(SubstructMatch(*mol, *query, matchV, true, false));
query->getBondWithIdx(1)->setStereo(Bond::STEREONONE);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
query->getBondWithIdx(1)->setStereo(Bond::STEREOANY);
CHECK(SubstructMatch(*mol, *query, matchV, true, true));
}
}
TEST_CASE("testGitHubIssue15", "[substruct][github][issue15]") {
{
std::string qSmi = "[R2]~[R1]~[R2]";
std::string mSmi = "CCC";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true, true);
CHECK(!matched);
}
{
std::string qSmi = "[R2]~[R1]~[R2]";
std::string mSmi = "CCC";
auto query = v2::SmilesParse::MolFromSmarts(qSmi);
auto mol = v2::SmilesParse::MolFromSmiles(mSmi);
MatchVectType matchV;
bool matched = SubstructMatch(*mol, *query, matchV, true, true, true);
CHECK(!matched);
}
}
TEST_CASE("testGitHubIssue409", "[substruct][github][issue409]") {
{
std::string smi = "FC(F)(F)CC(F)(F)F";
auto mol = v2::SmilesParse::MolFromSmiles(smi);
std::vector<MatchVectType> matches;
unsigned int matched =
SubstructMatch(*mol, *mol, matches, false, true, false, false);
CHECK(matched == matches.size());
CHECK(matches.size() == 72);
matched =
SubstructMatch(*mol, *mol, matches, false, true, false, false, 16);
CHECK(matches.size() == 16);
}
}
TEST_CASE("testGitHubIssue688", "[substruct][github][issue688]") {
{
std::string smi = "C1CC[C@](Cl)(N)O1";
auto mol = v2::SmilesParse::MolFromSmiles(smi);
CHECK(mol);
// mol->debugMol(std::cerr);
std::string sma = "C1CC[C@](N)O1";
auto qmol = v2::SmilesParse::MolFromSmarts(sma);
CHECK(qmol);
// qmol->updatePropertyCache();
// qmol->debugMol(std::cerr);
MatchVectType match;
CHECK(SubstructMatch(*mol, *qmol, match, true, false));
CHECK(match.size() == qmol->getNumAtoms());
CHECK(SubstructMatch(*mol, *qmol, match, true, true));
CHECK(match.size() == qmol->getNumAtoms());
}
{
std::string smi = "C1CC[C@](Cl)(N)O1";
auto mol = v2::SmilesParse::MolFromSmiles(smi);
CHECK(mol);
// mol->debugMol(std::cerr);
std::string sma = "C1CC[C@@](N)O1";
auto qmol = v2::SmilesParse::MolFromSmarts(sma);
CHECK(qmol);
// qmol->updatePropertyCache();
// qmol->debugMol(std::cerr);
MatchVectType match;
CHECK(SubstructMatch(*mol, *qmol, match, true, false));
CHECK(match.size() == qmol->getNumAtoms());
CHECK(!SubstructMatch(*mol, *qmol, match, true, true));
}
{
std::string smi = "N[C@]1(Cl)CCCO1";
auto mol = v2::SmilesParse::MolFromSmiles(smi);
CHECK(mol);
// mol->debugMol(std::cerr);
std::string sma = "N[C@]1CCCO1";
auto qmol = v2::SmilesParse::MolFromSmarts(sma);
CHECK(qmol);
// qmol->updatePropertyCache();
// qmol->debugMol(std::cerr);
// std::cerr << MolToSmiles(*qmol, true) << std::endl;
MatchVectType match;
CHECK(SubstructMatch(*mol, *qmol, match, true, false));
CHECK(match.size() == qmol->getNumAtoms());
CHECK(SubstructMatch(*mol, *qmol, match, true, true));
CHECK(match.size() == qmol->getNumAtoms());
}
{
std::string smi = "N[C@]1(Cl)CCCO1";
auto mol = v2::SmilesParse::MolFromSmiles(smi);
CHECK(mol);
// mol->debugMol(std::cerr);
std::string sma = "N[C@@]1CCCO1";
auto qmol = v2::SmilesParse::MolFromSmarts(sma);
CHECK(qmol);
// qmol->updatePropertyCache();
// qmol->debugMol(std::cerr);
// std::cerr << MolToSmiles(*qmol, true) << std::endl;
MatchVectType match;
CHECK(SubstructMatch(*mol, *qmol, match, true, false));
CHECK(match.size() == qmol->getNumAtoms());
CHECK(!SubstructMatch(*mol, *qmol, match, true, true));
}
}
TEST_CASE("testDativeMatch", "[substruct][dative]") {
{
std::string smi = "[Cu]->[Fe]";
auto mol = v2::SmilesParse::MolFromSmiles(smi);
CHECK(mol);
// make sure a self-match works
MatchVectType match;
CHECK(SubstructMatch(*mol, *mol, match));
CHECK(match.size() == mol->getNumAtoms());
{ // reverse the order and make sure that works
std::string sma = "[Fe]<-[Cu]";
auto qmol = v2::SmilesParse::MolFromSmarts(sma);
CHECK(qmol);
MatchVectType match;
CHECK(SubstructMatch(*mol, *qmol, match));
CHECK(match.size() == qmol->getNumAtoms());
}
{ // reverse the direction and make sure that does not work.
std::string sma = "[Fe]->[Cu]";
auto qmol = v2::SmilesParse::MolFromSmarts(sma);
CHECK(qmol);
MatchVectType match;
CHECK(!SubstructMatch(*mol, *qmol, match));
}
}
}
TEST_CASE("testGithubIssue1489", "[substruct][github][issue1489]") {
{
std::string smi1 = "CCC[C@@H]1CN(CCC)CCN1";
std::string smi2 = "CCC[C@H]1CN(CCC)CCN1";
auto mol1 = v2::SmilesParse::MolFromSmiles(smi1);
CHECK(mol1);
auto mol2 = v2::SmilesParse::MolFromSmiles(smi2);
CHECK(mol2);
bool recursionPossible = true;
bool useChirality = true;
MatchVectType match;
// make sure self-matches work
CHECK(SubstructMatch(*mol1, *mol1, match, recursionPossible, useChirality));
CHECK(SubstructMatch(*mol2, *mol2, match, recursionPossible, useChirality));
// check matches using the molecules from smiles:
CHECK(
!SubstructMatch(*mol1, *mol2, match, recursionPossible, useChirality));
CHECK(SubstructMatch(*mol1, *mol2, match, recursionPossible, false));
{
auto qmol1 = v2::SmilesParse::MolFromSmarts(smi1);
REQUIRE(qmol1);  // verify the parse succeeded before dereferencing
qmol1->updatePropertyCache();
CHECK(SubstructMatch(*mol1, *qmol1, match, recursionPossible, false));
CHECK(SubstructMatch(*mol1, *qmol1, match, recursionPossible,
useChirality));
CHECK(SubstructMatch(*mol2, *qmol1, match, recursionPossible, false));
CHECK(!SubstructMatch(*mol2, *qmol1, match, recursionPossible,
useChirality));
}
}
{
std::string smi1 = "F([C@@H](Cl)Br)";
auto mol1 = v2::SmilesParse::MolFromSmiles(smi1);
CHECK(mol1);
bool recursionPossible = true;
bool useChirality = true;
MatchVectType match;
// make sure self-matches work
CHECK(SubstructMatch(*mol1, *mol1, match, recursionPossible, useChirality));
{
auto qmol1 = v2::SmilesParse::MolFromSmarts(smi1);
REQUIRE(qmol1);  // verify the parse succeeded before dereferencing
qmol1->updatePropertyCache();
CHECK(SubstructMatch(*mol1, *qmol1, match, recursionPossible, false));
CHECK(SubstructMatch(*mol1, *qmol1, match, recursionPossible,
useChirality));
}
{
std::string smi2 = "F([C@H](Br)Cl)";
auto qmol1 = v2::SmilesParse::MolFromSmarts(smi2);
REQUIRE(qmol1);  // verify the parse succeeded before dereferencing
qmol1->updatePropertyCache();
CHECK(SubstructMatch(*mol1, *qmol1, match, recursionPossible, false));
CHECK(SubstructMatch(*mol1, *qmol1, match, recursionPossible,
useChirality));
}
}
}
TEST_CASE("testGithub2570", "[substruct][github][issue2570]") {
bool uniquify = true;
bool recursionPossible = true;
bool useChirality = true;
{
const auto mol = R"(C[C@](Cl)(Br)F)"_smiles;
{
const auto query = R"([C@](Cl)(Br)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(!SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto query = R"([C@@](Cl)(Br)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{ // Swap order of a pair of atoms
const auto query = R"([C@](Br)(Cl)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto query = R"([C@@](Br)(Cl)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(!SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{ // Smaller fragments should always match as long as they have a chiral
// tag, as these don't have enough neighbors to define CW/CCW chirality
const std::vector<std::string> smarts({"[C@](Cl)Br", "[C@@](Cl)Br",
"[C@](Br)F", "[C@@](Br)F", "[C@]F",
"[C@@]F", "[C@]", "[C@@]"});
std::vector<MatchVectType> matches;
for (const auto &sma : smarts) {
std::unique_ptr<ROMol> query(SmartsToMol(sma));
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
}
}
{ // Mol also starting with the chiral atom
const auto mol = R"([C@](C)(Cl)(Br)F)"_smiles;
{
const auto query = R"([C@](C)(Cl)(Br)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto query = R"([C@@](C)(Cl)(Br)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(!SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto query = R"([C@](C)(Cl)Br)"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto query = R"([C@@](C)(Cl)Br)"_smarts;
std::vector<MatchVectType> matches;
CHECK(!SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto query = R"([C@](Cl)(Br)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(!SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto query = R"([C@@](Cl)(Br)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{ // Swap order of a pair of atoms
const auto query = R"([C@](Br)(Cl)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto query = R"([C@@](Br)(Cl)F)"_smarts;
std::vector<MatchVectType> matches;
CHECK(!SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
}
{ // Start from a physical H atom
const auto mol = R"([H][C@](O)(F)Cl)"_smiles;
const auto smarts = MolToSmarts(*mol);
std::unique_ptr<ROMol> query(SmartsToMol(smarts));
CHECK(smarts == R"([#8]-[#6@@H](-[#9])-[#17])");
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto mol = R"([H][C@](O)(F)Cl)"_smiles;
const auto query = R"([C@H](O)(F)Cl)"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto mol = R"([H][C@](O)(F)Cl)"_smiles;
const auto query = R"([C@@H](O)(F)Cl)"_smarts;
std::vector<MatchVectType> matches;
CHECK(!SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{ // Start from an attached H atom
const auto mol = R"([C@H](O)(F)Cl)"_smiles;
const auto smarts = MolToSmarts(*mol);
CHECK(smarts == R"([#8]-[#6@@H](-[#9])-[#17])");
const auto query = v2::SmilesParse::MolFromSmarts(smarts);
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto mol = R"([C@H](O)(F)Cl)"_smiles;
const auto query = R"([C@H](O)(F)Cl)"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto mol = R"([C@H](O)(F)Cl)"_smiles;
const auto query = R"([C@@H](O)(F)Cl)"_smarts;
std::vector<MatchVectType> matches;
CHECK(!SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{ // Without H
const auto mol = R"([C@](O)(F)(Cl)C)"_smiles;
const auto smarts = MolToSmarts(*mol);
const auto query = v2::SmilesParse::MolFromSmarts(smarts);
CHECK(smarts == R"([#8]-[#6@](-[#9])(-[#17])-[#6])");
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{
const auto mol = R"([C@](O)(F)(Cl)C)"_smiles;
const auto query = R"([#6@](-[#8])(-[#9])(-[#17])-[#6])"_smarts;
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol, *query, matches, uniquify, recursionPossible,
useChirality));
}
{ // What about queries not coming from SMARTS?
const std::vector<std::string> smiles( // These are all equivalent
{"N[C@@]([H])(C)C(=O)O", "N[C@@H](C)C(=O)O", "N[C@H](C(=O)O)C",
"[H][C@](N)(C)C(=O)O", "[C@H](N)(C)C(=O)O"});
for (const auto &smi1 : smiles) {
const auto mol1 = std::unique_ptr<ROMol>(SmilesToMol(smi1));
for (const auto &smi2 : smiles) { // Test them in both directions
const auto mol2 = std::unique_ptr<ROMol>(SmilesToMol(smi2));
std::vector<MatchVectType> matches;
CHECK(SubstructMatch(*mol1, *mol2, matches, uniquify, recursionPossible,
useChirality));
}
}
}
}
TEST_CASE("testEZVsCisTransMatch", "[substruct][stereochemistry]") {
UseLegacyStereoPerceptionFixture fx(true);
const auto mol = R"(F/C(C)=C(C)/Cl)"_smiles;
{
const Bond *stereoBnd = mol->getBondWithIdx(2);
CHECK(stereoBnd->getStereo() == Bond::STEREOE);
}
// pairs of {query, matching expectation}
const std::vector<std::pair<std::string, bool>> checks({
{R"(F/C(C)=C(C)/Cl)", true}, // identical
{R"(F\C(C)=C(C)\Cl)", true}, // symmetric
{R"(F/C(C)=C(C)\Cl)", false}, // opposite
{R"(F\C(C)=C(C)/Cl)", false} // symmetric opposite
});
// Test with same stereoatoms as mol
for (const auto &check : checks) {
auto query = v2::SmilesParse::MolFromSmiles(check.first);
{
Bond *stereoBnd = query->getBondWithIdx(2);
auto stereo = stereoBnd->getStereo();
CHECK((stereo == Bond::STEREOE || stereo == Bond::STEREOZ));
stereoBnd->setStereoAtoms(0, 5); // Same as mol
stereo = Chirality::translateEZLabelToCisTrans(stereo);
CHECK((stereo == Bond::STEREOCIS || stereo == Bond::STEREOTRANS));
stereoBnd->setStereo(stereo);
}
MatchVectType match;
bool recursionPossible = true;
bool useChirality = true;
CHECK(check.second ==
SubstructMatch(*mol, *query, match, recursionPossible, useChirality));
}
// Symmetrize stereoatoms
for (const auto &check : checks) {
auto query = v2::SmilesParse::MolFromSmiles(check.first);
{
Bond *stereoBnd = query->getBondWithIdx(2);
auto stereo = stereoBnd->getStereo();
CHECK((stereo == Bond::STEREOE || stereo == Bond::STEREOZ));
stereoBnd->setStereoAtoms(2, 4); // symmetric to mol
stereo = Chirality::translateEZLabelToCisTrans(stereo);
CHECK((stereo == Bond::STEREOCIS || stereo == Bond::STEREOTRANS));
stereoBnd->setStereo(stereo);
}
MatchVectType match;
bool recursionPossible = true;
bool useChirality = true;
CHECK(check.second ==
SubstructMatch(*mol, *query, match, recursionPossible, useChirality));
}
// Flip one stereoatom and the label
for (const auto &check : checks) {
auto query = v2::SmilesParse::MolFromSmiles(check.first);
{
Bond *stereoBnd = query->getBondWithIdx(2);
auto stereo = stereoBnd->getStereo();
CHECK((stereo == Bond::STEREOE || stereo == Bond::STEREOZ));
stereoBnd->setStereoAtoms(0, 4); // Reverse second stereoatom
if (stereo == Bond::STEREOE) {
stereo = Bond::STEREOCIS;
} else {
stereo = Bond::STEREOTRANS;
}
stereoBnd->setStereo(stereo);
}
MatchVectType match;
bool recursionPossible = true;
bool useChirality = true;
CHECK(check.second ==
SubstructMatch(*mol, *query, match, recursionPossible, useChirality));
}
}
TEST_CASE("testMostSubstitutedCoreMatch", "[substruct][core]") {
auto core = "[*:1]c1cc([*:2])ccc1[*:3]"_smarts;
auto orthoMeta = "c1ccc(-c2ccc(-c3ccccc3)c(-c3ccccc3)c2)cc1"_smiles;
auto ortho = "c1ccc(-c2ccccc2-c2ccccc2)cc1"_smiles;
auto meta = "c1ccc(-c2cccc(-c3ccccc3)c2)cc1"_smiles;
auto biphenyl = "c1ccccc1-c1ccccc1"_smiles;
auto phenyl = "c1ccccc1"_smiles;
struct numHsMatchingDummies {
static unsigned int get(const ROMol &mol, const ROMol &core,
const MatchVectType &match) {
return std::count_if(
match.begin(), match.end(),
[&mol, &core](const std::pair<int, int> &pair) {
return (core.getAtomWithIdx(pair.first)->getAtomicNum() == 0 &&
mol.getAtomWithIdx(pair.second)->getAtomicNum() == 1);
});
}
};
const auto &coreRef = *core;
for (auto &molResPair : std::vector<std::pair<RDKit::RWMol *, unsigned>>{
std::make_pair(orthoMeta.get(), 0u), std::make_pair(ortho.get(), 1u),
std::make_pair(meta.get(), 1u), std::make_pair(biphenyl.get(), 2u),
std::make_pair(phenyl.get(), 3u)}) {
auto &mol = *molResPair.first;
const auto res = molResPair.second;
MolOps::addHs(mol);
auto matches = SubstructMatch(mol, coreRef);
auto bestMatch = getMostSubstitutedCoreMatch(mol, coreRef, matches);
CHECK(numHsMatchingDummies::get(mol, coreRef, bestMatch) == res);
std::vector<unsigned int> ctrlCounts(matches.size());
std::transform(matches.begin(), matches.end(), ctrlCounts.begin(),
[&mol, &coreRef](const MatchVectType &match) {
return numHsMatchingDummies::get(mol, coreRef, match);
});
std::sort(ctrlCounts.begin(), ctrlCounts.end());
std::vector<unsigned int> sortedCounts(matches.size());
auto sortedMatches =
sortMatchesByDegreeOfCoreSubstitution(mol, coreRef, matches);
std::transform(sortedMatches.begin(), sortedMatches.end(),
sortedCounts.begin(),
[&mol, &coreRef](const MatchVectType &match) {
return numHsMatchingDummies::get(mol, coreRef, match);
});
CHECK(ctrlCounts == sortedCounts);
}
std::vector<MatchVectType> emptyMatches;
bool raised = false;
try {
getMostSubstitutedCoreMatch(*orthoMeta, coreRef, emptyMatches);
} catch (const Invar::Invariant &) {
raised = true;
}
CHECK(raised);
raised = false;
try {
sortMatchesByDegreeOfCoreSubstitution(*orthoMeta, coreRef, emptyMatches);
} catch (const Invar::Invariant &) {
raised = true;
}
CHECK(raised);
}
TEST_CASE("testLongRing", "[substruct][ring]") {
std::string mol_smiles = "c12ccc(CCCCCCCc5ccc(C2)cc5)cc1";
std::string query_smiles = "c1cc2ccc1CCCCCCCc1ccc(cc1)C2";
auto mol = v2::SmilesParse::MolFromSmiles(mol_smiles);
auto query = v2::SmilesParse::MolFromSmiles(query_smiles);
CHECK(MolToSmiles(*query) == MolToSmiles(*mol));
MatchVectType match1;
MatchVectType match2;
CHECK(SubstructMatch(*mol, *query, match1));
CHECK(SubstructMatch(*query, *mol, match2));
}
TEST_CASE("testIsAtomTerminalRGroupOrQueryHydrogen", "[substruct][rgroup]") {
{
auto mol = R"CTAB(
MJ201100
7 7 0 0 0 0 0 0 0 0999 V2000
-0.3795 1.5839 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.0939 1.1714 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.0939 0.3463 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3795 -0.0661 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.3349 0.3463 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.3349 1.1714 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.0494 1.5839 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0
1 2 4 0 0 0 0
2 3 4 0 0 0 0
3 4 4 0 0 0 0
4 5 4 0 0 0 0
5 6 4 0 0 0 0
6 1 4 0 0 0 0
6 7 1 0 0 0 0
M RGP 1 7 1
M END
)CTAB"_ctab;
const auto rAtom = mol->getAtomWithIdx(mol->getNumAtoms() - 1);
CHECK(isAtomTerminalRGroupOrQueryHydrogen(rAtom));
}
{
auto mol = R"CTAB(
MJ201100
6 6 0 0 0 0 0 0 0 0999 V2000
-0.7589 1.4277 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.4733 1.0152 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.4733 0.1901 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7589 -0.2223 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.0444 0.1901 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.0444 1.0152 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
3 4 2 0 0 0 0
4 5 1 0 0 0 0
5 6 2 0 0 0 0
6 1 1 0 0 0 0
M RGP 1 6 1
M END
)CTAB"_ctab;
const auto rAtom = mol->getAtomWithIdx(mol->getNumAtoms() - 1);
CHECK(!isAtomTerminalRGroupOrQueryHydrogen(rAtom));
}
{
auto mol = R"CTAB(
MJ201100
7 7 0 0 0 0 0 0 0 0999 V2000
-0.9152 0.2893 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.6296 -0.1231 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.6296 -0.9482 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.9152 -1.3607 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2007 -0.9482 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2007 -0.1231 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5137 0.2893 0.0000 L 0 0 0 0 0 0 0 0 0 0 0 0
1 2 4 0 0 0 0
2 3 4 0 0 0 0
3 4 4 0 0 0 0
4 5 4 0 0 0 0
5 6 4 0 0 0 0
6 1 4 0 0 0 0
6 7 1 0 0 0 0
M ALS 7 10 F H C N O F P S Cl Br I
M END
)CTAB"_ctab;
const auto rAtom = mol->getAtomWithIdx(mol->getNumAtoms() - 1);
CHECK(isAtomTerminalRGroupOrQueryHydrogen(rAtom));
}
{
auto mol = R"CTAB(
MJ201100
7 7 0 0 0 0 0 0 0 0999 V2000
-0.9152 0.2893 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.6296 -0.1231 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.6296 -0.9482 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.9152 -1.3607 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2007 -0.9482 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2007 -0.1231 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5137 0.2893 0.0000 L 0 0 0 0 0 0 0 0 0 0 0 0
1 2 4 0 0 0 0
2 3 4 0 0 0 0
3 4 4 0 0 0 0
4 5 4 0 0 0 0
5 6 4 0 0 0 0
6 1 4 0 0 0 0
6 7 1 0 0 0 0
M ALS 7 9 F C N O F P S Cl Br I
M END
)CTAB"_ctab;
const auto rAtom = mol->getAtomWithIdx(mol->getNumAtoms() - 1);
CHECK(!isAtomTerminalRGroupOrQueryHydrogen(rAtom));
}
}
TEST_CASE("SubstructMatchCount regression", "[substruct]") {
{
auto mol = "c1ccccc1"_smiles;
REQUIRE(mol);
std::unique_ptr<ROMol> query{SmartsToMol("c:c")};
REQUIRE(query);
SubstructMatchParameters params;
const auto matches = SubstructMatch(*mol, *query, params);
const auto count = SubstructMatchCount(*mol, *query, params);
CHECK(count == matches.size());
params.maxMatches = 1;
const auto matchesCapped = SubstructMatch(*mol, *query, params);
const auto countCapped = SubstructMatchCount(*mol, *query, params);
CHECK(matchesCapped.size() == 1);
CHECK(countCapped == 1);
}
{
// With uniquify = false, SubstructMatchCount should still agree with the
// number of matches that SubstructMatch returns.
// (We don't assert an exact number here because it depends on automorphisms.)
auto mol = "c1ccccc1"_smiles;
REQUIRE(mol);
std::unique_ptr<ROMol> query{SmartsToMol("c:c")};
REQUIRE(query);
SubstructMatchParameters params;
params.uniquify = false;
const auto matches = SubstructMatch(*mol, *query, params);
const auto count = SubstructMatchCount(*mol, *query, params);
CHECK(count == matches.size());
}
{
// Simple stereochem case to make sure chirality-related final checking is
// consistent.
auto mol = "C[C@H](F)Cl"_smiles;
REQUIRE(mol);
std::unique_ptr<ROMol> query{SmartsToMol("[C@H](F)Cl")};
REQUIRE(query);
SubstructMatchParameters params;
params.useChirality = true;
const auto matches = SubstructMatch(*mol, *query, params);
const auto count = SubstructMatchCount(*mol, *query, params);
CHECK(count == matches.size());
}
}