mirror of
https://github.com/rdkit/rdkit.git
synced 2026-06-03 21:44:30 +08:00
* Speed up tautomer canonicalization by deferring on SSSR calc

* Lazy kekulization for tautomer enumeration

  Defer kekulization of tautomers until they are actually needed for transform matching. This avoids creating kekulized copies for:
  1. The initial tautomer (until first iteration)
  2. New tautomers that may never be processed (if enumeration ends early)

  The Tautomer class now supports lazy initialization of the kekulized form via getKekulized() method.

  Performance improvement: ~7% additional speedup (total ~22-24% from baseline)

* Use count-only substructure matching in tautomer scoring
* Add SubstructMatchCount regression test
* MolStandardize: reduce enumerate overhead
* MolStandardize: avoid per-tautomer ring recomputation
* Atom: cache PeriodicTable pointer in valence calcs
* Atom: reuse PeriodicTable in getEffectiveAtomicNum
* PeriodicTable: add atomic fast path for getTable
* GraphMol: reduce ROMol copy reallocations

* MolStandardize: use quickCopy for per-match product copies

  Use RWMol(*kmol, true) in tautomer enumeration to avoid copying properties/bookmarks/conformers for each candidate. This reduces deep-copy overhead without changing chemistry.

* MolStandardize: pre-filter scoring patterns by element/connectivity

  For tautomer scoring, pre-compute which SubstructTerms are relevant for a given input molecule. Since tautomerization only moves H atoms and changes bond orders (never creates/destroys heavy-atom bonds), patterns requiring missing elements or connectivity can be skipped for all tautomers of that molecule.

  Two-stage filtering:
  1. Element check: skip patterns requiring atoms not in the molecule
  2. Connectivity check: skip patterns whose bond-order-agnostic structure doesn't match the input molecule's connectivity

  This reduces the number of VF2 substructure calls per tautomer from 12 to typically 3-5, depending on the molecule's composition.

* MolStandardize: preserve molecule properties for canonical tautomer

  Copy molecule properties from the original input to the canonical tautomer result. Since quickCopy during enumeration skips d_props to avoid overhead, extended SMILES data like link nodes (LN) was lost. This restores them on the final result.

* TautomerQuery: preserve molecule properties (e.g. link nodes) in tautomers

  TautomerQuery::fromMol() uses TautomerEnumerator::enumerate() which uses quickCopy for performance. This doesn't copy molecule properties like _molLinkNodes. Without this fix, XQMol output would lose link node extensions in the SMILES.

  Copy properties from the original query molecule to all enumerated tautomers before constructing the TautomerQuery. This preserves extended SMILES data without impacting enumeration performance.

* MolStandardize: use parallel iteration and cache bond lookups

  Replace O(n) getAtomWithIdx/getBondWithIdx calls with parallel iteration over atom/bond ranges in canonicalizeInPlace and enumerate. Cache bond lookups in setTautomerStereoAndIsoHs to avoid repeated O(n) searches.

* perf: add specialized matchers for simple tautomer scoring patterns

  Replace VF2 graph matching with O(n) loops for 6 simple patterns:
  - countDoubleOrAromaticBonds: C=O, N=O, P=O patterns
  - countMethyls: [CX4H3] methyl groups
  - countCarbonDoubleHetero: [C]=[/home/dcvuser/rdkit;Code/GraphMol/MolStandardize/Tautomer.h] aliphatic C=hetero
  - countAromaticCarbonExocyclicN: [c]=aromatic C=exocyclic N

  Complex patterns (benzoquinone, oxim, guanidine, aci-nitro) still use VF2.

  Combined with the pre-filtering optimization, this achieves ~3.7x speedup (~2500ms vs ~9300ms original) for tautomer canonicalization.

* Fix tautomer canonicalize dropping conformers from quickCopy

  quickCopy (RWMol(*mol, true)) skips conformers, so tautomer enumeration products lose 2D/3D coordinates. This causes InChI generation to omit the /b (double bond E/Z stereo) layer, since E/Z is derived from atomic coordinates.

  Fix: copy conformers from the original molecule onto the canonical tautomer after pickCanonical in TautomerEnumerator::canonicalize().

  Tests: SMILES-based E/Z check in testTautomer.cpp, molblock-based conformer preservation check in catch_tests.cpp.

* add test on canonicalize losing stereo

* add regression test for exocyclic C=C tautomer canonicalization

  The getTautomerStateKey() pre-filter (commit 2595ef748) can falsely deduplicate distinct tautomers when their atom-index-ordered state patterns happen to match, leading canonicalize() to pick the wrong canonical form for molecules with STEREOTRANS-pinned exocyclic C=C bonds after RemoveHs.

  Test verifies that O=C(CC1=CC2=CC=COC2)NC1=O canonicalizes to the exocyclic form O=C1CC(=CC2=CC=COC2)C(=O)N1, not the endocyclic form O=C1C=C(C=C2CC=COC2)C(=O)N1.

  Currently expected to FAIL until the state key dedup bug is fixed.

* MolStandardize: expand tautomer connectivity SMARTS
* MolStandardize: scope tautomer pattern enum
* MolStandardize: trim tautomer pattern enum
* MolStandardize: use symmetric ring scoring
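The element-check stage of the pre-filter described in the commit message above can be illustrated with a small standard-library sketch. This is not the RDKit implementation: `ScoringPattern`, `requiredElements`, and `relevantPatterns` are hypothetical names chosen for this example, and a molecule is reduced to the set of atomic numbers it contains.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a scoring term: a name plus the atomic numbers
// its pattern needs in order to match at all.
struct ScoringPattern {
  std::string name;
  std::set<int> requiredElements;
};

// Returns the indices of the patterns whose required elements all occur in
// the molecule. Tautomerization does not add or remove heavy atoms, so under
// that assumption the result can be computed once per input molecule and
// reused for each of its tautomers.
std::vector<std::size_t> relevantPatterns(
    const std::vector<ScoringPattern> &patterns,
    const std::set<int> &elementsInMol) {
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < patterns.size(); ++i) {
    bool allPresent = true;
    for (int atomicNum : patterns[i].requiredElements) {
      if (elementsInMol.count(atomicNum) == 0) {
        allPresent = false;
        break;
      }
    }
    if (allPresent) {
      keep.push_back(i);
    }
  }
  return keep;
}
```

Only the surviving indices would then be handed to the more expensive substructure matcher; the connectivity check mentioned in the commit message is a second, separate stage that this sketch does not cover.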
782 lines
24 KiB
C++
//
// Copyright (C) 2001-2025 Greg Landrum and other RDKit contributors
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
#include <RDGeneral/utils.h>
#include <RDGeneral/Invariant.h>
#include <RDGeneral/RDThreads.h>
#include <GraphMol/RDKitBase.h>
#include <GraphMol/RDKitQueries.h>
#include <GraphMol/Resonance.h>
#include <GraphMol/MolBundle.h>
#include <GraphMol/Chirality.h>

#include "SubstructMatch.h"
#include "SubstructUtils.h"
#include <GraphMol/GenericGroups/GenericGroups.h>
#include <boost/smart_ptr.hpp>
#include <map>
#include <span>

#ifdef RDK_BUILD_THREADSAFE_SSS
#include <mutex>
#include <thread>
#include <future>
#endif

#include "vf2.hpp"

namespace RDKit {
namespace detail {

namespace {
bool hasChiralLabel(const Atom *at) {
  PRECONDITION(at, "bad atom");
  return at->getChiralTag() == Atom::CHI_TETRAHEDRAL_CW ||
         at->getChiralTag() == Atom::CHI_TETRAHEDRAL_CCW;
}

bool enhancedStereoIsOK(
    const ROMol &mol, const ROMol &query,
    std::unordered_map<unsigned int, unsigned int> &q_to_mol,
    const std::unordered_map<unsigned int, StereoGroup const *>
        &molStereoGroups,
    const std::unordered_map<unsigned int, bool> &matches) {
  std::unordered_map<unsigned int, StereoGroup const *> molAtomsToQueryGroups;

  // If the query has stereo groups:
  // * OR only matches AND or OR (not absolute)
  // * AND only matches AND
  for (const auto &sg : query.getStereoGroups()) {
    if (sg.getGroupType() == StereoGroupType::STEREO_ABSOLUTE) {
      continue;
    }
    // StereoGroup const* matched_mol_group = nullptr;
    const bool is_and = sg.getGroupType() == StereoGroupType::STEREO_AND;
    for (const auto a : sg.getAtoms()) {
      const auto mol_group = molStereoGroups.find(q_to_mol[a->getIdx()]);
      if (mol_group == molStereoGroups.end()) {
        // group matching absolute. not ok.
        return false;
      } else if (is_and && mol_group->second->getGroupType() !=
                               StereoGroupType::STEREO_AND) {
        // AND matching OR. not ok.
        return false;
      }

      molAtomsToQueryGroups[q_to_mol[a->getIdx()]] = &sg;
    }
  }

  // If the mol has stereo groups:
  // * All atoms must either be the same or opposite, you can't mix
  // * Only one stereogroup must cover all matched atoms in the mol stereo group
  for (const auto &sg : mol.getStereoGroups()) {
    if (sg.getGroupType() == StereoGroupType::STEREO_ABSOLUTE) {
      continue;
    }
    bool doesMatch = false;
    bool seen = false;
    StereoGroup const *QGroup = nullptr;

    for (const auto &a : sg.getAtoms()) {
      auto thisDoesMatch = matches.find(a->getIdx());
      if (thisDoesMatch == matches.end()) {
        // not matched
        continue;
      }

      auto pos = molAtomsToQueryGroups.find(a->getIdx());
      auto thisQGroup =
          pos == molAtomsToQueryGroups.end() ? nullptr : pos->second;
      if (!seen) {
        doesMatch = thisDoesMatch->second;
        QGroup = thisQGroup;
        seen = true;
      } else if (doesMatch != thisDoesMatch->second) {
        // diastereomer. not ok.
        return false;
      } else if (thisQGroup != QGroup) {
        // mix of groups in query. not ok.
        return false;
      }
    }
  }

  return true;
}

}  // namespace

typedef std::map<unsigned int, QueryAtom::QUERYATOM_QUERY *> SUBQUERY_MAP;

typedef struct {
  ResonanceMolSupplier &resMolSupplier;
  const ROMol &query;
  const SubstructMatchParameters &params;
} ResSubstructMatchHelperArgs_;

void MatchSubqueries(const ROMol &mol, QueryAtom::QUERYATOM_QUERY *q,
                     const SubstructMatchParameters &params,
                     SUBQUERY_MAP &subqueryMap,
                     std::vector<RecursiveStructureQuery *> &locked);

bool insertIfNeeded(std::set<MatchVectType> &matches, const MatchVectType &m) {
  bool shouldInsert = true;
  std::unordered_set<int> matchAsSet;
  std::transform(m.begin(), m.end(),
                 std::inserter(matchAsSet, matchAsSet.begin()),
                 [](const std::pair<int, int> &p) { return p.second; });
  for (auto it = matches.begin(); it != matches.end(); ++it) {
    std::unordered_set<int> existingMatchAsSet;
    std::transform(
        it->begin(), it->end(),
        std::inserter(existingMatchAsSet, existingMatchAsSet.begin()),
        [](const std::pair<int, int> &p) { return p.second; });
    if (matchAsSet == existingMatchAsSet) {
      if (m < *it) {
        matches.erase(it);
      } else {
        shouldInsert = false;
      }
      break;
    }
  }
  if (shouldInsert) {
    matches.insert(m);
  }
  return shouldInsert;
}

bool tryToInsert(std::set<MatchVectType> &matches, const MatchVectType &match,
                 const SubstructMatchParameters &params) {
  if (matches.size() == params.maxMatches) {
    return false;
  }
  if (!params.uniquify) {
    matches.insert(match);
  } else {
    insertIfNeeded(matches, match);
  }
  return true;
}

void ResSubstructMatchHelper_(const ResSubstructMatchHelperArgs_ &args,
                              std::set<MatchVectType> *matches, unsigned int bi,
                              unsigned int ei);

typedef std::vector<
    std::pair<MolGraph::vertex_descriptor, MolGraph::vertex_descriptor>>
    ssPairType;

}  // namespace detail

MolMatchFinalCheckFunctor::MolMatchFinalCheckFunctor(
    const ROMol &query, const ROMol &mol, const SubstructMatchParameters &ps)
    : d_query(query), d_mol(mol), d_params(ps) {
  if (d_params.useEnhancedStereo) {
    for (const auto &sg : d_mol.getStereoGroups()) {
      if (sg.getGroupType() == StereoGroupType::STEREO_ABSOLUTE) {
        continue;
      }
      for (const auto a : sg.getAtoms()) {
        d_molStereoGroups[a->getIdx()] = &sg;
      }
    }
  }
}

bool MolMatchFinalCheckFunctor::operator()(const std::uint32_t q_c[],
                                           const std::uint32_t m_c[]) {
  if (d_params.extraFinalCheck || d_params.useGenericMatchers) {
    const std::span<const std::uint32_t> aids(m_c, d_query.getNumAtoms());
    if (d_params.useGenericMatchers &&
        !GenericGroups::genericAtomMatcher(d_mol, d_query, aids)) {
      return false;
    }
    if (d_params.extraFinalCheck && !d_params.extraFinalCheck(d_mol, aids)) {
      return false;
    }
  }

  HashedStorageType match;
  if (d_params.uniquify) {
    match.resize(d_mol.getNumAtoms());
#ifdef RDK_INTERNAL_BITSET_HAS_HASH
    match.reset();
#else
    std::fill(match.begin(), match.end(), 0);
#endif
    for (unsigned int i = 0; i < d_query.getNumAtoms(); ++i) {
      match[m_c[i]] = 1;
    }
    if (matchesSeen.find(match) != matchesSeen.end()) {
      return false;
    }
  }

  if (!d_params.useChirality) {
    if (d_params.uniquify) {
      matchesSeen.insert(match);
    }
    return true;
  }

  std::unordered_map<unsigned int, bool> matches;

  // check chiral atoms:
  for (unsigned int i = 0; i < d_query.getNumAtoms(); ++i) {
    const Atom *qAt = d_query.getAtomWithIdx(q_c[i]);

    // With fewer than 3 neighbors we can't establish CW/CCW parity,
    // so query will be a match if it has any kind of chirality.
    if (qAt->getDegree() < 3 || !detail::hasChiralLabel(qAt)) {
      continue;
    }
    const Atom *mAt = d_mol.getAtomWithIdx(m_c[i]);
    if (!detail::hasChiralLabel(mAt)) {
      if (d_params.specifiedStereoQueryMatchesUnspecified) {
        continue;
      }
      return false;
    }
    if (qAt->getDegree() > mAt->getDegree()) {
      return false;
    }

    INT_LIST qOrder;
    INT_LIST mOrder;
    for (unsigned int j = 0; j < d_query.getNumAtoms(); ++j) {
      const Bond *qB = d_query.getBondBetweenAtoms(q_c[i], q_c[j]);
      const Bond *mB = d_mol.getBondBetweenAtoms(m_c[i], m_c[j]);
      if (qB && mB) {
        mOrder.push_back(mB->getIdx());
        qOrder.push_back(qB->getIdx());
        if (mOrder.size() == qAt->getDegree()) {
          break;
        }
      }
    }
    CHECK_INVARIANT(qOrder.size() == qAt->getDegree(), "missing matches");
    CHECK_INVARIANT(qOrder.size() == mOrder.size(), "bad matches");
    int qPermCount = qAt->getPerturbationOrder(qOrder);

    unsigned unmatchedNeighbors = mAt->getDegree() - mOrder.size();
    mOrder.insert(mOrder.end(), unmatchedNeighbors, -1);

    INT_LIST moOrder;
    for (const auto &bond : d_mol.atomBonds(mAt)) {
      const int dbidx = bond->getIdx();
      if (std::find(mOrder.begin(), mOrder.end(), dbidx) != mOrder.end()) {
        moOrder.push_back(dbidx);
      } else {
        moOrder.push_back(-1);
      }
    }

    const int mPermCount =
        static_cast<int>(countSwapsToInterconvert(moOrder, mOrder));

    const bool requireMatch = qPermCount % 2 == mPermCount % 2;
    const bool labelsMatch = qAt->getChiralTag() == mAt->getChiralTag();
    const bool matchOK = requireMatch == labelsMatch;

    // if this is not part of a stereogroup and doesn't match, return false
    const auto msg = d_molStereoGroups.find(m_c[i]);
    if (msg == d_molStereoGroups.end()) {
      if (!matchOK) {
        return false;
      }
    } else {
      matches[m_c[i]] = matchOK;
    }
  }

  std::unordered_map<unsigned int, unsigned int> q_to_mol;
  for (unsigned int j = 0; j < d_query.getNumAtoms(); ++j) {
    q_to_mol[q_c[j]] = m_c[j];
  }

  if (d_params.useEnhancedStereo) {
    if (!detail::enhancedStereoIsOK(d_mol, d_query, q_to_mol, d_molStereoGroups,
                                    matches)) {
      return false;
    }
  }

  // now check double bonds
  for (const auto &qBnd : d_query.bonds()) {
    if (qBnd->getBondType() != Bond::DOUBLE ||
        qBnd->getStereo() <= Bond::STEREOANY) {
      continue;
    }

    // don't think this can actually happen, but check to be sure:
    if (qBnd->getStereoAtoms().size() != 2) {
      continue;
    }

    const Bond *mBnd = d_mol.getBondBetweenAtoms(
        q_to_mol[qBnd->getBeginAtomIdx()], q_to_mol[qBnd->getEndAtomIdx()]);
    CHECK_INVARIANT(mBnd, "Matching bond not found");
    if (mBnd->getBondType() != Bond::DOUBLE) {
      continue;
    }

    if (!d_params.specifiedStereoQueryMatchesUnspecified &&
        mBnd->getStereo() <= Bond::STEREOANY) {
      return false;
    }

    // don't think this can actually happen, but check to be sure:
    if (mBnd->getStereoAtoms().size() != 2) {
      continue;
    }

    unsigned int end1Matches = 0;
    unsigned int end2Matches = 0;
    if (q_to_mol[qBnd->getBeginAtomIdx()] == mBnd->getBeginAtomIdx()) {
      // query Begin == mol Begin
      if (q_to_mol[qBnd->getStereoAtoms()[0]] ==
          static_cast<unsigned>(mBnd->getStereoAtoms()[0])) {
        end1Matches = 1;
      }
      if (q_to_mol[qBnd->getStereoAtoms()[1]] ==
          static_cast<unsigned>(mBnd->getStereoAtoms()[1])) {
        end2Matches = 1;
      }
    } else {
      // query End == mol Begin
      if (q_to_mol[qBnd->getStereoAtoms()[0]] ==
          static_cast<unsigned>(mBnd->getStereoAtoms()[1])) {
        end1Matches = 1;
      }
      if (q_to_mol[qBnd->getStereoAtoms()[1]] ==
          static_cast<unsigned>(mBnd->getStereoAtoms()[0])) {
        end2Matches = 1;
      }
    }

    const unsigned totalMatches = end1Matches + end2Matches;
    const auto mStereo =
        Chirality::translateEZLabelToCisTrans(mBnd->getStereo());
    const auto qStereo =
        Chirality::translateEZLabelToCisTrans(qBnd->getStereo());

    if (mStereo == qStereo && totalMatches == 1) {
      return false;
    }
    if (mStereo != qStereo && totalMatches != 1) {
      return false;
    }
  }
  if (d_params.uniquify) {
    matchesSeen.insert(match);
  }
  return true;
}

namespace detail {

class AtomLabelFunctor {
 public:
  AtomLabelFunctor(const ROMol &query, const ROMol &mol,
                   const SubstructMatchParameters &ps)
      : d_query(query), d_mol(mol), d_params(ps) {};

  bool operator()(unsigned int i, unsigned int j) const {
    bool res = false;
    if (d_params.useChirality) {
      const Atom *qAt = d_query.getAtomWithIdx(i);
      if (qAt->getChiralTag() == Atom::CHI_TETRAHEDRAL_CW ||
          qAt->getChiralTag() == Atom::CHI_TETRAHEDRAL_CCW) {
        const Atom *mAt = d_mol.getAtomWithIdx(j);
        if (!d_params.specifiedStereoQueryMatchesUnspecified &&
            mAt->getChiralTag() != Atom::CHI_TETRAHEDRAL_CW &&
            mAt->getChiralTag() != Atom::CHI_TETRAHEDRAL_CCW) {
          return false;
        }
      }
    }
    res = atomCompat(d_query[i], d_mol[j], d_params);
    return res;
  }

 private:
  const ROMol &d_query;
  const ROMol &d_mol;
  const SubstructMatchParameters &d_params;
};
class BondLabelFunctor {
 public:
  BondLabelFunctor(const ROMol &query, const ROMol &mol,
                   const SubstructMatchParameters &ps)
      : d_query(query), d_mol(mol), d_params(ps) {};
  bool operator()(MolGraph::edge_descriptor i,
                  MolGraph::edge_descriptor j) const {
    if (d_params.useChirality) {
      const Bond *qBnd = d_query[i];
      if (qBnd->getBondType() == Bond::DOUBLE &&
          qBnd->getStereo() > Bond::STEREOANY) {
        const Bond *mBnd = d_mol[j];
        if (mBnd->getBondType() == Bond::DOUBLE &&
            !d_params.specifiedStereoQueryMatchesUnspecified &&
            mBnd->getStereo() <= Bond::STEREOANY) {
          return false;
        }
      }
    }
    bool res = bondCompat(d_query[i], d_mol[j], d_params);
    return res;
  }

 private:
  const ROMol &d_query;
  const ROMol &d_mol;
  const SubstructMatchParameters &d_params;
};
void ResSubstructMatchHelper_(const ResSubstructMatchHelperArgs_ &args,
                              std::set<MatchVectType> *matches, unsigned int bi,
                              unsigned int ei) {
  for (unsigned int i = bi;
       (matches->size() < args.params.maxMatches) && (i < ei); ++i) {
    std::unique_ptr<ROMol> mol{args.resMolSupplier[i]};
    std::vector<MatchVectType> matchesTmp =
        SubstructMatch(*mol, args.query, args.params);
    for (const auto &match : matchesTmp) {
      if (!tryToInsert(*matches, match, args.params)) {
        break;
      }
    }
  }
};

struct RecursiveLocker {
  std::vector<RecursiveStructureQuery *> locked;
  RecursiveLocker(const ROMol &query, const bool recursionPossible) {
    if (recursionPossible) {
      locked.reserve(query.getNumAtoms());
    }
  }

  ~RecursiveLocker() {
    for (auto v : locked) {
      v->clear();
#ifdef RDK_BUILD_THREADSAFE_SSS
      v->d_mutex.unlock();
#endif
    }
  }
};

// A minimal container which satisfies the vf2_all() output-sequence interface
// but only counts matches instead of storing them.
struct MatchCounter {
  using value_type = ssPairType;

  void clear() { d_count = 0; }
  void resize(size_t) { d_count = 0; }
  void reserve(size_t) {}

  bool empty() const { return d_count == 0; }
  size_t size() const { return d_count; }

  void push_back(const value_type &) { ++d_count; }

 private:
  size_t d_count = 0;
};
}  // namespace detail

// ----------------------------------------------
//
// find all matches
std::vector<MatchVectType> SubstructMatch(
    const ROMol &mol, const ROMol &query,
    const SubstructMatchParameters &params) {
  std::vector<MatchVectType> matches;
  const auto &mNumAtoms = mol.getNumAtoms();
  const auto &qNumAtoms = query.getNumAtoms();
  if (!mNumAtoms || !qNumAtoms || qNumAtoms > mNumAtoms) {
    return matches;
  }

  detail::RecursiveLocker locker(query, params.recursionPossible);

  if (params.recursionPossible) {
    detail::SUBQUERY_MAP subqueryMap;
    ROMol::ConstAtomIterator atIt;
    for (const auto atom : query.atoms()) {
      if (atom->hasQuery()) {
        // std::cerr<<"recurse from atom "<<(*atIt)->getIdx()<<std::endl;
        detail::MatchSubqueries(mol, atom->getQuery(), params, subqueryMap,
                                locker.locked);
      }
    }
  }

  detail::AtomLabelFunctor atomLabeler(query, mol, params);
  detail::BondLabelFunctor bondLabeler(query, mol, params);
  MolMatchFinalCheckFunctor matchChecker(query, mol, params);

  std::vector<detail::ssPairType> pms;
  bool found =
      boost::vf2_all(query.getTopology(), mol.getTopology(), atomLabeler,
                     bondLabeler, matchChecker, pms, params.maxMatches);
  if (found) {
    const unsigned int nQueryAtoms = query.getNumAtoms();
    matches.reserve(pms.size());
    MatchVectType matchVect(nQueryAtoms);
    for (const auto &pairs : pms) {
      for (const auto &pair : pairs) {
        matchVect[pair.first] = pair;
      }
      matches.push_back(matchVect);
    }
  }
  return matches;
}

unsigned int SubstructMatchCount(const ROMol &mol, const ROMol &query,
                                 const SubstructMatchParameters &params) {
  if (!mol.getNumAtoms() || !query.getNumAtoms()) {
    return 0;
  }

  detail::RecursiveLocker locker(query, params.recursionPossible);

  if (params.recursionPossible) {
    detail::SUBQUERY_MAP subqueryMap;
    for (const auto atom : query.atoms()) {
      if (atom->hasQuery()) {
        detail::MatchSubqueries(mol, atom->getQuery(), params, subqueryMap,
                                locker.locked);
      }
    }
  }

  detail::AtomLabelFunctor atomLabeler(query, mol, params);
  detail::BondLabelFunctor bondLabeler(query, mol, params);
  MolMatchFinalCheckFunctor matchChecker(query, mol, params);

  detail::MatchCounter counter;
  boost::vf2_all(query.getTopology(), mol.getTopology(), atomLabeler,
                 bondLabeler, matchChecker, counter, params.maxMatches);
  return static_cast<unsigned int>(counter.size());
}

std::vector<MatchVectType> SubstructMatch(
    const MolBundle &bundle, const ROMol &query,
    const SubstructMatchParameters &params) {
  std::vector<MatchVectType> res;
  for (unsigned int i = 0; i < bundle.size() && res.empty(); ++i) {
    res = SubstructMatch(*bundle[i], query, params);
  }
  return res;
}

std::vector<MatchVectType> SubstructMatch(
    const ROMol &mol, const MolBundle &query,
    const SubstructMatchParameters &params) {
  std::vector<MatchVectType> res;
  for (unsigned int i = 0; i < query.size() && res.empty(); ++i) {
    res = SubstructMatch(mol, *query[i], params);
  }
  return res;
}

std::vector<MatchVectType> SubstructMatch(
    const MolBundle &mol, const MolBundle &query,
    const SubstructMatchParameters &params) {
  std::vector<MatchVectType> res;
  for (unsigned int i = 0; i < mol.size() && res.empty(); ++i) {
    for (unsigned int j = 0; j < query.size() && res.empty(); ++j) {
      res = SubstructMatch(*mol[i], *query[j], params);
    }
  }
  return res;
}

// ----------------------------------------------
//
// find all matches in a ResonanceMolSupplier object
//
//
std::vector<MatchVectType> SubstructMatch(
    ResonanceMolSupplier &resMolSupplier, const ROMol &query,
    const SubstructMatchParameters &params) {
  std::set<MatchVectType> matches;
  detail::ResSubstructMatchHelperArgs_ args = {resMolSupplier, query, params};
  unsigned int nt =
      std::min(resMolSupplier.length(), getNumThreadsToUse(params.numThreads));
  if (nt == 1) {
    detail::ResSubstructMatchHelper_(args, &matches, 0,
                                     resMolSupplier.length());
  }
#ifdef RDK_BUILD_THREADSAFE_SSS
  else {
    std::vector<std::future<void>> tg;
    std::vector<std::unique_ptr<std::set<MatchVectType>>> matchesThread(nt);
    unsigned int ei = 0;
    double dpt =
        static_cast<double>(resMolSupplier.length()) / static_cast<double>(nt);
    double dc = 0.0;
    for (unsigned int ti = 0; ti < nt; ++ti) {
      matchesThread[ti] = std::make_unique<std::set<MatchVectType>>();
      unsigned int bi = ei;
      dc += dpt;
      ei = static_cast<unsigned int>(floor(dc));
      tg.emplace_back(std::async(std::launch::async,
                                 detail::ResSubstructMatchHelper_, args,
                                 matchesThread[ti].get(), bi, ei));
    }
    for (auto &fut : tg) {
      fut.get();
    }

    for (unsigned int ti = 0; ti < nt; ++ti) {
      for (const auto &match : *matchesThread[ti]) {
        if (!detail::tryToInsert(matches, match, args.params)) {
          break;
        }
      }
    }
  }
#endif
  return std::vector<MatchVectType>(matches.begin(), matches.end());
}

namespace detail {
unsigned int RecursiveMatcher(const ROMol &mol, const ROMol &query,
                              std::vector<int> &matches,
                              SUBQUERY_MAP &subqueryMap,
                              const SubstructMatchParameters &params,
                              std::vector<RecursiveStructureQuery *> &locked) {
  SubstructMatchParameters lparams = params;
  lparams.maxMatches = std::max(params.maxRecursiveMatches, params.maxMatches);
  lparams.uniquify = false;
  for (auto qAtom : query.atoms()) {
    if (qAtom->hasQuery()) {
      MatchSubqueries(mol, qAtom->getQuery(), lparams, subqueryMap, locked);
    }
  }

  detail::AtomLabelFunctor atomLabeler(query, mol, lparams);
  detail::BondLabelFunctor bondLabeler(query, mol, lparams);
  MolMatchFinalCheckFunctor matchChecker(query, mol, lparams);

  matches.clear();
  matches.resize(0);
  std::vector<detail::ssPairType> pms;
  bool found =
      boost::vf2_all(query.getTopology(), mol.getTopology(), atomLabeler,
                     bondLabeler, matchChecker, pms, lparams.maxMatches);
  unsigned int res = 0;
  if (found) {
    matches.reserve(pms.size());
    for (const auto &pairs : pms) {
      if (!query.hasProp(common_properties::_queryRootAtom)) {
        matches.push_back(pairs.begin()->second);
      } else {
        int rootIdx;
        query.getProp(common_properties::_queryRootAtom, rootIdx);
        bool found = false;
        for (const auto &pairIter : pairs) {
          if (pairIter.first == static_cast<unsigned int>(rootIdx)) {
            matches.push_back(pairIter.second);
            found = true;
            break;
          }
        }
        if (!found) {
          BOOST_LOG(rdErrorLog)
              << "no match found for queryRootAtom" << std::endl;
        }
      }
      if (matches.size() == lparams.maxMatches) {
        break;
      }
    }
    res = matches.size();
  }
  // std::cout << " <<< RecursiveMatcher: " << int(query) << std::endl;
  return res;
}

void MatchSubqueries(const ROMol &mol, QueryAtom::QUERYATOM_QUERY *query,
                     const SubstructMatchParameters &params,
                     SUBQUERY_MAP &subqueryMap,
                     std::vector<RecursiveStructureQuery *> &locked) {
  PRECONDITION(query, "bad query");
  if (query->getDescription() == "RecursiveStructure") {
    auto *rsq = (RecursiveStructureQuery *)query;
#ifdef RDK_BUILD_THREADSAFE_SSS
    rsq->d_mutex.lock();
#endif
    locked.push_back(rsq);
    rsq->clear();
    bool matchDone = false;
    if (rsq->getSerialNumber() &&
        subqueryMap.find(rsq->getSerialNumber()) != subqueryMap.end()) {
      // we've matched an equivalent serial number before, just
      // copy in the matches:
      matchDone = true;
      auto orsq =
          (const RecursiveStructureQuery *)subqueryMap[rsq->getSerialNumber()];
      for (auto setIter = orsq->beginSet(); setIter != orsq->endSet();
           ++setIter) {
        rsq->insert(*setIter);
      }
    }

    if (!matchDone) {
      ROMol const *queryMol = rsq->getQueryMol();
      // in case we are reusing this query, clear its contents now.
      if (queryMol) {
        std::vector<int> matchStarts;
        unsigned int res = RecursiveMatcher(mol, *queryMol, matchStarts,
                                            subqueryMap, params, locked);
        if (res) {
          for (int &matchStart : matchStarts) {
            rsq->insert(matchStart);
          }
        }
      }
      if (rsq->getSerialNumber()) {
        subqueryMap[rsq->getSerialNumber()] = query;
      }
    }
  }

  // now recurse over our children (these things can be nested)
  for (auto childIt = query->beginChildren(); childIt != query->endChildren();
       ++childIt) {
    MatchSubqueries(mol, childIt->get(), params, subqueryMap, locked);
  }
  // std::cout << "<<- back " << (int)query << std::endl;
}

}  // end of namespace detail

bool AtomCoordsMatchFunctor::operator()(const Atom &queryAtom,
                                        const Atom &targetAtom) const {
  if (!queryAtom.getOwningMol().getNumConformers() ||
      !targetAtom.getOwningMol().getNumConformers()) {
    return false;
  }
  const auto &queryPos = queryAtom.getOwningMol()
                             .getConformer(d_queryConfId)
                             .getAtomPos(queryAtom.getIdx());
  const auto &targetPos = targetAtom.getOwningMol()
                              .getConformer(d_refConfId)
                              .getAtomPos(targetAtom.getIdx());
  return (queryPos - targetPos).lengthSq() <= d_tol2;
};

}  // namespace RDKit
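The count-only idea behind `MatchCounter` and `SubstructMatchCount` in the listing above (an output container that counts pushed results instead of storing them) can be sketched without RDKit or VF2. In the minimal example below, `enumeratePairs` is a hypothetical stand-in for an enumerator templated on its output sequence; it is not RDKit or Boost API. `CountingSink` is likewise an illustrative name that mirrors only part of the interface `MatchCounter` exposes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counts pushed elements instead of storing them.
struct CountingSink {
  using value_type = std::pair<int, int>;

  void clear() { d_count = 0; }
  bool empty() const { return d_count == 0; }
  std::size_t size() const { return d_count; }
  void push_back(const value_type &) { ++d_count; }

 private:
  std::size_t d_count = 0;
};

// Hypothetical enumerator: reports every pair (i, j) with 0 <= i < j < n and
// stops once maxResults entries have been reported. Returns true if at least
// one result was produced.
template <typename OutputSeq>
bool enumeratePairs(int n, OutputSeq &out, std::size_t maxResults) {
  out.clear();
  for (int i = 0; i < n; ++i) {
    for (int j = i + 1; j < n; ++j) {
      if (out.size() >= maxResults) {
        return !out.empty();
      }
      out.push_back(typename OutputSeq::value_type(i, j));
    }
  }
  return !out.empty();
}
```

Passing a `std::vector<std::pair<int, int>>` to `enumeratePairs` stores every pair, while passing a `CountingSink` yields only the count and allocates no per-result storage. The enumerator itself is unchanged in both cases, which is the design choice the sketch is meant to show.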