rdkit/Code/GraphMol/RGroupDecomposition/RGroupMatch.h
Paolo Tosco 786393beb1 Enable RGD highlights as in blog post (#7322)
* Code/GraphMol/Depictor/RDDepictor.h
- fixed typo in docstring
Code/GraphMol/RGroupDecomposition/RGroupCore.cpp
- added a missing const; formatting changes
Code/GraphMol/RGroupDecomposition/RGroupData.cpp, Code/GraphMol/RGroupDecomposition/RGroupData.h
- moved the code which merges disconnected R groups sharing the same attachment point into a single combined molecule to a private method, RGroupData::mergeIntoCombinedMol(). The method also includes logic to merge atom and bond highlights, if present.
- modernized a for loop
- isMolHydrogen is now a static function since it does not actually require any instance data
- implemented three static functions to return the R group, Core and Mol labels, respectively
Code/GraphMol/RGroupDecomposition/RGroupDecomp.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecomp.h
- implemented two private methods, RGroupDecomposition::labelAtomBondIndices() and RGroupDecomposition::setTargetAtomBondIndices(). The first method tags all atoms and bonds in the target molecule such that they can be tracked following core removal by RDKit::replaceCore(). The second method sets common_properties::_rgroupTargetAtoms and common_properties::_rgroupTargetBonds properties on core and R groups. These are vectors of atom and bond indices in the target molecule corresponding to core and R group atom/bonds, respectively, and can be used for color-coding the target molecule according to the R group decomposition it was subjected to, similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/GraphMol/RGroupDecomposition/RGroupDecompData.cpp
- formatting changes and for loop modernization
Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecompParams.h
- implemented updateRGroupDecompositionParametersFromJSON()
- added includeTargetMolInResults boolean parameter
Code/GraphMol/RGroupDecomposition/RGroupMatch.h
- implemented RGroupMatch::setTargetMoleculeForHighlights() and RGroupMatch::getTargetMoleculeForHighlights() methods to set and get, respectively, the target molecule on which the R group decomposition can be color-coded with highlights. This molecule includes the explicit H atoms corresponding to extracted R groups, if any.
Code/GraphMol/RGroupDecomposition/Wrap/rdRGroupComposition.cpp
- use a std::unique_ptr to store the pointer to the C++ RGroupDecomposition instance
- fixed typos in docstrings
Code/GraphMol/RGroupDecomposition/Wrap/test_rgroups.py
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/catch_rgd.cpp
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
- formatting changes
Code/GraphMol/RGroupDecomposition/testRGroupInternals.cpp
- do not use deprecated constant
Code/MinimalLib/CMakeLists.txt
- added RDK_BUILD_MINIMAL_LIB_RGROUPDECOMP CMake flag to optionally expose R group decomposition functionality into MinimalLib
Code/MinimalLib/common.h
- added makeDummiesQueries flag to mol_from_input() (defaults to false)
- implemented parse_highlight_multi_colors() function to parse multi-color atom and bond highlights
- enable multi-color atom and bond highlighting
Code/MinimalLib/demo/rgd_demo.html
- added HTML page showcasing the multi-color highlights similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/MinimalLib/jswrapper.cpp
- removed checks for non-nullness of d_mol as d_mol cannot be directly accessed anymore
- replaced all instances of d_mol with get()
- implemented support for multi-color atom and bond highlights
- implemented optional support for R group decomposition
- added JSMol::copy() convenience method with same functionality as get_mol_copy() to duplicate a molecule
Code/MinimalLib/minilib.cpp, Code/MinimalLib/minilib.h
- replaced all occurrences of d_mol with get(), as d_mol is now private
- removed all occurrences of assert(d_mol) as non-nullness is checked at construction time and whenever get() is called
- JSMol is now split into two subclasses, JSMolUnique and JSMolShared, which both inherit from the JSMol base class. JSMolUnique can be constructed from a RWMol* (as the old JSMol), while JSMolShared can be constructed from a ROMOL_SPTR. This avoids unnecessary copies when wrapping a ROMOL_SPTR (e.g., from a substructure library, JSMolList or R group decomposition) into a JSMol to pass it to JS. It also ensures that modifications made in the JS layer to a molecule stored in a MolList (e.g., adding a property) are persisted, since they are now carried out on the actual molecule rather than on a volatile copy.
Code/MinimalLib/tests/tests.js
- added a test for persistence of modifications made to JSMolShared
- added tests for RGD
- added test for JSMol::copy()
Code/RDGeneral/RDValue.h
- removed trailing comma from vector properties such that they can be deserialized as syntactically correct JSON
Code/RDGeneral/types.cpp, Code/RDGeneral/types.h
- added _rgroupTargetAtoms and _rgroupTargetBonds common_properties

* Code/GraphMol/Depictor/RDDepictor.h
- fixed typo in docstring
Code/GraphMol/RGroupDecomposition/RGroupCore.cpp
- added a missing const; formatting changes
Code/GraphMol/RGroupDecomposition/RGroupData.cpp, Code/GraphMol/RGroupDecomposition/RGroupData.h
- moved the code which merges disconnected R groups sharing the same attachment point into a single combined molecule to a private method, RGroupData::mergeIntoCombinedMol(). The method also includes logic to merge atom and bond highlights, if present.
- modernized a for loop
- isMolHydrogen is now a static function since it does not actually require any instance data
- implemented three static functions to return the R group, Core and Mol labels, respectively
Code/GraphMol/RGroupDecomposition/RGroupDecomp.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecomp.h
- implemented two private methods, RGroupDecomposition::labelAtomBondIndices() and RGroupDecomposition::setTargetAtomBondIndices(). The first method tags all atoms and bonds in the target molecule such that they can be tracked following core removal by RDKit::replaceCore(). The second method sets common_properties::_rgroupTargetAtoms and common_properties::_rgroupTargetBonds properties on core and R groups. These are vectors of atom and bond indices in the target molecule corresponding to core and R group atom/bonds, respectively, and can be used for color-coding the target molecule according to the R group decomposition it was subjected to, similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/GraphMol/RGroupDecomposition/RGroupDecompData.cpp
- formatting changes and for loop modernization
Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecompParams.h
- implemented updateRGroupDecompositionParametersFromJSON()
- added includeTargetMolInResults boolean parameter
Code/GraphMol/RGroupDecomposition/RGroupMatch.h
- implemented RGroupMatch::setTargetMoleculeForHighlights() and RGroupMatch::getTargetMoleculeForHighlights() methods to set and get, respectively, the target molecule on which the R group decomposition can be color-coded with highlights. This molecule includes the explicit H atoms corresponding to extracted R groups, if any.
Code/GraphMol/RGroupDecomposition/Wrap/rdRGroupComposition.cpp
- use a std::unique_ptr to store the pointer to the C++ RGroupDecomposition instance
- fixed typos in docstrings
Code/GraphMol/RGroupDecomposition/Wrap/test_rgroups.py
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/catch_rgd.cpp
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
- formatting changes
Code/GraphMol/RGroupDecomposition/testRGroupInternals.cpp
- do not use deprecated constant
Code/MinimalLib/CMakeLists.txt
- added RDK_BUILD_MINIMAL_LIB_RGROUPDECOMP CMake flag to optionally expose R group decomposition functionality into MinimalLib
Code/MinimalLib/common.h
- added makeDummiesQueries flag to mol_from_input() (defaults to false)
- implemented parse_highlight_multi_colors() function to parse multi-color atom and bond highlights
- enable multi-color atom and bond highlighting
Code/MinimalLib/demo/rgd_demo.html
- added HTML page showcasing the multi-color highlights similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/MinimalLib/jswrapper.cpp
- removed checks for non-nullness of d_mol as d_mol cannot be directly accessed anymore
- replaced all instances of d_mol with get()
- implemented support for multi-color atom and bond highlights
- implemented optional support for R group decomposition
- JSMol is now split into two subclasses, JSMol and JSMolShared, which both inherit from the JSMolBase class. While JSMol can be constructed from a RWMol* as usual, JSMolShared can be constructed from a ROMOL_SPTR. This avoids unnecessary copies when wrapping a ROMOL_SPTR (e.g., from a substructure library, JSMolList or R group decomposition) into a JSMol to pass it to JS. It also ensures that modifications made in the JS layer to a molecule stored in a MolList (e.g., adding a property) are persisted, since they are now carried out on the actual molecule rather than on a volatile copy.
- added JSMolBase::copy() convenience method with same functionality as get_mol_copy() to duplicate a molecule
Code/MinimalLib/minilib.cpp, Code/MinimalLib/minilib.h
- replaced all occurrences of d_mol with get(), as d_mol is now private
- removed all occurrences of assert(d_mol) as non-nullness is checked at construction time and whenever get() is called
Code/MinimalLib/tests/tests.js
- added a test for persistence of modifications made to JSMolShared
- added tests for RGD
- added test for JSMolBase::copy()
Code/RDGeneral/RDValue.h
- removed trailing comma from vector properties such that they can be deserialized as syntactically correct JSON
Code/RDGeneral/types.cpp, Code/RDGeneral/types.h
- added _rgroupTargetAtoms and _rgroupTargetBonds common_properties

* added assignChiralTypesFromMolParity flag

* added test for makeDummiesQueries

* added CFFI tests

* reordered tests

* re-added a piece of code that had been accidentally lost while resolving merge conflicts

* Removed CHECK_INVARIANT following code review
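
The split of the JS molecule wrapper into a uniquely owning and a shared-ownership subclass, described in the commit message above, can be illustrated with a minimal standalone sketch. It does not use the MinimalLib classes: `ToyMol`, `MolWrapperBase`, `UniqueMolWrapper` and `SharedMolWrapper` are hypothetical names invented for this example, and a string map stands in for molecule properties. The point it tries to show is only that a wrapper built around a shared pointer edits the stored object itself rather than a temporary copy.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a molecule: just a bag of string properties.
struct ToyMol {
  std::map<std::string, std::string> props;
};

// Base wrapper (hypothetical): subclasses decide how the molecule is owned,
// callers only ever go through get().
class MolWrapperBase {
 public:
  virtual ~MolWrapperBase() = default;
  virtual ToyMol &get() = 0;
  void setProp(const std::string &key, const std::string &value) {
    get().props[key] = value;
  }
};

// Exclusive ownership: the wrapper is the only owner of the molecule.
class UniqueMolWrapper : public MolWrapperBase {
 public:
  explicit UniqueMolWrapper(std::unique_ptr<ToyMol> mol)
      : d_mol(std::move(mol)) {
    assert(d_mol);  // non-nullness is checked once, at construction time
  }
  ToyMol &get() override { return *d_mol; }

 private:
  std::unique_ptr<ToyMol> d_mol;
};

// Shared ownership: the wrapper points at a molecule that also lives
// elsewhere (e.g. in a list), so no copy is made and edits persist.
class SharedMolWrapper : public MolWrapperBase {
 public:
  explicit SharedMolWrapper(std::shared_ptr<ToyMol> mol)
      : d_mol(std::move(mol)) {
    assert(d_mol);
  }
  ToyMol &get() override { return *d_mol; }

 private:
  std::shared_ptr<ToyMol> d_mol;
};
```

In this sketch, setting a property through a `SharedMolWrapper` built from a `std::shared_ptr<ToyMol>` held by some container remains visible on the container's element after the wrapper goes out of scope, which is the persistence behavior the commit message describes for molecules stored in a list.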

---------

Co-authored-by: ptosco <paolo.tosco@novartis.com>
2024-09-16 16:14:13 +02:00

159 lines
6.8 KiB
C++

//
// Copyright (C) 2017 Novartis Institutes for BioMedical Research
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
#ifndef RGROUP_MATCH_DATA
#define RGROUP_MATCH_DATA
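#include "RGroupData.h"
#include <algorithm>
#include <map>
#include <numeric>
#include <sstream>
#include <string>
#include <utility>
#include <vector>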
namespace RDKit {
typedef boost::shared_ptr<RGroupData> RData;
typedef std::map<int, RData> R_DECOMP;

//! RGroupMatch is the decomposition for a single molecule
struct RGroupMatch {
 public:
  size_t core_idx;  // index of the matching core
  size_t numberMissingUserRGroups;
  R_DECOMP rgroups;        // rlabel->RGroupData mapping
  RWMOL_SPTR matchedCore;  // Core with dummy or query atoms and bonds matched

  RGroupMatch(size_t core_index, size_t numberMissingUserRGroups,
              R_DECOMP input_rgroups, RWMOL_SPTR matchedCore)
      : core_idx(core_index),
        numberMissingUserRGroups(numberMissingUserRGroups),
        rgroups(std::move(input_rgroups)),
        matchedCore(std::move(matchedCore)) {}

  std::string toString() const {
    auto rGroupsString = std::accumulate(
        rgroups.cbegin(), rgroups.cend(), std::string(),
        [](std::string s, const std::pair<int, RData> &rgroup) {
          return std::move(s) + "\n\t(" + std::to_string(rgroup.first) + ':' +
                 rgroup.second->toString() + ')';
        });
    std::stringstream ss;
    ss << "Match coreIdx " << core_idx << " missing count "
       << numberMissingUserRGroups << " " << rGroupsString;
    return ss.str();
  }

  //! Set the target molecule to be used for highlighting R groups
  //! \param targetMol the target molecule
  void setTargetMoleculeForHighlights(const RWMOL_SPTR &targetMol) {
    targetMolForHighlights = targetMol;
    targetMolWasTrimmed = false;
  }

  //! Get the target molecule to be used for highlighting R groups
  //! \param trimHs whether explicit hydrogens should be removed,
  //! except for those corresponding to R groups (if any)
  //! \return the target molecule (can be null if it was never set)
  RWMOL_SPTR getTargetMoleculeForHighlights(bool trimHs) {
    if (!targetMolForHighlights || !trimHs || targetMolWasTrimmed) {
      return targetMolForHighlights;
    }
    // if trimHs is true and this has not been done before, we need
    // to remove explicit Hs, except for those corresponding to R groups
    // (if any). Removal of hydrogens will change atom and bond indices,
    // therefore common_properties::_rgroupTargetAtoms and
    // common_properties::_rgroupTargetBonds need to be updated
    int numMolAtoms = targetMolForHighlights->getNumAtoms();
    std::vector<int> storedAtomMapNums(numMolAtoms);
    std::vector<std::pair<int, int>> oldBondEnds(
        targetMolForHighlights->getNumBonds(), std::make_pair(-1, -1));
    auto atoms = targetMolForHighlights->atoms();
    // we use atom map numbers to track original atom indices
    // ahead of removing Hs, so we store existing values to be able
    // to restore them afterwards
    std::transform(atoms.begin(), atoms.end(), storedAtomMapNums.begin(),
                   [](auto atom) {
                     auto res = atom->getAtomMapNum();
                     atom->setAtomMapNum(0);
                     return res;
                   });
    for (const auto &pair : rgroups) {
      auto &combinedMol = pair.second->combinedMol;
      std::vector<int> bondIndices;
      if (combinedMol->getPropIfPresent(common_properties::_rgroupTargetBonds,
                                        bondIndices)) {
        std::for_each(bondIndices.begin(), bondIndices.end(),
                      [this, &oldBondEnds](const auto &bondIdx) {
                        const auto bond =
                            targetMolForHighlights->getBondWithIdx(bondIdx);
                        const auto beginAtom = bond->getBeginAtom();
                        const auto endAtom = bond->getEndAtom();
                        oldBondEnds[bondIdx].first = beginAtom->getIdx();
                        oldBondEnds[bondIdx].second = endAtom->getIdx();
                        beginAtom->setAtomMapNum(beginAtom->getIdx() + 1);
                        endAtom->setAtomMapNum(endAtom->getIdx() + 1);
                      });
      }
    }
    // remove Hs except those involved in R groups
    std::vector<int> oldToNewAtomIndices(numMolAtoms, -1);
    MolOps::RemoveHsParameters rhps;
    rhps.removeMapped = false;
    MolOps::removeHs(*targetMolForHighlights, rhps);
    for (auto atom : targetMolForHighlights->atoms()) {
      auto atomMapNum = atom->getAtomMapNum();
      if (atomMapNum) {
        --atomMapNum;
        oldToNewAtomIndices[atomMapNum] = atom->getIdx();
        atom->setAtomMapNum(storedAtomMapNums.at(atomMapNum));
      }
    }
    // update atom and bond indices after removing Hs
    for (const auto &pair : rgroups) {
      auto &combinedMol = pair.second->combinedMol;
      std::vector<int> atomIndices;
      if (combinedMol->getPropIfPresent(common_properties::_rgroupTargetAtoms,
                                        atomIndices)) {
        std::transform(
            atomIndices.begin(), atomIndices.end(), atomIndices.begin(),
            [&oldToNewAtomIndices](auto &atomIdx) {
              auto newAtomIdx = oldToNewAtomIndices.at(atomIdx);
              CHECK_INVARIANT(newAtomIdx != -1, "newAtomIdx must be >=0");
              return newAtomIdx;
            });
      }
      combinedMol->setProp(common_properties::_rgroupTargetAtoms, atomIndices);
      std::vector<int> bondIndices;
      if (combinedMol->getPropIfPresent(common_properties::_rgroupTargetBonds,
                                        bondIndices)) {
        std::transform(
            bondIndices.begin(), bondIndices.end(), bondIndices.begin(),
            [this, &oldBondEnds, &oldToNewAtomIndices](auto &bondIdx) {
              const auto &oldPair = oldBondEnds.at(bondIdx);
              CHECK_INVARIANT(oldPair.first != -1 && oldPair.second != -1,
                              "oldPair members must be >=0");
              const auto newBeginAtomIdx =
                  oldToNewAtomIndices.at(oldPair.first);
              const auto newEndAtomIdx = oldToNewAtomIndices.at(oldPair.second);
              CHECK_INVARIANT(newBeginAtomIdx != -1 && newEndAtomIdx != -1,
                              "newBeginAtomIdx and newEndAtomIdx must be >=0");
              const auto bond = targetMolForHighlights->getBondBetweenAtoms(
                  newBeginAtomIdx, newEndAtomIdx);
              CHECK_INVARIANT(bond, "bond must not be null");
              return bond->getIdx();
            });
      }
      combinedMol->setProp(common_properties::_rgroupTargetBonds, bondIndices);
    }
    targetMolWasTrimmed = true;
    return targetMolForHighlights;
  }

 private:
  bool targetMolWasTrimmed = false;
  RWMOL_SPTR targetMolForHighlights;
};
}  // namespace RDKit
#endif
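
The index bookkeeping in getTargetMoleculeForHighlights() (tag the atoms of interest with their old index plus one, remove hydrogens, then rebuild an old-to-new index map from the surviving tags) can be sketched without RDKit. The following is a simplified illustration under stated assumptions, not the library code: `ToyAtom`, `trimAndMap` and `remapIndices` are hypothetical names, a dedicated `tag` field replaces the atom map numbers that the header borrows and later restores, and atoms are tagged directly rather than through the bonds they belong to.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an atom: only the fields this sketch needs.
struct ToyAtom {
  bool isHydrogen = false;
  int tag = 0;  // 0 means "untagged"; otherwise (old index + 1)
};

// Tag the atoms listed in `tracked` with (old index + 1), drop every
// untagged hydrogen, and return a map from old index to new index.
// Entries stay -1 for atoms that were removed or were never tracked.
std::vector<int> trimAndMap(std::vector<ToyAtom> &atoms,
                            const std::vector<int> &tracked) {
  const std::size_t oldSize = atoms.size();
  for (int idx : tracked) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < oldSize);
    atoms[idx].tag = idx + 1;
  }
  // Keep heavy atoms and tagged hydrogens; relative order is preserved.
  std::vector<ToyAtom> kept;
  for (const auto &atom : atoms) {
    if (!atom.isHydrogen || atom.tag != 0) {
      kept.push_back(atom);
    }
  }
  atoms = std::move(kept);
  // Read the tags back to learn where each tracked atom ended up,
  // clearing them as we go.
  std::vector<int> oldToNew(oldSize, -1);
  for (std::size_t newIdx = 0; newIdx < atoms.size(); ++newIdx) {
    if (atoms[newIdx].tag != 0) {
      oldToNew[atoms[newIdx].tag - 1] = static_cast<int>(newIdx);
      atoms[newIdx].tag = 0;
    }
  }
  return oldToNew;
}

// Rewrite a list of old indices in place using the map.
void remapIndices(std::vector<int> &indices,
                  const std::vector<int> &oldToNew) {
  for (auto &idx : indices) {
    idx = oldToNew.at(idx);
  }
}
```

Offsetting the tag by one keeps zero free as the "untagged" marker, which mirrors how the header treats an atom map number of zero as "not mapped" so that removeHs() with removeMapped set to false keeps only the tagged hydrogens.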