Enable RGD highlights as in blog post (#7322)

* Code/GraphMol/Depictor/RDDepictor.h
- fixed typo in docstring
Code/GraphMol/RGroupDecomposition/RGroupCore.cpp
- added a missing const; formatting changes
Code/GraphMol/RGroupDecomposition/RGroupData.cpp, Code/GraphMol/RGroupDecomposition/RGroupData.h
- moved into a private method, RGroupData::mergeIntoCombinedMol(), the code that merges disconnected R groups sharing the same attachment point into a single combined molecule. The method also includes logic to merge atom and bond highlights, if present.
- modernized a for loop
- isMolHydrogen is now a static function since it does not actually require any instance data
- implemented three static functions to return the R group, Core and Mol labels, respectively
Code/GraphMol/RGroupDecomposition/RGroupDecomp.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecomp.h
- implemented two private methods, RGroupDecomposition::labelAtomBondIndices() and RGroupDecomposition::setTargetAtomBondIndices(). The first method tags all atoms and bonds in the target molecule so that they can be tracked after core removal by RDKit::replaceCore(). The second method sets the common_properties::_rgroupTargetAtoms and common_properties::_rgroupTargetBonds properties on the core and R groups. These are vectors of the atom and bond indices in the target molecule that correspond to core and R group atoms/bonds, respectively. They can be used to color-code the target molecule according to its R group decomposition, similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
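A toy model can illustrate the index-tracking idea behind labelAtomBondIndices() and setTargetAtomBondIndices(). This is a minimal JavaScript sketch, not the C++ implementation.

- The plain-object molecule representation and the helper names (`labelAtomBondIndices`, `deleteAtoms`, `targetIndices`) are hypothetical.
- Unlike RDKit::replaceCore(), the toy deletion below does not add attachment-point dummy atoms.

Each atom and bond is tagged with its target-molecule index before any atoms are deleted. The surviving fragment can then still report where its atoms and bonds came from, even though deletion renumbers them.

```javascript
// Hypothetical toy representation: { atoms: [{ symbol }], bonds: [{ begin, end }] },
// where begin/end are indices into the current atoms array.

// Tag every atom and bond with its index in the original (target) molecule.
function labelAtomBondIndices(mol) {
  return {
    atoms: mol.atoms.map((atom, idx) => ({ ...atom, targetIdx: idx })),
    bonds: mol.bonds.map((bond, idx) => ({ ...bond, targetIdx: idx })),
  };
}

// Delete a set of atoms (given by current index) and every bond touching them.
// Surviving atoms are renumbered, so bond endpoints are remapped accordingly.
function deleteAtoms(mol, doomed) {
  const newIdx = new Map();
  const atoms = [];
  mol.atoms.forEach((atom, i) => {
    if (!doomed.has(i)) {
      newIdx.set(i, atoms.length);
      atoms.push(atom);
    }
  });
  const bonds = mol.bonds
    .filter((b) => newIdx.has(b.begin) && newIdx.has(b.end))
    .map((b) => ({ ...b, begin: newIdx.get(b.begin), end: newIdx.get(b.end) }));
  return { atoms, bonds };
}

// Read back the target-molecule indices carried by a fragment's atoms and bonds,
// analogous in spirit to the _rgroupTargetAtoms/_rgroupTargetBonds vectors.
function targetIndices(fragment) {
  return {
    atoms: fragment.atoms.map((a) => a.targetIdx),
    bonds: fragment.bonds.map((b) => b.targetIdx),
  };
}

// Usage: a 5-atom chain C0-C1-C2-O3-Cl4; atoms 0-2 play the role of the core.
const target = {
  atoms: ['C', 'C', 'C', 'O', 'Cl'].map((symbol) => ({ symbol })),
  bonds: [[0, 1], [1, 2], [2, 3], [3, 4]].map(([begin, end]) => ({ begin, end })),
};
const rgroup = deleteAtoms(labelAtomBondIndices(target), new Set([0, 1, 2]));
console.log(targetIndices(rgroup)); // { atoms: [ 3, 4 ], bonds: [ 3 ] }
```

After deletion the O atom has index 0 in the fragment but still reports target index 3. That is the information needed to highlight the original molecule.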
Code/GraphMol/RGroupDecomposition/RGroupDecompData.cpp
- formatting changes and for loop modernization
Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecompParams.h
- implemented updateRGroupDecompositionParametersFromJSON()
- added includeTargetMolInResults boolean parameter
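Judging only from its name, updateRGroupDecompositionParametersFromJSON() plausibly overwrites just the parameters present in the JSON string and leaves the rest untouched. The JavaScript sketch below models that behaviour under this assumption.

- `updateParamsFromJson()` and the values in `exampleDefaults` are hypothetical; they are not RDKit's actual defaults.
- The parameter names are those used in the JS tests of this commit, plus the new includeTargetMolInResults flag.

```javascript
// Hypothetical starting values, for illustration only (not RDKit's defaults).
const exampleDefaults = Object.freeze({
  scoreMethod: 'Match',
  allowNonTerminalRGroups: false,
  includeTargetMolInResults: false,
});

// Return a copy of `params` in which only the keys present in `json` are replaced.
// Unknown keys and type mismatches are rejected rather than silently ignored.
function updateParamsFromJson(params, json) {
  const updated = { ...params };
  const overrides = JSON.parse(json);
  for (const [key, value] of Object.entries(overrides)) {
    if (!(key in params)) {
      throw new Error(`unknown parameter: ${key}`);
    }
    if (typeof value !== typeof params[key]) {
      throw new Error(`wrong type for parameter: ${key}`);
    }
    updated[key] = value;
  }
  return updated;
}

const params = updateParamsFromJson(
  exampleDefaults,
  JSON.stringify({ allowNonTerminalRGroups: true, includeTargetMolInResults: true }),
);
console.log(params.scoreMethod, params.includeTargetMolInResults); // Match true
```

Rejecting unknown keys is a choice of this sketch, made so that typos surface immediately. Whether the C++ function does the same is not stated in this commit.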
Code/GraphMol/RGroupDecomposition/RGroupMatch.h
- implemented the RGroupMatch::setTargetMoleculeForHighlights() and RGroupMatch::getTargetMoleculeForHighlights() methods to set and get, respectively, the target molecule on which the R group decomposition can be color-coded with highlights. This molecule includes the explicit H atoms corresponding to extracted R groups, if any.
Code/GraphMol/RGroupDecomposition/Wrap/rdRGroupComposition.cpp
- use a std::unique_ptr to store the pointer to the C++ RGroupDecomposition instance
- fixed typos in docstrings
Code/GraphMol/RGroupDecomposition/Wrap/test_rgroups.py
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/catch_rgd.cpp
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
- formatting changes
Code/GraphMol/RGroupDecomposition/testRGroupInternals.cpp
- do not use deprecated constant
Code/MinimalLib/CMakeLists.txt
- added the RDK_BUILD_MINIMAL_LIB_RGROUPDECOMP CMake flag to optionally expose R group decomposition functionality in MinimalLib
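The R group decomposition bindings are only present when RDK_BUILD_MINIMAL_LIB_RGROUPDECOMP is enabled, so JS code should check for them at runtime. The test runner in this commit does exactly that with `if (RDKitModule.RGroupDecomposition)`.

Below is a minimal sketch of the same guard. It uses a hypothetical helper, `runIfRgdAvailable()`, and stand-in module objects instead of a real RDKit build.

```javascript
// Run `fn` only if the module exposes the optional RGroupDecomposition binding;
// returns whether `fn` was run.
function runIfRgdAvailable(module, fn) {
  if (!module || !module.RGroupDecomposition) {
    return false;
  }
  fn(module);
  return true;
}

// Stand-ins for builds with and without the optional feature.
const withRgd = { RGroupDecomposition: function RGroupDecomposition() {} };
const withoutRgd = {};
const ran = [];
runIfRgdAvailable(withRgd, () => ran.push('with'));
runIfRgdAvailable(withoutRgd, () => ran.push('without'));
console.log(ran); // [ 'with' ]
```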
Code/MinimalLib/common.h
- added makeDummiesQueries flag to mol_from_input() (defaults to false)
- implemented parse_highlight_multi_colors() function to parse multi-color atom and bond highlights
- enable multi-color atom and bond highlighting
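The JSON layout that parse_highlight_multi_colors() deals with can be seen in test_multi_highlights(). It maps atom or bond indices (as string keys) to lists of `[r, g, b]` colors with components between 0 and 1.

The following is a hedged JavaScript analogue, not a port of the C++ function.

- `parseHighlightMultiColors()` is hypothetical, and its validation rules are assumptions.
- In particular, accepting only 3-component colors reflects the example in the test, not a documented restriction.

```javascript
// Parse one multi-color highlight entry (e.g. "highlightAtomMultipleColors")
// from a drawing-details JSON string into a Map: index -> array of [r, g, b].
function parseHighlightMultiColors(detailsJson, key) {
  const details = JSON.parse(detailsJson);
  const entry = details[key];
  const result = new Map();
  if (entry === undefined) {
    return result; // nothing to highlight
  }
  if (typeof entry !== 'object' || entry === null || Array.isArray(entry)) {
    throw new Error(`${key} must be an object`);
  }
  for (const [idxStr, colors] of Object.entries(entry)) {
    if (!/^\d+$/.test(idxStr)) {
      throw new Error(`${key}: invalid index ${idxStr}`);
    }
    if (!Array.isArray(colors) || colors.length === 0) {
      throw new Error(`${key}: index ${idxStr} needs at least one color`);
    }
    for (const color of colors) {
      const ok = Array.isArray(color) && color.length === 3 &&
        color.every((c) => typeof c === 'number' && c >= 0 && c <= 1);
      if (!ok) {
        throw new Error(`${key}: index ${idxStr} has a malformed color`);
      }
    }
    result.set(Number(idxStr), colors);
  }
  return result;
}

const details = JSON.stringify({
  highlightAtomMultipleColors: {
    15: [[0.941, 0.894, 0.259]],
    17: [[0, 0.62, 0.451], [0.902, 0.624, 0]], // two colors on one atom
  },
});
const atomColors = parseHighlightMultiColors(details, 'highlightAtomMultipleColors');
console.log(atomColors.get(17).length); // 2
```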
Code/MinimalLib/demo/rgd_demo.html
- added HTML page showcasing the multi-color highlights similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
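For the blog-post style picture, the per-label atom and bond indices (the _rgroupTargetAtoms/_rgroupTargetBonds vectors) must be turned into a drawing-details JSON like the one in test_multi_highlights().

Below is a hedged sketch of that step.

- `buildRgdHighlightDetails()`, its input shape and the palette are hypothetical.
- The key names, the 0.4 atom radius and the line-width multiplier of 2 are copied from that test.

```javascript
// Append `color` to the color list stored under `idx`, creating it if needed.
function pushColor(colorMap, idx, color) {
  if (!colorMap[idx]) {
    colorMap[idx] = [];
  }
  colorMap[idx].push(color);
}

// Build a drawing-details object shaped like the one in test_multi_highlights():
// every highlighted atom/bond index maps to a list of [r, g, b] colors, so an
// index shared by several labels ends up with several colors.
function buildRgdHighlightDetails(labelToIndices, palette, width = 250, height = 200) {
  const details = {
    width,
    height,
    highlightAtomMultipleColors: {},
    highlightBondMultipleColors: {},
    highlightAtomRadii: {},
    highlightLineWidthMultipliers: {},
  };
  for (const [label, { atoms, bonds }] of Object.entries(labelToIndices)) {
    const color = palette[label];
    if (!color) {
      throw new Error(`no color defined for label ${label}`);
    }
    for (const idx of atoms) {
      pushColor(details.highlightAtomMultipleColors, idx, color);
      details.highlightAtomRadii[idx] = 0.4;
    }
    for (const idx of bonds) {
      pushColor(details.highlightBondMultipleColors, idx, color);
      details.highlightLineWidthMultipliers[idx] = 2;
    }
  }
  return details;
}

// Hypothetical per-label target indices and palette.
const details = buildRgdHighlightDetails(
  { R1: { atoms: [15], bonds: [14] }, R2: { atoms: [17], bonds: [16] } },
  { R1: [0.941, 0.894, 0.259], R2: [0, 0.62, 0.451] },
);
// Same layout as the details string passed to mol.get_svg_with_highlights()
// in test_multi_highlights().
const detailsJson = JSON.stringify(details);
```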
Code/MinimalLib/jswrapper.cpp
- removed checks for non-nullness of d_mol as d_mol cannot be directly accessed anymore
- replaced all instances of d_mol with get()
- implemented support for multi-color atom and bond highlights
- implemented optional support for R group decomposition
- added a JSMol::copy() convenience method, with the same functionality as get_mol_copy(), to duplicate a molecule
Code/MinimalLib/minilib.cpp, Code/MinimalLib/minilib.h
- replaced all occurrences of d_mol with get(), as d_mol is now private
- removed all occurrences of assert(d_mol) as non-nullness is checked at construction time and whenever get() is called
- JSMol is now a base class with two subclasses, JSMolUnique and JSMolShared. JSMolUnique can be constructed from a RWMol* (like the old JSMol), while JSMolShared can be constructed from a ROMOL_SPTR. This avoids unnecessary copies when wrapping a ROMOL_SPTR (e.g., from a substructure library, a JSMolList or an R group decomposition) to pass it to JS. It also ensures that modifications made in the JS layer to a molecule stored in a MolList (e.g., adding a property) are persisted, because they are now carried out on the actual molecule rather than on a volatile copy of it.
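The persistence problem that JSMolShared addresses can be reproduced with plain JavaScript objects.

- `ToyMolList`, `getCopy()` and `getShared()` are hypothetical.
- They only mimic the difference between wrapping a copy of a molecule (the old behaviour) and wrapping the stored ROMOL_SPTR itself.

```javascript
// Hypothetical stand-in for a list of molecule records.
class ToyMolList {
  constructor(mols) {
    this.mols = mols;
  }

  // Copy-based access: the caller receives a detached duplicate,
  // so any change it makes is lost from the list's point of view.
  getCopy(i) {
    const mol = this.mols[i];
    return { ...mol, props: { ...mol.props } };
  }

  // Shared access: the caller receives the stored object itself.
  getShared(i) {
    return this.mols[i];
  }
}

const list = new ToyMolList([{ smiles: 'c1ccccc1', props: {} }]);
list.getCopy(0).props.name = 'benzene'; // only the volatile copy is modified
console.log(list.mols[0].props.name); // undefined
list.getShared(0).props.name = 'benzene'; // modifies the molecule in the list
console.log(list.mols[0].props.name); // benzene
```

In the C++ layer, holding a ROMOL_SPTR presumably also keeps the molecule alive for as long as either the list or the JS wrapper refers to it. That is the usual trade-off of shared ownership.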
Code/MinimalLib/tests/tests.js
- added a test for persistence of modifications made to JSMolShared
- added tests for RGD
- added test for JSMol::copy()
Code/RDGeneral/RDValue.h
- removed trailing comma from vector properties such that they can be deserialized as syntactically correct JSON
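JSON does not allow a comma after the last array element, so a vector serialized with a trailing comma cannot be parsed by a strict JSON parser. Below is a small JavaScript illustration. `vectToString()` is a hypothetical stand-in for the C++ serialization in RDValue.h; its exact output format, beyond the missing trailing comma, is an assumption.

```javascript
// JSON.parse rejects trailing commas, so "[1,2,3,]" is not valid JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Serialize a numeric vector with separators only between elements.
function vectToString(values) {
  return `[${values.join(',')}]`;
}

console.log(isValidJson('[1,2,3,]')); // false
console.log(isValidJson(vectToString([1, 2, 3]))); // true
```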
Code/RDGeneral/types.cpp, Code/RDGeneral/types.h
- added _rgroupTargetAtoms and _rgroupTargetBonds common_properties

* Code/GraphMol/Depictor/RDDepictor.h
- fixed typo in docstring
Code/GraphMol/RGroupDecomposition/RGroupCore.cpp
- added a missing const; formatting changes
Code/GraphMol/RGroupDecomposition/RGroupData.cpp, Code/GraphMol/RGroupDecomposition/RGroupData.h
- moved into a private method, RGroupData::mergeIntoCombinedMol(), the code that merges disconnected R groups sharing the same attachment point into a single combined molecule. The method also includes logic to merge atom and bond highlights, if present.
- modernized a for loop
- isMolHydrogen is now a static function since it does not actually require any instance data
- implemented three static functions to return the R group, Core and Mol labels, respectively
Code/GraphMol/RGroupDecomposition/RGroupDecomp.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecomp.h
- implemented two private methods, RGroupDecomposition::labelAtomBondIndices() and RGroupDecomposition::setTargetAtomBondIndices(). The first method tags all atoms and bonds in the target molecule so that they can be tracked after core removal by RDKit::replaceCore(). The second method sets the common_properties::_rgroupTargetAtoms and common_properties::_rgroupTargetBonds properties on the core and R groups. These are vectors of the atom and bond indices in the target molecule that correspond to core and R group atoms/bonds, respectively. They can be used to color-code the target molecule according to its R group decomposition, similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/GraphMol/RGroupDecomposition/RGroupDecompData.cpp
- formatting changes and for loop modernization
Code/GraphMol/RGroupDecomposition/RGroupDecompParams.cpp, Code/GraphMol/RGroupDecomposition/RGroupDecompParams.h
- implemented updateRGroupDecompositionParametersFromJSON()
- added includeTargetMolInResults boolean parameter
Code/GraphMol/RGroupDecomposition/RGroupMatch.h
- implemented the RGroupMatch::setTargetMoleculeForHighlights() and RGroupMatch::getTargetMoleculeForHighlights() methods to set and get, respectively, the target molecule on which the R group decomposition can be color-coded with highlights. This molecule includes the explicit H atoms corresponding to extracted R groups, if any.
Code/GraphMol/RGroupDecomposition/Wrap/rdRGroupComposition.cpp
- use a std::unique_ptr to store the pointer to the C++ RGroupDecomposition instance
- fixed typos in docstrings
Code/GraphMol/RGroupDecomposition/Wrap/test_rgroups.py
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/catch_rgd.cpp
- added test for the new includeTargetMolInResults parameter
Code/GraphMol/RGroupDecomposition/testRGroupDecomp.cpp
- formatting changes
Code/GraphMol/RGroupDecomposition/testRGroupInternals.cpp
- do not use deprecated constant
Code/MinimalLib/CMakeLists.txt
- added the RDK_BUILD_MINIMAL_LIB_RGROUPDECOMP CMake flag to optionally expose R group decomposition functionality in MinimalLib
Code/MinimalLib/common.h
- added makeDummiesQueries flag to mol_from_input() (defaults to false)
- implemented parse_highlight_multi_colors() function to parse multi-color atom and bond highlights
- enable multi-color atom and bond highlighting
Code/MinimalLib/demo/rgd_demo.html
- added HTML page showcasing the multi-color highlights similarly to https://greglandrum.github.io/rdkit-blog/posts/2021-08-07-rgd-and-highlighting.html
Code/MinimalLib/jswrapper.cpp
- removed checks for non-nullness of d_mol as d_mol cannot be directly accessed anymore
- replaced all instances of d_mol with get()
- implemented support for multi-color atom and bond highlights
- implemented optional support for R group decomposition
- the former JSMol class is now split into two subclasses, JSMol and JSMolShared, both of which inherit from the JSMolBase class. JSMol can still be constructed from a RWMol*, while JSMolShared can be constructed from a ROMOL_SPTR. This avoids unnecessary copies when wrapping a ROMOL_SPTR (e.g., from a substructure library, a JSMolList or an R group decomposition) to pass it to JS. It also ensures that modifications made in the JS layer to a molecule stored in a MolList (e.g., adding a property) are persisted, because they are now carried out on the actual molecule rather than on a volatile copy of it.
- added a JSMolBase::copy() convenience method, with the same functionality as get_mol_copy(), to duplicate a molecule
Code/MinimalLib/minilib.cpp, Code/MinimalLib/minilib.h
- replaced all occurrences of d_mol with get(), as d_mol is now private
- removed all occurrences of assert(d_mol) as non-nullness is checked at construction time and whenever get() is called
Code/MinimalLib/tests/tests.js
- added a test for persistence of modifications made to JSMolShared
- added tests for RGD
- added test for JSMolBase::copy()
Code/RDGeneral/RDValue.h
- removed trailing comma from vector properties such that they can be deserialized as syntactically correct JSON
Code/RDGeneral/types.cpp, Code/RDGeneral/types.h
- added _rgroupTargetAtoms and _rgroupTargetBonds common_properties

* added assignChiralTypesFromMolParity flag

* added test for makeDummiesQueries

* added CFFI tests

* reordered tests

* re-added a piece of code that had accidentally been lost while resolving merge conflicts

* Removed CHECK_INVARIANT following code review

---------

Co-authored-by: ptosco <paolo.tosco@novartis.com>
Paolo Tosco
2024-09-16 16:14:13 +02:00
committed by GitHub
parent 5484738f3e
commit 786393beb1
18 changed files with 1219 additions and 81 deletions


@@ -3189,6 +3189,193 @@ function test_multi_highlights() {
mol.delete();
}
const getFoundRgdRowAsMap = (row) => Object.fromEntries(Object.entries(row).map(([rlabel, mol]) => {
try {
assert(mol);
assert(mol instanceof RDKitModule.Mol);
const smi = mol.get_smiles();
return [rlabel, smi];
} finally {
if (mol) {
mol.delete();
}
}
}));
const getExpectedRgdRowAsMap = (row) => Object.fromEntries(row.split(' ').map((rgroup) => {
const match = rgroup.match(/^([^:]+):(.+)$/);
assert(match);
return match.slice(1);
}));
const getExpectedRgdAsCols = (rowArray) => {
const res = {};
rowArray.forEach((row, i) => {
const rgroupToSmiMap = getExpectedRgdRowAsMap(row);
Object.entries(rgroupToSmiMap).forEach(([rlabel, smi]) => {
if (!i) {
res[rlabel] = [];
}
const arr = res[rlabel];
assert(Array.isArray(arr));
arr.push(smi);
});
});
return res;
};
function test_singlecore_rgd() {
const ringData3 = ['c1cocc1CCl', 'c1c[nH]cc1CI', 'c1cscc1CF'];
const expectedRingData3Rgd = ['Core:c1cc([*:1])co1 R1:ClC[*:1]',
'Core:c1cc([*:1])c[nH]1 R1:IC[*:1]',
'Core:c1cc([*:1])cs1 R1:FC[*:1]'];
const expectedRingData3RgdAsCols = getExpectedRgdAsCols(expectedRingData3Rgd);
const core = RDKitModule.get_qmol('*1***[*:1]1');
assert(core);
try {
const scoreMethods = ['Match', 'FingerprintVariance'];
scoreMethods.forEach((scoreMethod) => {
const params = {
scoreMethod,
allowNonTerminalRGroups: true,
};
const rgd = RDKitModule.get_rgd(core, JSON.stringify(params));
assert(rgd);
try {
ringData3.forEach((ringData, i) => {
const mol = RDKitModule.get_mol(ringData);
assert(mol);
try {
const res = rgd.add(mol);
assert(res === i);
} finally {
mol.delete();
}
});
assert(rgd.process());
const rows = rgd.get_rgroups_as_rows();
assert(Array.isArray(rows));
assert(rows.length === ringData3.length);
rows.forEach((row, i) => {
const expectedRowMapping = getExpectedRgdRowAsMap(expectedRingData3Rgd[i]);
const foundMapping = getFoundRgdRowAsMap(row);
assert(Object.keys(foundMapping).length === Object.keys(expectedRowMapping).length);
Object.entries(foundMapping).forEach(([rlabel, smi]) => {
assert(expectedRowMapping[rlabel] && expectedRowMapping[rlabel] === smi);
});
});
const cols = rgd.get_rgroups_as_columns();
assert(typeof cols === 'object');
assert(Object.keys(cols).length === Object.keys(expectedRingData3RgdAsCols).length);
Object.keys(cols).forEach((rlabel) => {
const expectedRGroupsAsSmiles = expectedRingData3RgdAsCols[rlabel];
assert(Array.isArray(expectedRGroupsAsSmiles));
const rgroupsAsMolList = cols[rlabel];
assert(rgroupsAsMolList);
assert(rgroupsAsMolList instanceof RDKitModule.MolList);
try {
assert(expectedRGroupsAsSmiles.length === rgroupsAsMolList.size());
let i = 0;
while (!rgroupsAsMolList.at_end()) {
const mol = rgroupsAsMolList.next();
assert(mol);
try {
assert(mol.get_smiles() === expectedRGroupsAsSmiles[i++]);
} finally {
mol.delete();
}
}
} finally {
rgroupsAsMolList.delete();
}
});
} finally {
// rgd wraps a C++ object and must be freed explicitly
rgd.delete();
}
});
} finally {
core.delete();
}
}
function test_multicore_rgd() {
const smiArray = [
'C1CCNC(Cl)CC1', 'C1CC(Cl)NCCC1', 'C1CCNC(I)CC1', 'C1CC(I)NCCC1',
'C1CCSC(Cl)CC1', 'C1CC(Cl)SCCC1', 'C1CCSC(I)CC1', 'C1CC(I)SCCC1',
'C1CCOC(Cl)CC1', 'C1CC(Cl)OCCC1', 'C1CCOC(I)CC1', 'C1CC(I)OCCC1'
];
const expectedRgd = [
'Core:C1CCNC([*:1])CC1 R1:Cl[*:1]', 'Core:C1CCNC([*:1])CC1 R1:Cl[*:1]',
'Core:C1CCNC([*:1])CC1 R1:I[*:1]', 'Core:C1CCNC([*:1])CC1 R1:I[*:1]',
'Core:C1CCSC([*:1])CC1 R1:Cl[*:1]', 'Core:C1CCSC([*:1])CC1 R1:Cl[*:1]',
'Core:C1CCSC([*:1])CC1 R1:I[*:1]', 'Core:C1CCSC([*:1])CC1 R1:I[*:1]',
'Core:C1CCOC([*:1])CC1 R1:Cl[*:1]', 'Core:C1CCOC([*:1])CC1 R1:Cl[*:1]',
'Core:C1CCOC([*:1])CC1 R1:I[*:1]', 'Core:C1CCOC([*:1])CC1 R1:I[*:1]'
];
const expectedRgdAsCols = getExpectedRgdAsCols(expectedRgd);
const cores = molListFromSmiArray(['C1CCNCCC1', 'C1CCOCCC1', 'C1CCSCCC1']);
try {
const rgd = RDKitModule.get_rgd(cores);
assert(rgd);
try {
smiArray.forEach((smi, i) => {
const mol = RDKitModule.get_mol(smi);
assert(mol);
try {
assert(rgd.add(mol) === i);
} finally {
mol.delete();
}
});
assert(rgd.process());
const rows = rgd.get_rgroups_as_rows();
assert(Array.isArray(rows));
assert(rows.length === smiArray.length);
rows.forEach((row, i) => {
const expectedRowMapping = getExpectedRgdRowAsMap(expectedRgd[i]);
const foundMapping = getFoundRgdRowAsMap(row);
assert(Object.keys(foundMapping).length === Object.keys(expectedRowMapping).length);
Object.entries(foundMapping).forEach(([rlabel, smi]) => {
assert(expectedRowMapping[rlabel] && expectedRowMapping[rlabel] === smi);
});
});
const cols = rgd.get_rgroups_as_columns();
assert(typeof cols === 'object');
assert(Object.keys(cols).length === Object.keys(expectedRgdAsCols).length);
Object.keys(cols).forEach((rlabel) => {
const expectedRGroupsAsSmiles = expectedRgdAsCols[rlabel];
assert(Array.isArray(expectedRGroupsAsSmiles));
const rgroupsAsMolList = cols[rlabel];
assert(rgroupsAsMolList);
assert(rgroupsAsMolList instanceof RDKitModule.MolList);
try {
assert(expectedRGroupsAsSmiles.length === rgroupsAsMolList.size());
let i = 0;
while (!rgroupsAsMolList.at_end()) {
const mol = rgroupsAsMolList.next();
assert(mol);
try {
assert(mol.get_smiles() === expectedRGroupsAsSmiles[i++]);
} finally {
mol.delete();
}
}
} finally {
rgroupsAsMolList.delete();
}
});
} finally {
// rgd wraps a C++ object and must be freed explicitly
rgd.delete();
}
} finally {
cores.delete();
}
}
function test_multi_highlights() {
const mol = RDKitModule.get_mol('[H]c1cc2c(-c3ccnc(Nc4ccc(F)c(F)c4)n3)c(-c3cccc(C(F)(F)F)c3)nn2nc1C', JSON.stringify({removeHs: false}));
const details = '{"width":250,"height":200,"highlightAtomMultipleColors":{"15":[[0.941,0.894,0.259]],"17":[[0,0.62,0.451]],"21":[[0.902,0.624,0]],"22":[[0.902,0.624,0]],"23":[[0.902,0.624,0]],"24":[[0.902,0.624,0]],"25":[[0.902,0.624,0]],"26":[[0.902,0.624,0]],"27":[[0.902,0.624,0]],"28":[[0.902,0.624,0]],"29":[[0.902,0.624,0]],"30":[[0.902,0.624,0]],"35":[[0.337,0.706,0.914]]},"highlightBondMultipleColors":{"14":[[0.941,0.894,0.259]],"16":[[0,0.62,0.451]],"20":[[0.902,0.624,0]],"21":[[0.902,0.624,0]],"22":[[0.902,0.624,0]],"23":[[0.902,0.624,0]],"24":[[0.902,0.624,0]],"25":[[0.902,0.624,0]],"26":[[0.902,0.624,0]],"27":[[0.902,0.624,0]],"28":[[0.902,0.624,0]],"29":[[0.902,0.624,0]],"34":[[0.337,0.706,0.914]],"38":[[0.902,0.624,0]]},"highlightAtomRadii":{"15":0.4,"17":0.4,"21":0.4,"22":0.4,"23":0.4,"24":0.4,"25":0.4,"26":0.4,"27":0.4,"28":0.4,"29":0.4,"30":0.4,"35":0.4},"highlightLineWidthMultipliers":{"14":2,"16":2,"20":2,"21":2,"22":2,"23":2,"24":2,"25":2,"26":2,"27":2,"28":2,"29":2,"34":2,"38":2}}';
const svgWithDetails = mol.get_svg_with_highlights(details);
assert(svgWithDetails.includes('ellipse'));
const COLORS = ['#009E73', '#55B4E9', '#E69F00', '#EFE342'];
assert(COLORS.every((color) => svgWithDetails.includes(color)));
const svgWithOutDetails = mol.get_svg_with_highlights('');
assert(!svgWithOutDetails.includes('ellipse'));
assert(!COLORS.some((color) => svgWithOutDetails.includes(color)));
mol.delete();
}
initRDKitModule().then(function(instance) {
var done = {};
const waitAllTestsFinished = () => {
@@ -3271,6 +3458,10 @@ initRDKitModule().then(function(instance) {
test_make_dummies_queries();
test_get_mol_copy();
test_multi_highlights();
if (RDKitModule.RGroupDecomposition) {
test_singlecore_rgd();
test_multicore_rgd();
}
waitAllTestsFinished().then(() =>
console.log("Tests finished successfully")
);