mirror of
https://github.com/rdkit/rdkit.git
synced 2026-06-04 21:54:27 +08:00
* First import of GaussianShape. * Tidying. * Custom features. * Optimise. * Optimise. * Return 3 scores rather than 2 including combo score. * Rename useFeatures to useColors. * Python wrappers. * Python tests. * Take out big test. * Add new start mode, as PubChem does it. * Doh! * Fix MolTransforms eigenvalue return. * Two cycle optimisation, mostly working. * Take out bestSoFar score from SCA. * Take out DTYPE. * Tidy out redundant variables. * Optimisation in 2 parts. * More fiddling in pursuit of speed. * Update Python wrapper. * Tweak. * Atom subsets and different radii. * Fix test. * Revert pubchem_shape's test.cpp. * Serialize ShapeInput. * Trigger build * Remove pointers to std::arrays in ShapeInput. * ShapeInput virtual d'tor. * Precondition - ShapeInput needs a molecule with at least 1 conformer. * Rename ShapeInput::d_centroid to ShapeInput::d_canonTrans. * Fix normalization bugs. * Select start mode using moments of inertia rather than eigenvalues of canonical transformation. * Include color features in moments of inertia. * Smidge faster. * Tversky similarity. * Tidy tests. * Tests working on Linux. * Revert force of right handed axes in MolTransforms::computePrincipalAxesAndMomentsFromGyrationMatrix replacing with a comment in the code. * Response to review. * Sneaky allCarbon bug. * add multithreaded test * Response to review. * Doh! Don't recalculate normalization after every transformation. * Re-instate d_normalizationOK. * Re-name functions for fetching canonical transformations. * Separate alpha from coords. * MultiConf works with single conf extraction. * Extract all conformations. Max and best similarities. * Renames d_currConformer to d_activeShape. * Update shapeToMol. * Update shapeToMol. * Changes from synthon shape searching. * Fix normalization of multiple confs. * Update Python wrappers. * Fix shape merge. * Improve bestSimilarity. * Fix python wrapper. * Pull in changes from SynthonShapeSearch: make pruneShapes public. function to negate Alpha values. * clang-tidy suggestions. * clang-tidy suggestions. * Bug in quaternion gradients - we now have only 3 coordinates. * Tidy tests. * Mac result slightly different. * Multi conformer molecule alignment. * Optionally return raw overlap volumes in score functions. * Python wrappers for raw overlap volumes. * Update Python wrapper ShapeInputOptions. * Tidy for PR. * Extra include file. * Extra library * Tidy forward declarations. * Don't prune if threshold < 0.0. * Windows exporty thing. * Check SMILES on merge of ShapeInputs. * PRECONDITION of SMILES on merge of ShapeInputs. * Response to review - rename some functions. * change how overlapVols is passed add a test for it * API suggestions * Response to review. * Remove debugging writes. * Fix Python wrappers. --------- Co-authored-by: David Cosgrove <david@cozchemix.co.uk> Co-authored-by: greg landrum <greg.landrum@gmail.com>
211 lines
9.7 KiB
C++
211 lines
9.7 KiB
C++
//
|
|
// Copyright (C) 2026 David Cosgrove and other RDKit contributors
|
|
//
|
|
// @@ All Rights Reserved @@
|
|
// This file is part of the RDKit.
|
|
// The contents are covered by the terms of the BSD license
|
|
// which is included in the file license.txt, found at the root
|
|
// of the RDKit source tree.
|
|
//
|
|
// Original author: David Cosgrove (CozChemIx Limited)
|
|
//
|
|
// This is the class that does optimises a moving molecule (fit)
|
|
// to maximise its Gaussian overlap with the reference molecule (ref).
|
|
// The optimiser is a modified BFGS taken in large part, but re-arranged
|
|
// for readability, from the PubChem shape overlay code
|
|
// https://github.com/ncbi/pubchem-align3d/blob/main/shape_neighbor.cpp
|
|
|
|
#ifndef RDKIT_SINGLECONFORMERALIGNMENT_GUARD
|
|
#define RDKIT_SINGLECONFORMERALIGNMENT_GUARD
|
|
|
|
#include <array>
|
|
|
|
#include <RDGeneral/BoostStartInclude.h>
|
|
#include <boost/dynamic_bitset.hpp>
|
|
#include <RDGeneral/BoostEndInclude.h>
|
|
|
|
#include <RDGeneral/export.h>
|
|
#include <GraphMol/MolTransforms/MolTransforms.h>
|
|
#include <GraphMol/GaussianShape/ShapeOverlayOptions.h>
|
|
|
|
namespace RDKit {
|
|
namespace GaussianShape {
|
|
struct RDKIT_GAUSSIANSHAPE_EXPORT SingleConformerAlignment {
|
|
SingleConformerAlignment() = delete;
|
|
/// @brief Do the overlay for a single conformer of fit against a single
|
|
/// conformer of ref. The output in scores is the rotation and translation
|
|
/// that moves fit to optimise its score with ref.
|
|
/// @param ref - the reference molecule as 1D array of 3 * N entries. Each
|
|
/// block of 3 is the coords
|
|
/// @param refAlphas - the alpha values for the reference
|
|
/// @param refTypes - the feature types for molecule ref
|
|
/// @param refCarbonRadii - whether each atom has a carbon radius
|
|
/// @param nRefShape - the number of atoms in ref
|
|
/// @param nRefColor - the number of features in ref
|
|
/// @param refShapeVol - overlap volume of ref with itself
|
|
/// @param refColorVol - color overlap of ref with itself
|
|
/// @param fit - the fit molecule as 1D array of 3 * N entries. Each
|
|
/// block of 3 is the coords.
|
|
/// @param fitAlphas - the alpha values for the fit
|
|
/// @param fitTypes - the feature types for fit molecule
|
|
/// @param fitCarbonRadii - whether each atom has a carbon radius
|
|
/// @param nFitShape - the number of atoms in fit
|
|
/// @param nFitColor - the number of features in fit
|
|
/// @param fitShapeVol - overlap volume of fit with itself
|
|
/// @param fitColorVol - color overlap of fit with itself
|
|
/// @param optimMode - optimisation mode
|
|
/// @param simAlpha - for the Tversky similarity, the alpha value
|
|
/// @param simBeta - for the Tversky similarity, the beta value
|
|
/// @param mixingParam - how to mix the 2 Tversky values
|
|
/// @param useCutoff - whether to use a distance cutoff in the volume
|
|
/// calculation
|
|
/// @param distCutoff - the cutoff to use if we're doing it.
|
|
/// @param maxIts - maximum number of iterations for optimiser
|
|
/// of optimiser
|
|
SingleConformerAlignment(
|
|
const std::vector<double> &ref, const std::vector<double> &refAlphas,
|
|
const int *refTypes, const boost::dynamic_bitset<> *refCarbonRadii,
|
|
int nRefShape, int nRefColor, double refShapeVol, double refColorVol,
|
|
const std::vector<double> &fit, const std::vector<double> &fitAlphas,
|
|
const int *fitTypes, const boost::dynamic_bitset<> *fitCarbonRadii,
|
|
int nFitShape, int nFitColor, double fitShapeVol, double fitColorVol,
|
|
const std::array<double, 7> &initQuatTrans, OptimMode optimMode,
|
|
double simAlpha, double simBeta, double mixingParam, bool useCutoff,
|
|
double distCutoff, double shapeConvergenceCriterion, unsigned int maxIts);
|
|
|
|
SingleConformerAlignment(const SingleConformerAlignment &other) = delete;
|
|
SingleConformerAlignment(SingleConformerAlignment &&other) = delete;
|
|
SingleConformerAlignment &operator=(const SingleConformerAlignment &other) =
|
|
delete;
|
|
SingleConformerAlignment &operator=(SingleConformerAlignment &&other) =
|
|
delete;
|
|
~SingleConformerAlignment() = default;
|
|
|
|
void setQuatTrans(const std::array<double, 7> &quatTrans) {
|
|
d_quatTrans = quatTrans;
|
|
}
|
|
|
|
// Get the final transformation by adding the initial transformation
|
|
// and the optimised final answer.
|
|
void getFinalQuatTrans(RDGeom::Transform3D &xform) const;
|
|
|
|
// Calculate the combined, shape, and color Tversky scores as appropriate,
|
|
// plus the volume of the shape and color overlaps, in that order.
|
|
// Assumes that ref and fit are already in the correct configurations.
|
|
// If includeColor is passed in true, it will compute the color score
|
|
// irrespective of the value in d_optimMode. We still want the color
|
|
// score even if doing SHAPE_ONLY optimisation, for example.
|
|
std::array<double, 5> calcScores(const double *ref, const double *fit,
|
|
bool includeColor = false);
|
|
// This one applies the current quatTrans to the coords and then calculates
|
|
// the score.
|
|
std::array<double, 5> calcScores(bool includeColor = false);
|
|
// This one computes the scores from the given overlap volumes. Color score
|
|
// only calculated if the color volumes are non-zero.
|
|
std::array<double, 5> calcScores(const double shapeOvVol,
|
|
const double colorOvVol,
|
|
bool includeColor = true) const;
|
|
|
|
// Apply the quatTrans to the ref and fit shapes and put the results
|
|
// into their tmp equivalents. Ref is translated by the -ve of the
|
|
// translation, fit is rotated by the rotation bit.
|
|
void applyQuatTrans(const std::array<double, 7> &quatTrans);
|
|
|
|
// Calculate the overlap volume between A and B after the given "quaternion"
|
|
// has been applied. The "quaternion" is 7 elements, the first 4 the
|
|
// quaternion the last 3 the translation that currently form the
|
|
// transformation that overlays B onto A.
|
|
void calcVolumeAndGradients(const std::array<double, 7> &quatTrans,
|
|
double &shapeOvlpVol, double &colorOvlpVol,
|
|
std::array<double, 7> &gradients);
|
|
|
|
/// @brief Do the overlay, feeding the results into scores.
|
|
/// @return scores - the output scores and transformation to reproduce the
|
|
/// overlay - an array of size 20. Only the first 16 are used here. They are:
|
|
/// 0 - the combo score
|
|
/// 1 - the shape Tversky score
|
|
/// 2 - the color Tversky score
|
|
/// 3 - the shape overlap volume
|
|
/// 4 - the color overlap volume
|
|
/// 5 - the shape volume of fit
|
|
/// 6 - the shape volume of ref
|
|
/// 7 - the color volume of fit
|
|
/// 8 - the color volume of ref
|
|
/// 9-12 - the quaternion to rotate fit onto ref. Applied first.
|
|
/// 13-15 - the translation to move fit onto ref. Applied second.
|
|
/// 16-19 - not used at present, returned as zeros.
|
|
/// Returns false if it didn't finish with the allowed maximum number of
|
|
/// iterations.
|
|
bool doOverlay(std::array<double, 20> &scores, unsigned int cycle);
|
|
|
|
// Find the quaternion and translation that maximises the volume
|
|
// overlap appropriate to d_optimMode. Returns false if it didn't finish with
|
|
// the allowed maximum number of iterations.
|
|
bool optimise(unsigned int maxIters);
|
|
|
|
std::vector<double> d_ref;
|
|
std::vector<double> d_refAlphas;
|
|
std::vector<double> d_refTemp;
|
|
const int *d_refTypes;
|
|
const boost::dynamic_bitset<> *d_refCarbonRadii;
|
|
const int d_nRefShape;
|
|
const int d_nRefColor;
|
|
const double d_refShapeVol;
|
|
const double d_refColorVol;
|
|
std::vector<double> d_fit;
|
|
std::vector<double> d_fitAlphas;
|
|
std::vector<double> d_fitTemp;
|
|
const int *d_fitTypes;
|
|
const boost::dynamic_bitset<> *d_fitCarbonRadii;
|
|
const int d_nFitShape;
|
|
const int d_nFitColor;
|
|
double d_fitShapeVol;
|
|
double d_fitColorVol;
|
|
std::array<double, 7> d_initQuatTrans;
|
|
const OptimMode d_optimMode;
|
|
const double d_simAlpha;
|
|
const double d_simBeta;
|
|
const double d_mixingParam;
|
|
const bool d_useCutoff;
|
|
const double d_distCutoff2;
|
|
const double d_shapeConvergenceCriterion;
|
|
const unsigned int d_maxIts;
|
|
// The quaternion/translation as the optimisation proceeds
|
|
std::array<double, 7> d_quatTrans{1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
|
|
// The step sizes of the quaternion and translation during the
|
|
// optimisation. Taken from the PubChem code.
|
|
double d_qStepSize{-0.001};
|
|
double d_tStepSize{-0.01};
|
|
// Scratch space for the gradients dr/dQ of the fit molecule.
|
|
std::vector<double> d_gradConverters;
|
|
};
|
|
|
|
// Compute the volume overlap and optionally "quaternion" gradients for the
|
|
// overlap volume of ref and fit, wrt fit. fit is the original coords of
|
|
// the fit molecule, fitTemp is those subject to any transformation applied
|
|
// by the quaternion we're using to optimise the overlap volume. If
|
|
// gradients is null, they won't be calculated. They are assumed to be
|
|
// initialised correctly.
|
|
// This is for the atoms/shape features. Negative alphas are skipped.
|
|
RDKIT_GAUSSIANSHAPE_EXPORT double calcVolAndGrads(
|
|
const double *ref, const double *refAlphas, int numRefPts,
|
|
const boost::dynamic_bitset<> *refCarbonRadii, const double *fit,
|
|
const double *fitAlphas, int numFitPts,
|
|
const boost::dynamic_bitset<> *fitCarbonRadii,
|
|
std::vector<double> &gradConverters, const bool useCutoff,
|
|
const double distCutoff2, const double *quat = nullptr,
|
|
double *gradients = nullptr);
|
|
// This one is for the features, and only calculates values if the types
|
|
// of 2 features match. Negative alphas are skipped.
|
|
RDKIT_GAUSSIANSHAPE_EXPORT double calcVolAndGrads(
|
|
const double *ref, const double *refAlphas, int numRefPts,
|
|
const int *refTypes, const double *fit, const double *fitAlphas,
|
|
int numFitPts, const int *fitTypes, int numFitShape,
|
|
std::vector<double> &gradConverters, const bool useCutoff,
|
|
const double distCutoff2, const double *quat, double *gradients);
|
|
|
|
} // namespace GaussianShape
|
|
} // namespace RDKit
|
|
|
|
#endif // RDKIT_SINGLECONFORMERALIGNMENT_GUARD
|