Files
rdkit/Code/GraphMol/GaussianShape/SingleConformerAlignment.h
David Cosgrove 9f551aedbe Multi conf gaussian shape (#9265)
* First import of GaussianShape.

* Tidying.

* Custom features.

* Optimise.

* Optimise.

* Return 3 scores rather than 2 including combo score.

* Rename useFeatures to useColors.

* Python wrappers.

* Python tests.

* Take out big test.

* Add new start mode, as PubChem does it.

* Doh!

* Fix MolTransforms eigenvalue return.

* Two cycle optimisation, mostly working.

* Take out bestSoFar score from SCA.

* Take out DTYPE.

* Tidy out redundant variables.

* Optimisation in 2 parts.

* More fiddling in pursuit of speed.

* Update Python wrapper.

* Tweak.

* Atom subsets and different radii.

* Fix test.

* Revert pubchem_shape's test.cpp.

* Serialize ShapeInput.

* Trigger build

* Remove pointers to std::arrays in ShapeInput.

* ShapeInput virtual d'tor.

* Precondition - ShapeInput needs a molecule with at least 1 conformer.

* Rename ShapeInput::d_centroid to ShapeInput::d_canonTrans.

* Fix normalization bugs.

* Select start mode using moments of inertia rather than eigenvalues of canonical transformation.

* Include color features in moments of inertia.

* Smidge faster.

* Tversky similarity.

* Tidy tests.

* Tests working on Linux.

* Revert force of right handed axes in MolTransforms::computePrincipalAxesAndMomentsFromGyrationMatrix replacing with a comment in the code.

* Response to review.

* Sneaky allCarbon bug.

* add multithreaded test

* Response to review.

* Doh! Don't recalculate normalization after every transformation.

* Re-instate d_normalizationOK.

* Re-name functions for fetching canonical transformations.

* Separate alpha from coords.

* MultiConf works with single conf extraction.

* Extract all conformations.
Max and best similarities.

* Renames d_currConformer to d_activeShape.

* Update shapeToMol.

* Update shapeToMol.

* Changes from synthon shape searching.

* Fix normalization of multiple confs.

* Update Python wrappers.

* Fix shape merge.

* Improve bestSimilarity.

* Fix python wrapper.

* Pull in changes from SynthonShapeSearch:
make pruneShapes public.
function to negate Alpha values.

* clang-tidy suggestions.

* clang-tidy suggestions.

* Bug in quaternion gradients - we now have only 3 coordinates.

* Tidy tests.

* Mac result slightly different.

* Multi conformer molecule alignment.

* Optionally return raw overlap volumes in score functions.

* Python wrappers for raw overlap volumes.

* Update Python wrapper ShapeInputOptions.

* Tidy for PR.

* Extra include file.

* Extra library

* Tidy forward declarations.

* Don't prune if threshold < 0.0.

* Windows exporty thing.

* Check SMILES on merge of ShapeInputs.

* PRECONDITION of SMILES on merge of ShapeInputs.

* Response to review - rename some functions.

* change how overlapVols is passed
add a test for it

* API suggestions

* Response to review.

* Remove debugging writes.

* Fix Python wrappers.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
Co-authored-by: greg landrum <greg.landrum@gmail.com>
2026-06-03 06:09:09 +02:00

211 lines
9.7 KiB
C++

//
// Copyright (C) 2026 David Cosgrove and other RDKit contributors
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
// Original author: David Cosgrove (CozChemIx Limited)
//
// This is the class that does optimises a moving molecule (fit)
// to maximise its Gaussian overlap with the reference molecule (ref).
// The optimiser is a modified BFGS taken in large part, but re-arranged
// for readability, from the PubChem shape overlay code
// https://github.com/ncbi/pubchem-align3d/blob/main/shape_neighbor.cpp
#ifndef RDKIT_SINGLECONFORMERALIGNMENT_GUARD
#define RDKIT_SINGLECONFORMERALIGNMENT_GUARD
#include <array>
#include <RDGeneral/BoostStartInclude.h>
#include <boost/dynamic_bitset.hpp>
#include <RDGeneral/BoostEndInclude.h>
#include <RDGeneral/export.h>
#include <GraphMol/MolTransforms/MolTransforms.h>
#include <GraphMol/GaussianShape/ShapeOverlayOptions.h>
namespace RDKit {
namespace GaussianShape {
struct RDKIT_GAUSSIANSHAPE_EXPORT SingleConformerAlignment {
SingleConformerAlignment() = delete;
/// @brief Do the overlay for a single conformer of fit against a single
/// conformer of ref. The output in scores is the rotation and translation
/// that moves fit to optimise its score with ref.
/// @param ref - the reference molecule as 1D array of 3 * N entries. Each
/// block of 3 is the coords
/// @param refAlphas - the alpha values for the reference
/// @param refTypes - the feature types for molecule ref
/// @param refCarbonRadii - whether each atom has a carbon radius
/// @param nRefShape - the number of atoms in ref
/// @param nRefColor - the number of features in ref
/// @param refShapeVol - overlap volume of ref with itself
/// @param refColorVol - color overlap of ref with itself
/// @param fit - the fit molecule as 1D array of 3 * N entries. Each
/// block of 3 is the coords.
/// @param fitAlphas - the alpha values for the fit
/// @param fitTypes - the feature types for fit molecule
/// @param fitCarbonRadii - whether each atom has a carbon radius
/// @param nFitShape - the number of atoms in fit
/// @param nFitColor - the number of features in fit
/// @param fitShapeVol - overlap volume of fit with itself
/// @param fitColorVol - color overlap of fit with itself
/// @param optimMode - optimisation mode
/// @param simAlpha - for the Tversky similarity, the alpha value
/// @param simBeta - for the Tversky similarity, the beta value
/// @param mixingParam - how to mix the 2 Tversky values
/// @param useCutoff - whether to use a distance cutoff in the volume
/// calculation
/// @param distCutoff - the cutoff to use if we're doing it.
/// @param maxIts - maximum number of iterations for optimiser
/// of optimiser
SingleConformerAlignment(
const std::vector<double> &ref, const std::vector<double> &refAlphas,
const int *refTypes, const boost::dynamic_bitset<> *refCarbonRadii,
int nRefShape, int nRefColor, double refShapeVol, double refColorVol,
const std::vector<double> &fit, const std::vector<double> &fitAlphas,
const int *fitTypes, const boost::dynamic_bitset<> *fitCarbonRadii,
int nFitShape, int nFitColor, double fitShapeVol, double fitColorVol,
const std::array<double, 7> &initQuatTrans, OptimMode optimMode,
double simAlpha, double simBeta, double mixingParam, bool useCutoff,
double distCutoff, double shapeConvergenceCriterion, unsigned int maxIts);
SingleConformerAlignment(const SingleConformerAlignment &other) = delete;
SingleConformerAlignment(SingleConformerAlignment &&other) = delete;
SingleConformerAlignment &operator=(const SingleConformerAlignment &other) =
delete;
SingleConformerAlignment &operator=(SingleConformerAlignment &&other) =
delete;
~SingleConformerAlignment() = default;
void setQuatTrans(const std::array<double, 7> &quatTrans) {
d_quatTrans = quatTrans;
}
// Get the final transformation by adding the initial transformation
// and the optimised final answer.
void getFinalQuatTrans(RDGeom::Transform3D &xform) const;
// Calculate the combined, shape, and color Tversky scores as appropriate,
// plus the volume of the shape and color overlaps, in that order.
// Assumes that ref and fit are already in the correct configurations.
// If includeColor is passed in true, it will compute the color score
// irrespective of the value in d_optimMode. We still want the color
// score even if doing SHAPE_ONLY optimisation, for example.
std::array<double, 5> calcScores(const double *ref, const double *fit,
bool includeColor = false);
// This one applies the current quatTrans to the coords and then calculates
// the score.
std::array<double, 5> calcScores(bool includeColor = false);
// This one computes the scores from the given overlap volumes. Color score
// only calculated if the color volumes are non-zero.
std::array<double, 5> calcScores(const double shapeOvVol,
const double colorOvVol,
bool includeColor = true) const;
// Apply the quatTrans to the ref and fit shapes and put the results
// into their tmp equivalents. Ref is translated by the -ve of the
// translation, fit is rotated by the rotation bit.
void applyQuatTrans(const std::array<double, 7> &quatTrans);
// Calculate the overlap volume between A and B after the given "quaternion"
// has been applied. The "quaternion" is 7 elements, the first 4 the
// quaternion the last 3 the translation that currently form the
// transformation that overlays B onto A.
void calcVolumeAndGradients(const std::array<double, 7> &quatTrans,
double &shapeOvlpVol, double &colorOvlpVol,
std::array<double, 7> &gradients);
/// @brief Do the overlay, feeding the results into scores.
/// @return scores - the output scores and transformation to reproduce the
/// overlay - an array of size 20. Only the first 16 are used here. They are:
/// 0 - the combo score
/// 1 - the shape Tversky score
/// 2 - the color Tversky score
/// 3 - the shape overlap volume
/// 4 - the color overlap volume
/// 5 - the shape volume of fit
/// 6 - the shape volume of ref
/// 7 - the color volume of fit
/// 8 - the color volume of ref
/// 9-12 - the quaternion to rotate fit onto ref. Applied first.
/// 13-15 - the translation to move fit onto ref. Applied second.
/// 16-19 - not used at present, returned as zeros.
/// Returns false if it didn't finish with the allowed maximum number of
/// iterations.
bool doOverlay(std::array<double, 20> &scores, unsigned int cycle);
// Find the quaternion and translation that maximises the volume
// overlap appropriate to d_optimMode. Returns false if it didn't finish with
// the allowed maximum number of iterations.
bool optimise(unsigned int maxIters);
std::vector<double> d_ref;
std::vector<double> d_refAlphas;
std::vector<double> d_refTemp;
const int *d_refTypes;
const boost::dynamic_bitset<> *d_refCarbonRadii;
const int d_nRefShape;
const int d_nRefColor;
const double d_refShapeVol;
const double d_refColorVol;
std::vector<double> d_fit;
std::vector<double> d_fitAlphas;
std::vector<double> d_fitTemp;
const int *d_fitTypes;
const boost::dynamic_bitset<> *d_fitCarbonRadii;
const int d_nFitShape;
const int d_nFitColor;
double d_fitShapeVol;
double d_fitColorVol;
std::array<double, 7> d_initQuatTrans;
const OptimMode d_optimMode;
const double d_simAlpha;
const double d_simBeta;
const double d_mixingParam;
const bool d_useCutoff;
const double d_distCutoff2;
const double d_shapeConvergenceCriterion;
const unsigned int d_maxIts;
// The quaternion/translation as the optimisation proceeds
std::array<double, 7> d_quatTrans{1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
// The step sizes of the quaternion and translation during the
// optimisation. Taken from the PubChem code.
double d_qStepSize{-0.001};
double d_tStepSize{-0.01};
// Scratch space for the gradients dr/dQ of the fit molecule.
std::vector<double> d_gradConverters;
};
// Compute the volume overlap and optionally "quaternion" gradients for the
// overlap volume of ref and fit, wrt fit. fit is the original coords of
// the fit molecule, fitTemp is those subject to any transformation applied
// by the quaternion we're using to optimise the overlap volume. If
// gradients is null, they won't be calculated. They are assumed to be
// initialised correctly.
// This is for the atoms/shape features. Negative alphas are skipped.
RDKIT_GAUSSIANSHAPE_EXPORT double calcVolAndGrads(
const double *ref, const double *refAlphas, int numRefPts,
const boost::dynamic_bitset<> *refCarbonRadii, const double *fit,
const double *fitAlphas, int numFitPts,
const boost::dynamic_bitset<> *fitCarbonRadii,
std::vector<double> &gradConverters, const bool useCutoff,
const double distCutoff2, const double *quat = nullptr,
double *gradients = nullptr);
// This one is for the features, and only calculates values if the types
// of 2 features match. Negative alphas are skipped.
RDKIT_GAUSSIANSHAPE_EXPORT double calcVolAndGrads(
const double *ref, const double *refAlphas, int numRefPts,
const int *refTypes, const double *fit, const double *fitAlphas,
int numFitPts, const int *fitTypes, int numFitShape,
std::vector<double> &gradConverters, const bool useCutoff,
const double distCutoff2, const double *quat, double *gradients);
} // namespace GaussianShape
} // namespace RDKit
#endif // RDKIT_SINGLECONFORMERALIGNMENT_GUARD