Files
rdkit/Code/GraphMol/Descriptors/MORSE.cpp
Greg Landrum 71932ee4f9 Add a collection of new 3D descriptors (#1467)
* 3D Descriptors Dragons

* stripped down, not yet working

* get this building on a non C++-11 compiler

* move the python test to the python directory

* move the python test to the python directory

* add the python test

* now at least those tests runn

* warning comment

* some basic refactoring and cleanup

* get python wrapper "working" (completely untested)

* fix Name

* fixing AutoCorr & RDF

* AutoCorr test added
* RDF reviewed based on AutoCorr comments

* fix Morse code

* Morse reviewed

* Correct Morse & start Getaway

* correct MORSE test
* start Getaway clean up

* simplification of Whim

* better

* fix Getaway

* fix RCON

* merge repaired

* adding Dragon 2D autocorrelations descriptors

* fix the 3D autocorrelation descriptors based on the modification in
Dragon.
* Adding the 2D autocorrelation descriptors (no need of Eigen
dependency for this one)
* Adding 2D test case
* IState … no idea for the moment
* there is an error in 2D computation (memory error ???)

* fix the IState for molecules with Hs

* need to use getTotalNumHs(true) not getTotalNumHs()
* also need to remove Hs in both dv and d!

* fixing Rcov values

I fix the Rcov values

* fix Getaway

* remove push_back
* remove call to sum
* improve tests

* fix getaway

* adding precision parameter to GETAWAY
* adding rouding (1e-3)

* fix WHIM

* use void in declarations of function
* update MolDescriptors link

* remove print option in WHIM

* fix python wrapper to 3D descriptors.

- all modifications reduce computation time by a factor of 3!

* final fix for Getaway

* all output are fixed except the 2 first values due to clustering
approach.
* cluster cannot be fixed du tu float precision issue between Java &
c++

* best fix of ITH and ISH

* use the same algorithm as in Dragon 6 but there is still a deviation

* remove std::move

* std:move only works on c++ 11

* fixing issue based on Greg Comments

* auto2D still not working on my env

* update 3d test.py

* auto 2D not working after the first loop test

* tighten up the tests

* change name

* update, but still does not pass

* make this run (though it does not work)

* re-enable test3D

* some cleanup

* add GETAWAY test data. Note that the tests fail

* fix the ATS and ATSC autocorrelation 2D

Broto Moreau and Geary autocorrelation are not correct again a specificity of Dragon to compute them. The result are not consistant with Padel because we use the relative weigth not in Padel.

* one minor change to get things to compile

* fix the M & G matrix computation

fix inversion in the computation of the equations for both M & G matrixes

* update autocorr2d tests

* 192 examples

* fix issue in cluster 0.01 0.009 case

this is not correct all the cases

* update GETAWAY expected values to reflect the fact that we cannot reproduce the literature values exactly
fix a leak in GETAWAY

* fix the negative values in gamma

this is strickly the implementation that we find in the book molecular descriptors for chemoinformatics (except the case where an atom is already in the axis in this case it should be added in the symetric list which is not the case in this implementation)

* Update WHIM.cpp

adding the axis atoms to the symetrical list

* update WHIM tests

* add AUTOCORR2D to MolDescriptors and the python wrappers

* start adding tests

* test the python versions of the new descriptors

* update list of descriptors
2017-06-27 18:57:10 +02:00

182 lines
6.9 KiB
C++

//
// Copyright (c) 2016, Guillaume GODIN
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following
// disclaimer in the documentation and/or other materials provided
// with the distribution.
// * Neither the name of Institue of Cancer Research.
// nor the names of its contributors may be used to endorse or promote
// products derived from this software without specific prior written
// permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Created by Guillaume GODIN 2016
#include <GraphMol/RDKitBase.h>
#include "MORSE.h"
#include "MolData3Ddescriptors.h"
#include <math.h>
// data checked using book Todeschini R., Consonni V. - Molecular Descriptors
// for Chemoinformatics 2009 atomic properties page 21/22
namespace RDKit {
namespace Descriptors {
namespace {
MolData3Ddescriptors moldata3D;
std::vector<double> getG(int n) {
std::vector<double> res(n);
for (int i = 0; i < n; i++) {
res[i] = i;
}
return res;
}
std::vector<double> prepareIState(const ROMol &mol, const Conformer &conf) {
int numAtoms = conf.getNumAtoms();
int confId = conf.getId();
std::vector<double> IState = moldata3D.GetIState(mol);
return IState;
}
void getMORSEDesc(double *DM, const ROMol &mol, const Conformer &conf,
std::vector<double> &res) {
int numAtoms = conf.getNumAtoms();
int confId = conf.getId();
std::vector<double> R = getG(32);
std::vector<double> R1(32);
std::vector<double> R2(32);
std::vector<double> R3(32);
std::vector<double> R4(32);
std::vector<double> R5(32);
std::vector<double> R6(32);
std::vector<double> R7(32);
std::vector<double> Mass = moldata3D.GetRelativeMW(mol);
std::vector<double> RelativePol = moldata3D.GetRelativePol(mol);
std::vector<double> IonPol = moldata3D.GetRelativeIonPol(mol);
std::vector<double> RelativeElectroNeg = moldata3D.GetRelativeENeg(mol);
std::vector<double> RelativeVdW = moldata3D.GetRelativeVdW(mol);
std::vector<double> IState = prepareIState(mol, confId);
double p;
for (int i = 0; i < R.size(); i++) {
double res1 = 0.0;
double res2 = 0.0;
double res3 = 0.0;
double res4 = 0.0;
double res5 = 0.0;
double res6 = 0.0;
double res7 = 0.0;
for (int j = 0; j < numAtoms - 1; j++) {
for (int k = j + 1; k < numAtoms; k++) {
if (i == 0) {
p = 1;
} else {
p = sin(R[i] * DM[j * numAtoms + k]) / (R[i] * DM[j * numAtoms + k]);
}
res1 += p;
res2 += Mass[j] * Mass[k] * p;
res3 += RelativeVdW[j] * RelativeVdW[k] * p;
res4 += RelativeElectroNeg[j] * RelativeElectroNeg[k] * p;
res5 += RelativePol[j] * RelativePol[k] * p;
res6 += IonPol[j] * IonPol[k] * p;
res7 += IState[j] * IState[k] * p;
}
}
R1[i] = round(1000 * res1) / 1000;
R2[i] = round(1000 * res2) / 1000;
R3[i] = round(1000 * res3) / 1000;
R4[i] = round(1000 * res4) / 1000;
R5[i] = round(1000 * res5) / 1000;
R6[i] = round(1000 * res6) / 1000;
R7[i] = round(1000 * res7) / 1000;
}
R1.insert(R1.end(), R2.begin(), R2.end());
R1.insert(R1.end(), R3.begin(), R3.end());
R1.insert(R1.end(), R4.begin(), R4.end());
R1.insert(R1.end(), R5.begin(), R5.end());
R1.insert(R1.end(), R6.begin(), R6.end());
R1.insert(R1.end(), R7.begin(), R7.end());
res = R1;
}
void GetMORSE(double *dist3D, const ROMol &mol, const Conformer &conf,
std::vector<double> &res) {
getMORSEDesc(dist3D, mol, conf, res);
}
} // end of anonymous namespace
void MORSE(const ROMol &mol, std::vector<double> &res, int confId) {
PRECONDITION(mol.getNumConformers() >= 1, "molecule has no conformers")
// Mor01u Mor02u Mor03u Mor04u Mor05u Mor06u Mor07u Mor08u Mor09u
// Mor10u Mor11u Mor12u Mor13u Mor14u Mor15u Mor16u Mor17u Mor18u
// Mor19u Mor20u Mor21u Mor22u Mor23u Mor24u Mor25u Mor26u Mor27u
// Mor28u Mor29u Mor30u Mor31u Mor32u
// Mor01m Mor02m Mor03m Mor04m Mor05m Mor06m Mor07m Mor08m Mor09m
// Mor10m Mor11m Mor12m Mor13m Mor14m Mor15m Mor16m Mor17m Mor18m
// Mor19m Mor20m Mor21m Mor22m Mor23m Mor24m Mor25m Mor26m Mor27m
// Mor28m Mor29m Mor30m Mor31m Mor32m
// Mor01v Mor02v Mor03v Mor04v Mor05v Mor06v Mor07v Mor08v Mor09v
// Mor10v Mor11v Mor12v Mor13v Mor14v Mor15v Mor16v Mor17v Mor18v
// Mor19v Mor20v Mor21v Mor22v Mor23v Mor24v Mor25v Mor26v Mor27v
// Mor28v Mor29v Mor30v Mor31v Mor32v
// Mor01e Mor02e Mor03e Mor04e Mor05e Mor06e Mor07e Mor08e Mor09e
// Mor10e Mor11e Mor12e Mor13e Mor14e Mor15e Mor16e Mor17e Mor18e
// Mor19e Mor20e Mor21e Mor22e Mor23e Mor24e Mor25e Mor26e Mor27e
// Mor28e Mor29e Mor30e Mor31e Mor32e
// Mor01p Mor02p Mor03p Mor04p Mor05p Mor06p Mor07p Mor08p Mor09p
// Mor10p Mor11p Mor12p Mor13p Mor14p Mor15p Mor16p Mor17p Mor18p
// Mor19p Mor20p Mor21p Mor22p Mor23p Mor24p Mor25p Mor26p Mor27p
// Mor28p Mor29p Mor30p Mor31p Mor32p
// Mor01i Mor02i Mor03i Mor04i Mor05i Mor06i Mor07i Mor08i Mor09i
// Mor10i Mor11i Mor12i Mor13i Mor14i Mor15i Mor16i Mor17i Mor18i
// Mor19i Mor20i Mor21i Mor22i Mor23i Mor24i Mor25i Mor26i Mor27i
// Mor28i Mor29i Mor30i Mor31i Mor32i
// Mor01s Mor02s Mor03s Mor04s Mor05s Mor06s Mor07s Mor08s Mor09s
// Mor10s Mor11s Mor12s Mor13s Mor14s Mor15s Mor16s Mor17s Mor18s
// Mor19s Mor20s Mor21s Mor22s Mor23s Mor24s Mor25s Mor26s Mor27s
// Mor28s Mor29s Mor30s Mor31s Mor32s
const Conformer &conf = mol.getConformer(confId);
double *dist3D =
MolOps::get3DDistanceMat(mol, confId, false, true); // 3D distance matrix
res.clear();
res.resize(224); // 7 * 32
GetMORSE(dist3D, mol, conf, res);
}
} // end of Descriptors namespace
} // end of RDKit namespace