mirror of
https://github.com/rdkit/rdkit.git
synced 2026-06-04 21:54:27 +08:00
167 lines
5.0 KiB
C++
//
//  Copyright (C) 2018 Susan H. Leung
//
//   @@ All Rights Reserved @@
//  This file is part of the RDKit.
//  The contents are covered by the terms of the BSD license
//  which is included in the file license.txt, found at the root
//  of the RDKit source tree.
//
#include "Normalize.h"
#include <algorithm>  // std::sort
#include <string>
#include <GraphMol/RDKitBase.h>
#include <GraphMol/ChemReactions/Reaction.h>
#include <GraphMol/ChemReactions/ReactionParser.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <GraphMol/SanitException.h>
#include <GraphMol/ChemTransforms/ChemTransforms.h>

using namespace std;
using namespace RDKit;

namespace RDKit {
class RWMol;
class ROMol;

namespace MolStandardize {

// default constructor: builds the transform catalog from
// defaultCleanupParameters.normalizations and allows up to 200 restarts
Normalizer::Normalizer() {
  BOOST_LOG(rdInfoLog) << "Initializing Normalizer\n";
  TransformCatalogParams tparams(defaultCleanupParameters.normalizations);
  this->d_tcat = new TransformCatalog(&tparams);
  this->MAX_RESTARTS = 200;
}

// overloaded constructor: builds the transform catalog from normalizeFile
// and allows up to maxRestarts restarts
Normalizer::Normalizer(const std::string normalizeFile,
                       const unsigned int maxRestarts) {
  BOOST_LOG(rdInfoLog) << "Initializing Normalizer\n";
  TransformCatalogParams tparams(normalizeFile);
  this->d_tcat = new TransformCatalog(&tparams);
  this->MAX_RESTARTS = maxRestarts;
}

// destructor
Normalizer::~Normalizer() { delete d_tcat; }

ROMol *Normalizer::normalize(const ROMol &mol) {
  BOOST_LOG(rdInfoLog) << "Running Normalizer\n";
  PRECONDITION(this->d_tcat, "no transform catalog");
  const TransformCatalogParams *tparams = this->d_tcat->getCatalogParams();

  PRECONDITION(tparams, "no transform catalog parameters");
  const std::vector<std::shared_ptr<ChemicalReaction>> &transforms =
      tparams->getTransformations();

  // Normalize each disconnected fragment separately.
  std::vector<boost::shared_ptr<ROMol>> frags = MolOps::getMolFrags(mol);
  if (frags.empty()) {
    // Nothing to normalize (e.g. a molecule without atoms); calling
    // nfrags.back() below on an empty vector would be undefined behavior.
    return new ROMol(mol);
  }
  std::vector<ROMOL_SPTR> nfrags;
  nfrags.reserve(frags.size());
  for (const auto &frag : frags) {
    ROMOL_SPTR nfrag(this->normalizeFragment(*frag, transforms));
    nfrags.push_back(nfrag);
  }
  // Recombine the normalized fragments into a single molecule, starting
  // from the last fragment.
  ROMol *outmol = new ROMol(*(nfrags.back()));
  nfrags.pop_back();
  for (const auto &nfrag : nfrags) {
    ROMol *tmol = combineMols(*outmol, *nfrag);
    delete outmol;
    outmol = tmol;
  }
  return outmol;
}

ROMol *Normalizer::normalizeFragment(
    const ROMol &mol,
    const std::vector<std::shared_ptr<ChemicalReaction>> &transforms) {
  ROMol *nfrag = new ROMol(mol);
  for (unsigned int i = 0; i < MAX_RESTARTS; ++i) {
    bool loop_break = false;
    // Iterate through the normalization transforms and apply each in order;
    // as soon as one changes the fragment, restart from the first transform.
    for (auto &transform : transforms) {
      std::string tname;
      transform->getProp(common_properties::_Name, tname);
      boost::shared_ptr<ROMol> product =
          this->applyTransform(*nfrag, *transform);
      if (product != nullptr) {
        BOOST_LOG(rdInfoLog) << "Rule applied: " << tname << "\n";
        delete nfrag;
        nfrag = new ROMol(*product);
        loop_break = true;
        break;
      }
    }
    // The inner loop finished without applying any transform, so the
    // fragment is fully normalized.
    if (!loop_break) {
      return nfrag;
    }
  }
  BOOST_LOG(rdInfoLog) << "Gave up normalization after " << MAX_RESTARTS
                       << " restarts.\n";
  return nfrag;
}

boost::shared_ptr<ROMol> Normalizer::applyTransform(
    const ROMol &mol, ChemicalReaction &transform) {
  // Repeatedly apply a normalization transform to the molecule until no
  // changes occur.
  //
  // It is possible for multiple products to be produced when a rule is
  // applied. The rule is applied repeatedly to each of the products, until
  // no further changes occur or after 20 attempts.
  //
  // If there are multiple unique products after the final application, the
  // first product (sorted alphabetically by SMILES) is chosen.
  //
  // Returns nullptr if the first application yields no sanitizable product.

  boost::shared_ptr<ROMol> tmp(new ROMol(mol));
  MOL_SPTR_VECT mols;
  mols.push_back(tmp);

  transform.initReactantMatchers();
  // REVIEW: what's the source of the 20 in the next line?
  for (unsigned int i = 0; i < 20; ++i) {
    std::vector<Normalizer::Product> pdts;
    for (auto &m : mols) {
      std::vector<MOL_SPTR_VECT> products = transform.runReactants({m});

      for (auto &pdt : products) {
        unsigned int failed;
        try {
          MolOps::sanitizeMol(*static_cast<RWMol *>(pdt[0].get()), failed);
          Normalizer::Product np(MolToSmiles(*pdt[0]), pdt[0]);
          pdts.push_back(np);
        } catch (const MolSanitizeException &) {
          BOOST_LOG(rdInfoLog) << "FAILED sanitizeMol.\n";
        }
      }
    }
    if (!pdts.empty()) {
      // Sort by SMILES so the chosen product is deterministic.
      std::sort(pdts.begin(), pdts.end());
      mols.clear();
      for (const auto &pdt : pdts) {
        mols.push_back(pdt.Mol);
      }
    } else {
      if (i > 0) {
        // The transform applied in an earlier pass and no longer matches.
        return mols[0];
      } else {
        // The transform never produced a usable product.
        return nullptr;
      }
    }
  }
  if (!mols.empty()) {
    return mols[0];
  } else {
    return nullptr;
  }
}

}  // namespace MolStandardize
}  // namespace RDKit
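The control flow above is two nested bounded fixed-point loops: `normalizeFragment` restarts from the first transform whenever any transform changes the fragment, and `applyTransform` re-applies a single transform to all of its products for up to 20 passes, keeping the alphabetically first product. The sketch below is a toy model of that flow, not RDKit code: it assumes plain strings in place of `ROMol`, substring rewriting in place of `ChemicalReaction::runReactants`, and skips sanitization. The names `Rule`, `runRule`, `applyRule`, `normalize` and `normalizeFragments` are hypothetical, and `std::optional` requires C++17.

```cpp
// Toy model (C++17) of the Normalizer control flow; strings stand in for
// molecules and substring rewrites stand in for reaction transforms.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ChemicalReaction: rewrite one occurrence of
// `from` into `to`.
struct Rule {
  std::string name;
  std::string from;
  std::string to;
};

// One product per match site, loosely analogous to runReactants() returning
// one product set per substructure match.
std::vector<std::string> runRule(const std::string &s, const Rule &rule) {
  std::vector<std::string> out;
  if (rule.from.empty()) return out;
  for (std::size_t pos = s.find(rule.from); pos != std::string::npos;
       pos = s.find(rule.from, pos + 1)) {
    std::string p = s;
    p.replace(pos, rule.from.size(), rule.to);
    out.push_back(std::move(p));
  }
  return out;
}

// Mirrors applyTransform(): re-apply the rule to every current product for at
// most maxAttempts passes, sorting so the first product is chosen
// deterministically. std::nullopt means the rule never matched.
std::optional<std::string> applyRule(const std::string &s, const Rule &rule,
                                     unsigned int maxAttempts = 20) {
  std::vector<std::string> current{s};
  for (unsigned int i = 0; i < maxAttempts; ++i) {
    std::vector<std::string> next;
    for (const auto &m : current) {
      std::vector<std::string> prods = runRule(m, rule);
      next.insert(next.end(), prods.begin(), prods.end());
    }
    if (next.empty()) {
      if (i == 0) return std::nullopt;  // no match at all
      return current.front();           // converged on an earlier pass
    }
    std::sort(next.begin(), next.end());
    // Dropping duplicates (not done in Normalize.cpp) keeps the toy small;
    // it does not change which product sorts first.
    next.erase(std::unique(next.begin(), next.end()), next.end());
    current = std::move(next);
  }
  return current.front();  // pass limit reached; keep the first product
}

// Mirrors normalizeFragment(): whenever a rule changes the input, restart
// from the first rule; stop when a full sweep applies nothing or after
// maxRestarts sweeps.
std::string normalize(std::string s, const std::vector<Rule> &rules,
                      unsigned int maxRestarts = 200) {
  for (unsigned int i = 0; i < maxRestarts; ++i) {
    bool applied = false;
    for (const auto &rule : rules) {
      if (std::optional<std::string> product = applyRule(s, rule)) {
        s = *product;
        applied = true;
        break;
      }
    }
    if (!applied) return s;  // fixed point reached
  }
  return s;  // gave up after maxRestarts sweeps
}

// Mirrors normalize(): treat '.'-separated pieces as disconnected fragments,
// normalize each independently and join them in input order (Normalize.cpp
// starts the combined molecule from the last fragment instead).
std::string normalizeFragments(const std::string &mol,
                               const std::vector<Rule> &rules) {
  std::string result;
  std::size_t start = 0;
  bool first = true;
  for (;;) {
    const std::size_t dot = mol.find('.', start);
    const std::size_t len =
        (dot == std::string::npos) ? std::string::npos : dot - start;
    if (!first) result += '.';
    first = false;
    result += normalize(mol.substr(start, len), rules);
    if (dot == std::string::npos) break;
    start = dot + 1;
  }
  return result;
}
```

Because every sweep restarts from the first rule, earlier rules take priority over later ones, and the two caps (passes per rule and restarts) guarantee termination even for a rule that keeps matching its own output.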