mirror of
https://github.com/rdkit/rdkit.git
synced 2026-06-05 22:04:27 +08:00
* short test file for MolVS standardize_sm
* short test file for MolVS fragment
* short test file for MolVS metals
* short test file for MolVS normalize
* short test file for MolVS reionize
* short test file for MolVS tautomer
* short test file for MolVS validate
* long test file for MolVS standardize smiles
* long test file for MolVS fragment
* long test file for MolVS metals
* long test file for MolVS normalize
* long test file for MolVS reionize
* long test file for MolVS tautomer
* long test file for MolVS validate
* Unit tests for MolVS steps
* dropping support for Python2
* molvs/__init__.py
* molvs/charge.py
* molvs/errors.py
* molvs/fragment.py
* molvs/metal.py
* molvs/normalize.py
* molvs/resonance.py
* molvs/standardize.py
* molvs/tautomer.py
* molvs/utils.py
* molvs/validate.py
* molvs/validations.py
* molvs/cli.py
* adapted and renamed molvs/cli.py to work within $RDBASE/Contrib/MolVS/
* setup MolStandardize directories, source with empty cleanup function, header, CMake files
* corrections to empty source, header and test1.cpp
* adding empty functions and initializers to MolStandardize
* empty Metal source, header and added test
* added most of Metal.cpp functionality and made some more tests
* empty functions and initializers to Normalize
* empty functions and initializers to Validate
* added most code for RDKitDefault mode, along with some tests
* restructure for abstract base class ValidateMethod
* written in isNoneValidation for MolVSValidation
* took out isNoneValidation, put in noAtomValidation, neutralValidation, isotopeValidation for MolVSValidation
* added in AllowedAtoms
* added in disallowedAtoms
* corrections to Validate
* added code for FragmentRemover
* extended fragment functionality to include choose largest fragment, added in tests for fragment catalog, fragment remover. Also added fragmentValidation method in MolStandardize
* added another test to testValidate test_fragment
* corrections to fragment
* corrections to Metal
* added code for Normalize
* added normalize member function to MolStandardize and added tests
* added multi fragment functionality to Normalize.cpp and additional tests
* TransformCatalog
* tests for Normalize.cpp
* first bit of cleanup
* added most of Charge functionality and some tests
* some corrections to Charge.cpp and some more tests to testCharge.cpp
* corrections to Charge.cpp
* start of Tautomer Enumerate with some tests
* added BondType option to Tautomer Enumeration
* correcting for some memory leakage
* a few alterations to formatting
* sorting out some memory leaks
* sorting out some memory leaks
* some corrections for PCS test set
* redo tests with updated RDKit
* fixing memory leak
* more fixes after 100kPCS set testing
* using tab as delimiter in CSVs rather than comma
* tutorial for MolStandardize
* still working on Tautomer enumeration
* deleted some empty tests
* starting writing tautomer canonicalize
* rename test_data -> data (the source still needs to be updated)
* automatic source reformatting
* adjust to directory rename
* move the fragment catalog test into the MolStandardize directory do not create separate library for FragmentCatalog
* stop building separate libraries for the catalogs
* move the CleanupParameters into the MolStandardize namespace
* first pass at python wrapper
* move the py module to the correct dir; add some python tests; add standardizeSmiles to python wrapper
* disabling the compareMolVSTest since that requires command line arguments to run
* get this building on windows
* put the python lib in the right place
* further work on python wrapper for rdMolStandardize
* added get and set functions to Metal and wrapped them
* added get and set functions to Metal and wrapped them
* changed constructor of Reionizer class and input args for reionize, wrapped this default
* overload Reionizer constructor so user can input own AcidBaseFile from python
* added Uncharger class to Charge and added test for Uncharger
* wrapped Fragment, fixed some memory leakage, changed some args and return types, added some tests
* wrapped Normalized and changed how Normalizer class is initiated
* changing MolVSValidation structure so user can choose which MolVS submethod they want
* starting to write Wrap for Validate
* now it compiles with Wrap/Validate.cpp
* a couple refactorings around validate
* move the validate code into the rdMolStandardize module
* make sure a valid pointer is returned for standardizeSmiles
* rdMolStandardize.MolVSValidation done and tests added
* half way through AllowedAtomsValidation
* finished AllowedAtomsValidation and DisallowedAtomsValidation
* moved charge, fragment, metal, normalize into the rdMolStandardize module
* changed tutorial to use wrapped code
* added copyrights
* added copyrights
* move the data files
* modify source files to adjust to the move
* added validateSmiles functionality
* removed std::cout
* redid some of the 100k PCS tests
* working on the tutorial
* adding some documentation
* deleting some comment lines
* some changes after pull review
* More changes after pull review
* start of trying to make java wrap
* remove some warnings, add some questions
* additional warning removals, a bit more reporting
* some test cleanups
* enable testing of the java code
163 lines
5.2 KiB
C++
//
// Copyright (C) 2018 Susan H. Leung
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
#include "Fragment.h"
#include <GraphMol/MolStandardize/FragmentCatalog/FragmentCatalogUtils.h>
#include <boost/tokenizer.hpp>
typedef boost::tokenizer<boost::char_separator<char>> tokenizer;
#include <GraphMol/ChemTransforms/ChemTransforms.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <GraphMol/Descriptors/MolDescriptors.h>
#include <RDGeneral/types.h>

namespace RDKit {
namespace MolStandardize {

// constructor
FragmentRemover::FragmentRemover() {
  BOOST_LOG(rdInfoLog) << "Initializing FragmentRemover\n";
  FragmentCatalogParams fparams(defaultCleanupParameters.fragmentFile);
  // unsigned int numfg = fparams->getNumFuncGroups();
  // TEST_ASSERT(fparams->getNumFuncGroups() == 61);
  this->d_fcat = new FragmentCatalog(&fparams);
  this->LEAVE_LAST = true;
}

// overloaded constructor
FragmentRemover::FragmentRemover(const std::string fragmentFile,
                                 const bool leave_last) {
  FragmentCatalogParams fparams(fragmentFile);
  this->d_fcat = new FragmentCatalog(&fparams);
  this->LEAVE_LAST = leave_last;
}

// destructor
FragmentRemover::~FragmentRemover() { delete d_fcat; }

ROMol *FragmentRemover::remove(const ROMol &mol) {
  BOOST_LOG(rdInfoLog) << "Running FragmentRemover\n";
  PRECONDITION(this->d_fcat, "");
  const FragmentCatalogParams *fparams = this->d_fcat->getCatalogParams();

  PRECONDITION(fparams, "");

  const std::vector<std::shared_ptr<ROMol>> &fgrps = fparams->getFuncGroups();
  auto *removed = new ROMol(mol);

  for (auto &fgci : fgrps) {
    std::vector<boost::shared_ptr<ROMol>> frags = MolOps::getMolFrags(*removed);
    // If nothing is left or leave_last and only one fragment, end here
    if (removed->getNumAtoms() == 0 ||
        (this->LEAVE_LAST && frags.size() <= 1)) {
      break;
    }

    std::string fname;
    fgci->getProp(common_properties::_Name, fname);
    ROMol *tmp = RDKit::deleteSubstructs(*removed, *fgci, true);

    if (tmp->getNumAtoms() != removed->getNumAtoms()) {
      BOOST_LOG(rdInfoLog) << "Removed fragment: " << fname << "\n";
    }

    if (this->LEAVE_LAST && tmp->getNumAtoms() == 0) {
      // All the remaining fragments match this pattern - leave them all
      delete tmp;
      break;
    }
    delete removed;
    removed = tmp;
  }
  return removed;
}

bool isOrganic(const ROMol &frag) {
  // Returns true if fragment contains at least one carbon atom.
  for (const auto at : frag.atoms()) {
    if (at->getAtomicNum() == 6) {
      return true;
    }
  }
  return false;
}

LargestFragmentChooser::LargestFragmentChooser(
    const LargestFragmentChooser &other) {
  BOOST_LOG(rdInfoLog) << "Initializing LargestFragmentChooser\n";
  PREFER_ORGANIC = other.PREFER_ORGANIC;
}

ROMol *LargestFragmentChooser::choose(const ROMol &mol) {
  BOOST_LOG(rdInfoLog) << "Running LargestFragmentChooser\n";

  // An empty molecule has no fragments; return a copy rather than
  // dereferencing a null largest fragment at the end of this function
  if (!mol.getNumAtoms()) {
    return new ROMol(mol);
  }

  std::vector<boost::shared_ptr<ROMol>> frags = MolOps::getMolFrags(mol);
  LargestFragmentChooser::Largest l;

  for (const auto &frag : frags) {
    std::string smiles = MolToSmiles(*frag);
    BOOST_LOG(rdInfoLog) << "Fragment: " << smiles << "\n";
    bool organic = isOrganic(*frag);
    if (this->PREFER_ORGANIC) {
      // Skip this fragment if not organic and we already have an organic
      // fragment as the largest so far
      if (l.Fragment != nullptr && l.Organic && !organic) continue;
      // Reset largest if it wasn't organic and this fragment is organic
      // (MolVS: if largest and organic and not largest['organic'])
      if (l.Fragment != nullptr && organic && !l.Organic) {
        l.Fragment = nullptr;
      }
    }
    // Count heavy atoms plus their attached hydrogens
    unsigned int numatoms = 0;
    for (const auto at : frag->atoms()) {
      numatoms += 1 + at->getTotalNumHs();
    }
    // Skip this fragment if fewer atoms than the largest
    if (l.Fragment != nullptr && (numatoms < l.NumAtoms)) continue;

    // Skip this fragment if equal number of atoms but weight is lower
    double weight = Descriptors::calcExactMW(*frag);
    if (l.Fragment != nullptr && (numatoms == l.NumAtoms) &&
        (weight < l.Weight))
      continue;

    // Skip this fragment if equal number of atoms and equal weight but smiles
    // comes last alphabetically
    if (l.Fragment != nullptr && (numatoms == l.NumAtoms) &&
        (weight == l.Weight) && (smiles > l.Smiles))
      continue;

    BOOST_LOG(rdInfoLog) << "New largest fragment: " << smiles << " ("
                         << numatoms << ")\n";
    // Otherwise this is the largest so far
    l.Smiles = smiles;
    l.Fragment = frag;
    l.NumAtoms = numatoms;
    l.Weight = weight;
    l.Organic = organic;
  }

  return new ROMol(*(l.Fragment));
}

LargestFragmentChooser::Largest::Largest()
    : Smiles(""), Fragment(nullptr), NumAtoms(0), Weight(0), Organic(false) {}

LargestFragmentChooser::Largest::Largest(
    std::string &smiles, const boost::shared_ptr<ROMol> &fragment,
    unsigned int &numatoms, double &weight, bool &organic)
    : Smiles(smiles),
      Fragment(fragment),
      NumAtoms(numatoms),
      Weight(weight),
      Organic(organic) {}

}  // namespace MolStandardize
}  // namespace RDKit
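The selection order applied inside `LargestFragmentChooser::choose` (organic preference, then atom count including hydrogens, then weight, then SMILES order) can be hard to follow from the chain of `continue` statements. The following is a minimal standalone sketch of that ordering, assuming the per-fragment values have already been computed. `FragInfo`, `beats`, `chooseLargest` and `example` are hypothetical names introduced only for illustration and are not part of RDKit. The numeric values are approximate and entered by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the data choose() derives per fragment:
// SMILES, atom count including hydrogens, exact mass, and whether the
// fragment contains carbon.
struct FragInfo {
  std::string smiles;
  unsigned int numAtoms;
  double weight;
  bool organic;
};

// Returns true if `cand` should replace `best`, following the same order
// of checks as the loop in choose().
bool beats(const FragInfo &cand, const FragInfo &best, bool preferOrganic) {
  // With the organic preference on, an organic fragment always wins
  // over an inorganic one, regardless of size.
  if (preferOrganic && cand.organic != best.organic) {
    return cand.organic;
  }
  // More atoms wins; on a tie, higher weight wins.
  if (cand.numAtoms != best.numAtoms) return cand.numAtoms > best.numAtoms;
  if (cand.weight != best.weight) return cand.weight > best.weight;
  // Full tie: the candidate loses only if its SMILES sorts later.
  return cand.smiles <= best.smiles;
}

// Returns a pointer to the chosen fragment, or nullptr for empty input.
const FragInfo *chooseLargest(const std::vector<FragInfo> &frags,
                              bool preferOrganic = true) {
  const FragInfo *best = nullptr;
  for (const auto &f : frags) {
    if (best == nullptr || beats(f, *best, preferOrganic)) {
      best = &f;
    }
  }
  return best;
}

// Small self-check: sodium acetate keeps the acetate fragment.
void example() {
  std::vector<FragInfo> salt = {{"[Na+]", 1, 22.99, false},
                                {"CC(=O)[O-]", 7, 59.01, true}};
  const FragInfo *best = chooseLargest(salt);
  assert(best != nullptr && best->smiles == "CC(=O)[O-]");
}
```

Note that the sketch returns `nullptr` for an empty fragment list, so a caller has to check the result before dereferencing it.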