rdkit/Code/GraphMol/MolStandardize/Fragment.cpp
Susan Leung 956fdf268c Dev/GSOC2018_MolVS_Integration (#2002)
* short test file for MolVS standardize_sm

* short test file for MolVS fragment

* short test file for MolVS metals

* short test file for MolVS normalize

* short test file for MolVS reionize

* short test file for MolVS tautomer

* short test file for MolVS validate

* long test file for MolVS standardize smiles

* long test file for MolVS fragment

* long test file for MolVS metals

* long test file for MolVS normalize

* long test file for MolVS reionize

* long test file for MolVS tautomer

* long test file for MolVS validate

* Unit tests for MolVS steps

* dropping support for Python2

* molvs/__init__.py

* molvs/charge.py

* molvs/errors.py

* molvs/fragment.py

* molvs/metal.py

* molvs/normalize.py

* molvs/resonance.py

* molvs/standardize.py

* molvs/tautomer.py

* molvs/utils.py

* molvs/validate.py

* molvs/validations.py

* molvs/cli.py

* adapted and renamed molvs/cli.py to work within $RDBASE/Contrib/MolVS/

* setup MolStandardize directories, source with empty cleanup function, header, CMake files

* corrections to empty source, header and test1.cpp

* adding empty functions and initializers to MolStandardize

* empty Metal source, header and added test

* added most of Metal.cpp functionality and made some more tests

* empty functions and initializers to Normalize

* empty functions and initializers to Validate

* added most code for RDKitDefault mode, along with some tests

* restructure for abstract base class ValidateMethod

* written in isNoneValidation for MolVSValidation

* took out isNoneValidation, put in noAtomValidation, neutralValidation, isotopeValidation for MolVSValidation

* added in AllowedAtoms

* added in disallowedAtoms

* corrections to Validate

* added code for FragmentRemover

* extended fragment functionality to include choose largest fragment, added in tests for fragment catalog, fragment remover. Also added fragmentValidation method in MolStandardize

* added another test to testValidate test_fragment

* corrections to fragment

* corrections to Metal

* added code for Normalize

* added normalize member function to MolStandardize and added tests

* added multi fragment functionality to Normalize.cpp and additional tests

* TransformCatalog

* tests for Normalize.cpp

* first bit of cleanup

* added most of Charge functionality and some tests

* some corrections to Charge.cpp and some more tests to testCharge.cpp

* corrections to Charge.cpp

* start of Tautomer Enumerate with some tests

* added BondType option to Tautomer Enumeration

* correcting for some memory leakage

* a few alterations to formatting

* sorting out some memory leaks

* sorting out some memory leaks

* some corrections for PCS test set

* redo tests with updated RDKit

* fixing memory leak

* more fixes after 100kPCS set testing

* using tab as delimiter in CSVs rather than comma

* tutorial for MolStandardize

* still working on Tautomer enumeration

* deleted some empty tests

* starting writing tautomer canonicalize

* rename test_data -> data (the source still needs to be updated)

* automatic source reformatting

* adjust to directory rename

* move the fragment catalog test into the MolStandardize directory
do not create separate library for FragmentCatalog

* stop building separate libraries for the catalogs

* move the CleanupParameters into the MolStandardize namespace

* first pass at python wrapper

* move the py module to the correct dir;
add some python tests;
add standardizeSmiles to python wrapper

* disabling the compareMolVSTest since that requires command line arguments to run

* get this building on windows

* put the python lib in the right place

* further work on python wrapper for rdMolStandardize

* added get and set functions to Metal and wrapped them

* added get and set functions to Metal and wrapped them

* changed constructor of Reionizer class and input args for reionize, wrapped this default

* overload Reionizer constructor so user can input own AcidBaseFile from python

* added Uncharger class to Charge and added test for Uncharger

* wrapped Fragment, fixed some memory leakage, changed some args and return types, added some tests

* wrapped Normalized and changed how Normalizer class is initiated

* changing MolVSValidation structure so user can choose which MolVS submethod they want

* starting to write Wrap for Validate

* now it compiles with Wrap/Validate.cpp

* a couple refactorings around validate

* move the validate code into the rdMolStandardize module

* make sure a valid pointer is returned for standardizeSmiles

* rdMolStandardize.MolVSValidation done and tests added

* half way through AllowedAtomsValidation

* finished AllowedAtomsValidation and DisallowedAtomsValidation

* moved charge, fragment, metal, normalize into the rdMolStandardize module

* changed tutorial to use wrapped code

* added copyrights

* added copyrights

* move the data files

* modify source files to adjust to the move

* added validateSmiles functionality

* removed std::cout

* redid some of the 100k PCS tests

* working on the tutorial

* adding some documentation

* deleting some comment lines

* some changes after pull review

* More changes after pull review

* start of trying to make java wrap

* remove some warnings, add some questions

* additional warning removals, a bit more reporting

* some test cleanups

* enable testing of the java code
2018-09-28 11:24:25 +02:00


//
// Copyright (C) 2018 Susan H. Leung
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
#include "Fragment.h"
#include <GraphMol/MolStandardize/FragmentCatalog/FragmentCatalogUtils.h>
#include <boost/tokenizer.hpp>
typedef boost::tokenizer<boost::char_separator<char>> tokenizer;
#include <GraphMol/ChemTransforms/ChemTransforms.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <GraphMol/Descriptors/MolDescriptors.h>
#include <RDGeneral/types.h>
namespace RDKit {
namespace MolStandardize {
// Default constructor: reads the default fragment file and keeps the last
// remaining fragment.
FragmentRemover::FragmentRemover() {
  BOOST_LOG(rdInfoLog) << "Initializing FragmentRemover\n";
  FragmentCatalogParams fparams(defaultCleanupParameters.fragmentFile);
  // unsigned int numfg = fparams->getNumFuncGroups();
  // TEST_ASSERT(fparams->getNumFuncGroups() == 61);
  this->d_fcat = new FragmentCatalog(&fparams);
  this->LEAVE_LAST = true;
}

// Overloaded constructor: the caller supplies the fragment file and decides
// whether the last remaining fragment is kept.
FragmentRemover::FragmentRemover(const std::string fragmentFile,
                                 const bool leave_last) {
  FragmentCatalogParams fparams(fragmentFile);
  this->d_fcat = new FragmentCatalog(&fparams);
  this->LEAVE_LAST = leave_last;
}

// Destructor
FragmentRemover::~FragmentRemover() { delete d_fcat; }
// Removes every fragment that matches a pattern in the catalog.
// The caller owns the returned molecule.
ROMol *FragmentRemover::remove(const ROMol &mol) {
  BOOST_LOG(rdInfoLog) << "Running FragmentRemover\n";
  PRECONDITION(this->d_fcat, "no fragment catalog");
  const FragmentCatalogParams *fparams = this->d_fcat->getCatalogParams();
  PRECONDITION(fparams, "no fragment catalog parameters");
  const std::vector<std::shared_ptr<ROMol>> &fgrps = fparams->getFuncGroups();
  auto *removed = new ROMol(mol);
  for (auto &fgci : fgrps) {
    std::vector<boost::shared_ptr<ROMol>> frags = MolOps::getMolFrags(*removed);
    // If nothing is left, or LEAVE_LAST is set and at most one fragment
    // remains, end here
    if (removed->getNumAtoms() == 0 ||
        (this->LEAVE_LAST && frags.size() <= 1)) {
      break;
    }
    std::string fname;
    fgci->getProp(common_properties::_Name, fname);
    // onlyFrags = true: delete a match only if it covers a whole fragment
    ROMol *tmp = RDKit::deleteSubstructs(*removed, *fgci, true);
    if (tmp->getNumAtoms() != removed->getNumAtoms()) {
      BOOST_LOG(rdInfoLog) << "Removed fragment: " << fname << "\n";
    }
    if (this->LEAVE_LAST && tmp->getNumAtoms() == 0) {
      // All the remaining fragments match this pattern - leave them all
      delete tmp;
      break;
    }
    delete removed;
    removed = tmp;
  }
  return removed;
}
// Returns true if the fragment contains at least one carbon atom.
bool isOrganic(const ROMol &frag) {
  for (const auto at : frag.atoms()) {
    if (at->getAtomicNum() == 6) {
      return true;
    }
  }
  return false;
}

LargestFragmentChooser::LargestFragmentChooser(
    const LargestFragmentChooser &other) {
  BOOST_LOG(rdInfoLog) << "Initializing LargestFragmentChooser\n";
  PREFER_ORGANIC = other.PREFER_ORGANIC;
}
// Returns a copy of the largest fragment of mol.
// The caller owns the returned molecule.
ROMol *LargestFragmentChooser::choose(const ROMol &mol) {
  BOOST_LOG(rdInfoLog) << "Running LargestFragmentChooser\n";
  std::vector<boost::shared_ptr<ROMol>> frags = MolOps::getMolFrags(mol);
  LargestFragmentChooser::Largest l;
  for (const auto &frag : frags) {
    std::string smiles = MolToSmiles(*frag);
    BOOST_LOG(rdInfoLog) << "Fragment: " << smiles << "\n";
    bool organic = isOrganic(*frag);
    if (this->PREFER_ORGANIC) {
      // Skip this fragment if it is not organic and the largest so far is
      if (l.Fragment != nullptr && l.Organic && !organic) {
        continue;
      }
      // Reset the largest if it is not organic and this fragment is
      if (l.Fragment != nullptr && organic && !l.Organic) {
        l.Fragment = nullptr;
      }
    }
    // Count every atom plus its hydrogen count (getTotalNumHs)
    unsigned int numatoms = 0;
    for (const auto at : frag->atoms()) {
      numatoms += 1 + at->getTotalNumHs();
    }
    // Skip this fragment if it has fewer atoms than the largest
    if (l.Fragment != nullptr && (numatoms < l.NumAtoms)) {
      continue;
    }
    // Skip this fragment if equal number of atoms but weight is lower
    double weight = Descriptors::calcExactMW(*frag);
    if (l.Fragment != nullptr && (numatoms == l.NumAtoms) &&
        (weight < l.Weight)) {
      continue;
    }
    // Skip this fragment if equal number of atoms and equal weight but its
    // SMILES comes last alphabetically
    if (l.Fragment != nullptr && (numatoms == l.NumAtoms) &&
        (weight == l.Weight) && (smiles > l.Smiles)) {
      continue;
    }
    BOOST_LOG(rdInfoLog) << "New largest fragment: " << smiles << " ("
                         << numatoms << ")\n";
    // Otherwise this is the largest so far
    l.Smiles = smiles;
    l.Fragment = frag;
    l.NumAtoms = numatoms;
    l.Weight = weight;
    l.Organic = organic;
  }
  // A molecule without atoms has no fragments; return a copy of the input
  // instead of dereferencing a null l.Fragment.
  if (l.Fragment == nullptr) {
    return new ROMol(mol);
  }
  return new ROMol(*(l.Fragment));
}
LargestFragmentChooser::Largest::Largest()
    : Smiles(""), Fragment(nullptr), NumAtoms(0), Weight(0), Organic(false) {}

LargestFragmentChooser::Largest::Largest(
    std::string &smiles, const boost::shared_ptr<ROMol> &fragment,
    unsigned int &numatoms, double &weight, bool &organic)
    : Smiles(smiles),
      Fragment(fragment),
      NumAtoms(numatoms),
      Weight(weight),
      Organic(organic) {}
} // namespace MolStandardize
} // namespace RDKit
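
The ordering rules in LargestFragmentChooser::choose can be easier to follow outside of RDKit. The sketch below is a minimal, RDKit-free restatement of those rules and is not part of Fragment.cpp. FragInfo, beats and chooseLargest are hypothetical names introduced only for illustration, and the atom counts and weights used with them are hand-entered approximations rather than values computed by RDKit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one fragment. In Fragment.cpp the corresponding
// values come from MolToSmiles, the atom-plus-hydrogen count,
// Descriptors::calcExactMW and isOrganic.
struct FragInfo {
  std::string smiles;
  unsigned int numAtoms;
  double weight;
  bool organic;
};

// Returns true if cand should replace best, mirroring the skip/continue
// checks in choose(): organic preference first (when enabled), then atom
// count, then weight, then SMILES order. A full tie replaces the current
// best, as in the original loop.
bool beats(const FragInfo &cand, const FragInfo &best, bool preferOrganic) {
  if (preferOrganic && cand.organic != best.organic) {
    return cand.organic;
  }
  if (cand.numAtoms != best.numAtoms) {
    return cand.numAtoms > best.numAtoms;
  }
  if (cand.weight != best.weight) {
    return cand.weight > best.weight;
  }
  return cand.smiles <= best.smiles;
}

// Returns a pointer to the winning fragment, or nullptr for an empty list.
const FragInfo *chooseLargest(const std::vector<FragInfo> &frags,
                              bool preferOrganic = true) {
  const FragInfo *best = nullptr;
  for (const auto &f : frags) {
    if (best == nullptr || beats(f, *best, preferOrganic)) {
      best = &f;
    }
  }
  return best;
}
```

Given a sodium ion and an acetate ion, chooseLargest would pick the acetate. Given sulfuric acid and methane, it would pick methane when preferOrganic is true and sulfuric acid when it is false, since sulfuric acid has more atoms but no carbon.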