* - added some missing const keywords
- added an addFingerprint overload to allow passing pointers
- added a test
* changes in response to review
* removed print
* added missing shared_ptr declaration
* added PatternNumBitsHolder serialization
* - merged with upstream changes and resolved conflicts
- got rid of PatternNumBitsHolder and leveraged the serialization version to get the PatternHolder to be backwards-compatible
* built substructLibV1.pkl with an older version of boost
* reverted serialization version to 1
only write numBits if != 2048 and only read numBits if it exists in the archive
* bogus commit just to trigger a rebuild
* add port of centres
* Several changes:
- Added a test based on RDKit issue 2984
(default RDKit fails it, this gets it right)
- Use bond directions for bond stereo (label is no longer required)
- Fix bugs in rules 4b and 5new
- Fix some mem errors
- clang-formatted
- some other minor cleanups
* Several changes and some improvements:
- Added LGPL license, as well as a mention in the doc.
- Fix/update/add some comments
- Fix typo/bug in Mancude calculation
- Fix bug in rules 4b, 5New
- Fix Sp2 Bond dir reference
- Re clang-format
- other minor changes suggested by Dan
* Another bunch of changes:
- require integer-order bonds; kekulize when required
- fix fraction comparison
- rename sq Cis/Trans e/z
- replace queues with vectors
- update copyright notices
- revert LGPL changes
- fix Asymmetric typo
* move to separate lib/mod, add python validation test
* Moving away from the original implementation:
- Rename to CIPLabeler
- Remove the abstraction layer
- Remove some stats stuff
- Push some CIPMol functions down to Node
- Use RDKit's isotope info
* Another bundle of changes. The most relevant ones:
- fix parity translation
- use cis trans as bond reference -- breaks #2984 test
- kill a lot of unused code
- use lists for queues
- store nodes and edges in digraph
- add prefixes to class data member names
- update changeRoot() test
- use fastFindRings() for mancude rings
- update docs
- add references to the scientific paper
- Document the Mancude functions
- Fix Mancude atom types and their comments
- remove mol data member from SequenceRule
- replace Fraction with boost::rational
- update comments, docstrings and the doc
* fix building the test
* Changes here include:
- adding bitset overload for the labeling function
- python wrap of the overload
- handling trigonal pyramids with implicit H
- setting bond labels sets stereo atoms, cis/trans
- nix LEFT/RIGHT/TOGETHER/OPPOSITE constants
- don't use GLOB in cmake
- a decent amount of refactoring
* Minor edits to new_CIP_labeling (#6)
* Some changes for clarity
Added some documentation and changed some variable names to match
my understanding. Also ran clang-tidy to ensure that all blocks
were brace-enclosed.
* Return a reference instead of a copy for performance
This is called many times and showed up after some light
profiling. This change bumped throughput by about 20%
* move out of Graphmol
* move .hpp headers to .h
* update documentation; add label set of atoms test
* Address comments:
- Added references to centres to CIPLabeler.h and Python Wrap.
- Update validation test to skip sanitization.
- Document mancude fractional atomic number calculation.
- Use unittest assertions in python test.
- Update mancude docstrings to 'resonance' instead of 'tautomers'.
- Rename prioritise() to prioritize().
- Add postcondition to check carriers size in Tetrahedral.cpp.
- Use getNeighbors() in Tetrahedral.cpp.
- Move findStereoAtoms to Chirality namespace.
- Move code back into GraphMol.
- Fix typos and reformat doc.
* More comments:
- Mention why we use boost's unordered map rather than the std one.
- Fix include in Python wrapper.
* Addressed second batch of comments:
- fix the bug in rule 4b
- fix docstring for rule 2
- move atomic mass calculation from rule 2 to node
- addressed some build warnings
- simplify sp2bond::label(comp)
- add start/end atoms to Sp2Bond constructor
- update system/local includes
Co-authored-by: Dan N <dan.nealschneider@schrodinger.com>
* add documentation
* backup
* first pass at 5-rings working
* add a static method to initialize an empty parameter object
* expose static method to python
* additional testing
* support the single bond adjustments
* cleanup
* preserve the symbol used in the query from a CTAB
* support the way the MDL code adjusts five-ring aromaticity in query rings
* in-code documentation
* while we're at it, cleanup the way Q and A atoms are handled in the v3k parser
* changes in response to review
* make this C++14 again.
* change in response to review
* Progress on #3168
* Fixes #3167
* Fixes #3169
* deal with CBONDS too
* test PATOMS
* Fixes #3175
* a bit of code simplification and test updates
still needs more testing
* more testing
* handle s-group hierarchy
also a couple of other changes in response to the review
* add forgotten test file
* changes in response to review
* Add parameter to skip proximity bonding during PDB reading
* Test proximityBonding flag
* Remove multivalent Hs and bonds to metals in PDB
* Add tests for multivalent Hs and metal unbinding
* Remove covalent bonds to waters
* Test unbinding of HOHs
* Refactor functions
* Rename flag for consistency
* Include flavor in double bond perception
* Add metalorganic test (APW ligand)
* Validate input for IsBlacklistedPair and minor changes.
* move detectBondStereoChemistry() into MolOps
* switch more code over to using the new function
* add an addStereoChemistryFrom3D() function. Needs testing still.
* add some tests
* cleanups and rename
FilterCatalogs give RDKit the ability to screen out or reject
undesirable molecules based on various criteria. Supplied
with RDKit are the following filter sets:
* PAINS - Pan assay interference patterns.
These are separated into three sets: PAINS_A, PAINS_B and PAINS_C.
Reference: Baell JB, Holloway GA. New Substructure Filters for
Removal of Pan Assay Interference Compounds (PAINS)
from Screening Libraries and for Their Exclusion in
Bioassays.
J Med Chem 53 (2010) 2719–40. doi:10.1021/jm901137j.
* BRENK - filters unwanted functionality due to potential toxicity
or unfavorable pharmacokinetics.
Reference: Brenk R et al. Lessons Learnt from Assembling Screening
Libraries for Drug Discovery for Neglected Diseases.
ChemMedChem 3 (2008) 435-444. doi:10.1002/cmdc.200700139.
* NIH - annotated compounds with problematic functional groups
Reference: Doveston R, et al. A Unified Lead-oriented Synthesis of
over Fifty Molecular Scaffolds. Org Biomol Chem 13
(2014) 859–65.
doi:10.1039/C4OB02287D.
Reference: Jadhav A, et al. Quantitative Analyses of Aggregation,
Autofluorescence, and Reactivity Artifacts in a Screen
for Inhibitors of a Thiol Protease.
J Med Chem 53 (2009) 37–51. doi:10.1021/jm901070c.
* ZINC - Filtering based on drug-likeness and unwanted functional
groups
Reference: http://blaster.docking.org/filtering/
The following are C++ and Python examples of how to filter molecules.
[C++]
#include <iostream>
#include <memory>
#include <vector>
#include <GraphMol/FileParsers/MolSupplier.h>
#include <GraphMol/FilterCatalog/FilterCatalog.h>
#include <RDGeneral/Invariant.h>

using namespace RDKit;

SmilesMolSupplier suppl(…);

// set up the desired catalogs
FilterCatalogParams params;
params.addCatalog(FilterCatalogParams::PAINS_A);
params.addCatalog(FilterCatalogParams::PAINS_B);
params.addCatalog(FilterCatalogParams::PAINS_C);

// create the catalog
FilterCatalog catalog(params);

std::unique_ptr<ROMol> mol;  // automatically cleans up after us
int count = 0;
while (!suppl.atEnd()) {
  mol.reset(suppl.next());
  TEST_ASSERT(mol.get());

  // Does a PAINS filter hit?
  if (catalog.hasMatch(*mol)) {
    std::cerr << "Warning: molecule failed filter" << std::endl;
  }
  // More detailed data by retrieving the catalog entry
  const FilterCatalogEntry *entry = catalog.getFirstMatch(*mol);
  if (entry) {
    std::cerr << "Warning: molecule failed filter: reason "
              << entry->getDescription() << std::endl;

    // get the matched substructure atoms for visualization
    std::vector<FilterMatch> matches;
    if (entry->getFilterMatches(*mol, matches)) {
      for (std::vector<FilterMatch>::const_iterator it = matches.begin();
           it != matches.end(); ++it) {
        // Get the matcher that matched
        const FilterMatch &fm = (*it);
        boost::shared_ptr<FilterMatcherBase> matchingFilter = fm.filterMatch;

        // Get the matching atom indices; the inner iterator has its own
        // name so that it does not shadow the outer loop's iterator
        const MatchVectType &vect = fm.atomPairs;
        for (MatchVectType::const_iterator pairIt = vect.begin();
             pairIt != vect.end(); ++pairIt) {
          int atomIdx = pairIt->second;  // index of the atom in the molecule
        }
      }
    }
  }
  ++count;
}  // end while
[Python]
import sys
from rdkit.Chem import FilterCatalog

params = FilterCatalog.FilterCatalogParams()
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_A)
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_B)
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_C)
catalog = FilterCatalog.FilterCatalog(params)
...
for mol in mols:
    if catalog.HasMatch(mol):
        print("Warning: molecule failed filter", file=sys.stderr)
    # more detailed
    entry = catalog.GetFirstMatch(mol)
    if entry:
        print("Warning: molecule failed filter: reason %s" % (
            entry.GetDescription()), file=sys.stderr)
        # get the atoms involved in the substructure;
        # there may be many matching filters here...
        for filterMatch in entry.GetFilterMatches(mol):
            matcher = filterMatch.filterMatch
            # get a description of the matching filter
            print(matcher)
            for queryAtomIdx, atomIdx in filterMatch.atomPairs:
                # do something with the substructure matches
                pass
Advanced
FilterCatalogs are fully serializable and can be stored for later use.
To serialize a catalog, use the catalog.Serialize() method:
std::string pickle = catalog.Serialize();
To deserialize, pass the resulting string to the constructor:
FilterCatalog catalog(pickle);
The underlying matchers can be arbitrarily complicated and new
ones with more complicated semantics can be created. The default
matching objects are:
SmartsMatcher - match a SMARTS pattern or query molecule with a minimum
                and maximum count
ExclusionList - returns false if any of the supplied matches exist
And - combine two matchers (true only if both are true)
Or - true if either of two matchers is true
Not - invert the match (note that this can have confusing semantics
      when dealing with substructure matches)
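As a rough illustration of how such matchers compose, here is a minimal pure-Python sketch. It does not use the RDKit API: the classes CountMatcher, And, Or, Not and ExclusionList below are hypothetical stand-ins, and a "molecule" is simplified to a list of element symbols so that the example runs without any chemistry toolkit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

# Toy stand-in for a molecule: just a sequence of element symbols.
Molecule = Sequence[str]


@dataclass
class CountMatcher:
    """Hypothetical analogue of SmartsMatcher: matches when the number of
    atoms with the given symbol lies within [min_count, max_count]."""
    name: str
    symbol: str
    min_count: int = 1
    max_count: Optional[int] = None  # None means no upper bound

    def has_match(self, mol: Molecule) -> bool:
        n = sum(1 for s in mol if s == self.symbol)
        if n < self.min_count:
            return False
        return self.max_count is None or n <= self.max_count


@dataclass
class And:
    """True only if both wrapped matchers match."""
    a: object
    b: object

    def has_match(self, mol: Molecule) -> bool:
        return self.a.has_match(mol) and self.b.has_match(mol)


@dataclass
class Or:
    """True if either wrapped matcher matches."""
    a: object
    b: object

    def has_match(self, mol: Molecule) -> bool:
        return self.a.has_match(mol) or self.b.has_match(mol)


@dataclass
class Not:
    """Inverts the wrapped matcher; only a boolean survives the inversion."""
    inner: object

    def has_match(self, mol: Molecule) -> bool:
        return not self.inner.has_match(mol)


@dataclass
class ExclusionList:
    """Returns False if any of the supplied patterns matches."""
    patterns: List[object] = field(default_factory=list)

    def add_pattern(self, matcher: object) -> None:
        self.patterns.append(matcher)

    def has_match(self, mol: Molecule) -> bool:
        return not any(p.has_match(mol) for p in self.patterns)


mol = ["C", "C", "O", "N"]
has_carbon = CountMatcher("carbon", "C")
has_sulfur = CountMatcher("sulfur", "S")

carbon_without_sulfur = And(has_carbon, Not(has_sulfur))
excluded = ExclusionList()
excluded.add_pattern(has_sulfur)
```

In this toy model, carbon_without_sulfur.has_match(mol) and excluded.has_match(mol) both hold for the example molecule, because it contains carbon and no sulfur.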
Entries can be added at any time to a catalog:
ExclusionList excludedList;
excludedList.addPattern(SmartsMatcher("Pattern 1", smarts));
excludedList.addPattern(SmartsMatcher("Pattern 2", smarts2));
A FilterCatalog supports a few different types of matching. One is
a traditional rejection filter where if a substructure exists in
the target molecule, the molecule is rejected.
These types of queries can indicate the substructure that triggered
the rejection through the FilterCatalogEntry::getFilterMatches(mol, matches)
function.
The FilterCatalog also supports acceptance filters, which are
designed to indicate which molecules are acceptable. These have
to be transformed into rejection filters or simply wrapped in a
Not( acceptanceFilter ) when entered into the catalog. For example,
from ZINC:
carbons [#6] 40
means that we have a maximum of 40 carbon atoms. We can write this by
converting the max count to a min count (i.e. the pattern is triggered
when the molecule has at least minCount matching atoms):
const unsigned int minCount = 40+1;
SmartsMatcher( "Too many carbons", "[#6]", minCount );
This can be properly substructure searched, so the matching atoms can
still be retrieved.
Or we can wrap the acceptance filter in a Not:
const unsigned int minCount = 0;
const unsigned int maxCount = 40;
Not( SmartsMatcher( "ok number of carbons", "[#6]", minCount, maxCount) );
Note: Wrapping in a Not loses the ability to highlight the rejecting
pattern when visualizing the molecule.
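The trade-off between the two formulations can be sketched in plain Python. This is only an illustration under simplifying assumptions: a molecule is modelled as a list of element symbols, and the helper names (carbon_indices, too_many_carbons, ok_number_of_carbons, not_ok_number_of_carbons) are hypothetical and not part of RDKit.

```python
from typing import List, Sequence


def carbon_indices(mol: Sequence[str]) -> List[int]:
    """Indices of the carbon atoms in the toy molecule."""
    return [i for i, symbol in enumerate(mol) if symbol == "C"]


def too_many_carbons(mol: Sequence[str], max_allowed: int = 40) -> List[int]:
    """Rejection form: the max count becomes a min count of max_allowed + 1.
    Returns the matched atom indices, or an empty list if the filter
    does not fire."""
    min_count = max_allowed + 1
    hits = carbon_indices(mol)
    return hits if len(hits) >= min_count else []


def ok_number_of_carbons(mol: Sequence[str], min_count: int = 0,
                         max_count: int = 40) -> bool:
    """Acceptance form: True when the carbon count is within range."""
    return min_count <= len(carbon_indices(mol)) <= max_count


def not_ok_number_of_carbons(mol: Sequence[str]) -> bool:
    """Not(acceptance filter): yields only a boolean, so there are no
    atom indices left to highlight."""
    return not ok_number_of_carbons(mol)


small = ["C"] * 40 + ["O"]  # exactly at the limit, should pass
large = ["C"] * 41 + ["O"]  # one carbon too many, should be rejected

rejected_atoms = too_many_carbons(large)
rejected_flag = not_ok_number_of_carbons(large)
```

Both forms reject the same molecules, but only the rejection form keeps the list of offending atoms (rejected_atoms) that a viewer could highlight.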