* Fixes a bug with chirality perception of T-shaped centers in very large rings
* remove those files from the chemdraw tests
should be added later once we figure out and fix what the problem on the chemdraw side is (it is not directly connected to this PR)
* be more systematic about the tolerance values
carry the same tolerances over into the bond wedging code
* re-enable those chemdraw tests
* typo
* add test
* potential fix
* whoops!
* support disabling ring stereo in rankMolAtoms
* pass atom ranks into the ring-stereo detection code
* all tests pass
* forgot a file
---------
Co-authored-by: Ric R. <ricrogz@gmail.com>
* First pass at CDX support
* Enable CDX support for reading (also) in the CDXMLParser API
* Add cdxml test files
* Update swig wrappers for CDXMLFormat and Parameters
* Add constructor to ChemDrawParserParams
* Add Java SWIG support for ChemDraw
* Add chemdraw define to rdconfig
* Add missing chemdraw deps
* Remove direct expat link
* Fix Java linkages for ChemDraw
* Remove bad merge code
* Remove bad merge code
* Fix csharp builds
* Add sniffer for the ChemDraw DataStream
* Include filesystem
* Fix test on windows
* Add more CDX tests
* Ensure streams are open in binary mode to support CDX on windows
* Fix text to show that a Block is the text input, not a file
* Fix CSharp test
* Disable CDX tests when not building chemdraw
* Turn back on chemdraw
* Response to review
* Turn off chemdraw support for the limited external test
---------
Co-authored-by: Brian Kelley <bkelley@glysade.com>
* Fixes #7983
move the call to cleanupAtropisomerStereoGroups() into assignStereochemistry()
* Additional tests from @susanhleung in #8323
* more testing
* changes in response to review
* changes for review
* Allow fragments to be groups in CDXML
* Add support for grouped reactants
* run clang-format
* Change github issue to 7528
* Add documentation to the code
* response to review, check grouped reactants in cdxml against rxn file
* Remove unused code
* Add missing file
---------
Co-authored-by: Brian Kelley <bkelley@relaytx.com>
* Allow any bond (smiles ~) recognition in CDXML
* Move anybond.cdxml to the right place
* a bit of simplification
---------
Co-authored-by: Brian Kelley <bkelley@relaytx.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* atropisomer handling added
* fixed non-used variables, linking directives
* BOOST LIB start/stop fixes, linking fix
* Fixes for RDKIT CI errors
* minimalLib fix
* changed vector<enum> for java builds
* check for extra chars in CIP labeling
* removed wrong deprecated message
* fix ostrstream output error?
* restored _ChiralAtomRank to lowercase first letter
* changes for merged master
* Fixed catch label for new Catch package
* update expected psql results
* get swig wrappers building
* restore MolFileStereochem to FileParsers
* fix java wrapper for reapplyMolBlockWedging
* some suggestions
* move a couple functions out of Bond
* Merge branch 'master' into pr/atropisomers2
* merged master
* Renamed setStereoanyFromSquiggleBond
* atropisomers in cdxml, rationalize atrop wedging, stereoGroups in drawMol
* fix for CI build
* attempt to fix java build in CI
* attempt to fix java build in CI #2
* New routine to remove non-explicit 3D-generated chirality
* changed to use pair for atrop atoms and related bonds
* Changes as per PR reviews
* PR review responses
* PR review response - more
* Fix merge from master
* fixing java ci after merge
* Updated the help doc for atropisomers
* update the atropisomer docs
* improve the images
* add the source CXSMILES
---------
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* backup
* backup
* passes a lot of tests
* cleanup; still failing some tests
* pay attention to bond starting points... duh
* all tests pass
* invert y coords
* Scale bonds, make the Wedge detection cleaner, add more tests
* Readd comment
* Use document bond length
* Adds roundtrip test through a molblock
* a bit of cleanup
* remove the old code since we aren't using it any more.
* changes in response to review
---------
Co-authored-by: Brian Kelley <bkelley@relaytx.com>
* Scale bonds, make the Wedge detection cleaner, add more tests
* Readd comment
* Use document bond length
* Adds roundtrip test through a molblock
* a bit of cleanup
* change expected results for a bogus structure
add a non-ambiguous version of it
* fixes #6462
* document incompatibility
---------
Co-authored-by: Brian Kelley <bkelley@relaytx.com>
* very basics: actually parsing the new atom stereochem features
* add some input verification for the chiral permutations
* fix a typo
add quadruple bond SMILES/SMARTS extension
* add forgotten files
* patch from Roger
* add Roger's parsing examples
* typo
* new tests
* adjusted version of next PR from Roger:
- add SP2D hybridization for square planar (this may change)
- some modernization of Chirality.cpp
- stop using < HybridizationType in Chirality.cpp (should probably do this elsewhere too)
- improved handling of hybridization assignment for new stereochem
- handle new stereo/hybridization in UFF
- tests for the above
* perception of non-tetrahedral stereo from 3D (from Roger S)
Basic testing of SP and TB based on opensmiles docs
* potential fixes for octahedral assignment
more tests
* docs update
need way more!
* map the TH tags directly to @ tags
* very basics of SMILES writing
this does not work with anything that changes the permutation order
like canonicalization or writing things in rings.
* start to support the getChiralAcross API
* more testing
* consistency
* add hasNonTetrahedralStereo() and getIdealAngleBetweenLigands()
* assignStereochemistry should only remove non-tetrahedral stereo
* re-simplify those tests
* cleanup matrix stream output
* initial pass at supporting nontet stereo in distgeom
* backup
* start on the reference docs
* TBP reference
* first pass at Oh finished
* update SP section
* more doc updates
* fix a typo
* add param to not remove Hs connected to non-tetrahedral atoms
* VERY basic coord generation for square planar
* TBP basics
* basic OH depiction
* start testing missing ligands
allow non-tet stereo in rings (ugly, but correct)
* add new TBP functions from Roger
* update depiction code for new API
* backup, the new tests work so far
* Finish the TB tests
* OH tests pass too
* cleanup
* first pass at getting correct SMILES with reordering
need way more testing than this
* ensure permutation 0 is correctly preserved
* some progress towards adding non-tetrahedral stereo to StereoInfo
* doc update
* add non-tet chiral classes to python wrappers
* make sure removeAllHs also gets neighbors of non-tetrahedral centers
more testing
* a bit of depictor cleanup
* make the assignment from 3D more tolerant
more testing
* improve the bulk testing
* cleanup
* remove a bit of redundant code
* ensure we don't write bogus permutation values to SMILES
* fix some rebase problems
* allow assignStereochemistryFrom3D() to be called without sanitization
* allow disabling the non-tetrahedral stereo when it's not explicit
* get that working on windows too
* Remove accidentally tracked files and unset x flag
* Ignore ComicNeue
* Unify test tag to `reader`
* Trivial destructors
* Bump CMAKE_CXX_STANDARD to 14 (#4165)
* - added some missing const keywords
- added an addFingerprint overload to allow passing pointers
- added a test
* changes in response to review
* removed print
* added missing shared_ptr declaration
* added PatternNumBitsHolder serialization
* - merged with upstream changes and resolved conflicts
- got rid of PatternNumBitsHolder and leveraged the serialization version to get the PatternHolder to be backwards-compatible
* built substructLibV1.pkl with an older version of boost
* reverted serialization version to 1
only write numBits if != 2048 and only read numBits if it exists in the archive
* bogus commit just to trigger a rebuild
* add port of centres
* Several changes:
- Added a test based on RDKit issue 2984
(default RDKit fails it, this gets it right)
- Use bond directions for bond stereo (label is no longer required)
- Fix bugs in rules 4b and 5new
- Fix some mem errors
- clang-formatted
- some other minor cleanups
* Several changes and some improvements:
- Added LGPL license, as well as a mention in the doc.
- Fix/update/add some comments
- Fix typo/bug in Mancude calculation
- Fix bug in rules 4b, 5New
- Fix Sp2 Bond dir reference
- Re clang-format
- other minor changes suggested by Dan
* Another bunch of changes:
- require integer-order bonds; kekulize when required
- fix fraction comparison
- rename sq Cis/Trans e/z
- replace queues with vectors
- update copyright notices
- revert LGPL changes
- fix Asymmetric typo
* move to separate lib/mod, add python validation test
* Moving away from the original implementation:
- Rename to CIPLabeler
- Remove the abstraction layer
- Remove some stats stuff
- Push some CIPMol functions down to Node
- Use RDKit's isotope info
* Another bundle of changes. The most relevant ones:
- fix parity translation
- use cis trans as bond reference -- breaks #2984 test
- kill a lot of unused code
- use lists for queues
- store nodes and edges in digraph
- add prefixes to class data member names
- update changeRoot() test
- use fastFindRings() for mancude rings
- update docs
- add references to the scientific paper
- Document the Mancude functions
- Fix Mancude atom types and their comments
- remove mol data member from SequenceRule
- replace Fraction with boost::rational
- update comments, docstrings and the doc
* fix building the test
* Changes here include:
- adding bitset overload for the labeling function
- python wrap of the overload
- handling trigonal pyramids with implicit H
- setting bond labels sets stereo atoms, cis/trans
- nix LEFT/RIGHT/TOGETHER/OPPOSITE constants
- don't use GLOB in cmake
- a decent amount of refactoring
* Minor edits to new_CIP_labeling (#6)
* Some changes for clarity
Added some documentation and changed some variable names to match
my understanding. Also ran clang-tidy to ensure that all blocks
were brace-enclosed.
* Return a reference instead of a copy for performance
This is called many times and showed up after some light
profiling. This change bumped throughput by about 20%
* move out of Graphmol
* move .hpp headers to .h
* update documentation; add label set of atoms test
* Address comments:
- Added references to centres to CIPLabeler.h and Python Wrap.
- Update validation test to skip sanitization.
- Document mancude fractional atomic number calculation.
- Use unittest assertions in python test.
- Update mancude docstrings to 'resonance' instead of 'tautomers'.
- Rename prioritise() to prioritize().
- Add postcondition to check carriers size in Tetrahedral.cpp.
- Use getNeighbors() in Tetrahedral.cpp.
- Move findStereoAtoms to Chirality namespace.
- Move code back into GraphMol.
- Fix typos and reformat doc.
* More comments:
- Mention why we use boost's unordered map rather than the std one.
- Fix include in Python wrapper.
* Addressed second batch of comments:
- fix the bug in rule 4b
- fix docstring for rule 2
- move atomic mass calculation from rule 2 to node
- addressed some build warnings
- simplify sp2bond::label(comp)
- add start/end atoms to Sp2Bond constructor
- update system/local includes
Co-authored-by: Dan N <dan.nealschneider@schrodinger.com>
* add documentation
* backup
* first pass at 5-rings working
* add a static method to initialize an empty parameter object
* expose static method to python
* additional testing
* support the single bond adjustments
* cleanup
* preserve the symbol used in the query from a CTAB
* support the way the MDL code adjusts five-ring aromaticity in query rings
* in-code documentation
* while we're at it, cleanup the way Q and A atoms are handled in the v3k parser
* changes in response to review
* make this C++14 again.
* change in response to review
* Progress on #3168
* Fixes #3167
* Fixes #3169
* deal with CBONDS too
* test PATOMS
* Fixes #3175
* a bit of code simplification and test updates
still needs more testing
* more testing
* handle s-group hierarchy
also a couple of other changes in response to the review
* add forgotten test file
* changes in response to review
* Add parameter to skip proximity bonding during PDB reading
* Test proximityBonding flag
* Remove multivalent Hs and bonds to metals in PDB
* Add tests for multivalent Hs and metal unbinding
* Remove covalent bonds to waters
* Test unbinding of HOHs
* Refactor functions
* Rename flag for consistency
* Include flavor in double bond perception
* Add metalorganic test (APW ligand)
* Validate input for IsBlacklistedPair and minor changes.
* move detectBondStereoChemistry() into MolOps
* switch more code over to using the new function
* add an addStereoChemistryFrom3D() function. Needs testing still.
* add some tests
* cleanups and rename
FilterCatalogs give RDKit the ability to screen out or reject
undesirable molecules based on various criteria. Supplied
with RDKit are the following filter sets:
* PAINS - Pan assay interference patterns.
  These are separated into three sets PAINS_A, PAINS_B and PAINS_C.
  Reference: Baell JB, Holloway GA. New Substructure Filters for
             Removal of Pan Assay Interference Compounds (PAINS)
             from Screening Libraries and for Their Exclusion in
             Bioassays.
             J Med Chem 53 (2010) 2719-40. doi:10.1021/jm901137j.
* BRENK - filters unwanted functionality due to potential tox reasons
  or unfavorable pharmacokinetics.
  Reference: Brenk R et al. Lessons Learnt from Assembling Screening
             Libraries for Drug Discovery for Neglected Diseases.
             ChemMedChem 3 (2008) 435-444. doi:10.1002/cmdc.200700139.
* NIH - annotated compounds with problematic functional groups
  Reference: Doveston R, et al. A Unified Lead-oriented Synthesis of
             over Fifty Molecular Scaffolds. Org Biomol Chem 13
             (2014) 859-65.
             doi:10.1039/C4OB02287D.
  Reference: Jadhav A, et al. Quantitative Analyses of Aggregation,
             Autofluorescence, and Reactivity Artifacts in a Screen
             for Inhibitors of a Thiol Protease.
             J Med Chem 53 (2009) 37-51. doi:10.1021/jm901070c.
* ZINC - Filtering based on drug-likeness and unwanted functional
  groups
  Reference: http://blaster.docking.org/filtering/
The following are C++ and Python examples of how to filter molecules.
[C++]
#include <iostream>
#include <memory>
#include <vector>
#include <GraphMol/FileParsers/MolSupplier.h>
#include <GraphMol/FilterCatalog.h>
using namespace RDKit;

SmilesMolSupplier suppl(…);

// set up the desired catalogs
FilterCatalogParams params;
params.addCatalog(FilterCatalogParams::PAINS_A);
params.addCatalog(FilterCatalogParams::PAINS_B);
params.addCatalog(FilterCatalogParams::PAINS_C);

// create the catalog
FilterCatalog catalog(params);

std::unique_ptr<ROMol> mol;  // automatically cleans up after us
int count = 0;
while (!suppl.atEnd()) {
  mol.reset(suppl.next());
  TEST_ASSERT(mol.get());

  // Does a PAINS filter hit?
  if (catalog.hasMatch(*mol)) {
    std::cerr << "Warning: molecule failed filter " << std::endl;
  }

  // More detailed data by retrieving the catalog entry
  const FilterCatalogEntry *entry = catalog.getFirstMatch(*mol);
  if (entry) {
    std::cerr << "Warning: molecule failed filter: reason "
              << entry->getDescription() << std::endl;

    // get the matched substructure atoms for visualization
    std::vector<FilterMatch> matches;
    if (entry->getFilterMatches(*mol, matches)) {
      for (std::vector<FilterMatch>::const_iterator it = matches.begin();
           it != matches.end(); ++it) {
        // Get the SmartsMatcherBase that matched
        const FilterMatch &fm = (*it);
        boost::shared_ptr<SmartsMatcherBase> matchingFilter = fm.filterMatch;

        // Get the matching atom indices; the inner iterator has its own
        // name so that it does not shadow the outer loop's iterator
        const MatchVectType &vect = fm.atomPairs;
        for (MatchVectType::const_iterator pairIt = vect.begin();
             pairIt != vect.end(); ++pairIt) {
          int atomIdx = pairIt->second;  // index of the atom in the molecule
        }
      }
    }
  }
  ++count;
}  // end while
[Python]
import sys
from rdkit.Chem import FilterCatalog

params = FilterCatalog.FilterCatalogParams()
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_A)
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_B)
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS_C)
catalog = FilterCatalog.FilterCatalog(params)
...
for mol in mols:
    if catalog.HasMatch(mol):
        print("Warning: molecule failed filter", file=sys.stderr)

    # more detailed
    entry = catalog.GetFirstMatch(mol)
    if entry:
        print("Warning: molecule failed filter: reason %s" % (
            entry.GetDescription()), file=sys.stderr)

        # get to the atoms involved in the substructure
        # there may be many matching filters here...
        for filterMatch in entry.GetFilterMatches(mol):
            matcher = filterMatch.filterMatch
            # get a description of the matching filter
            print(matcher)
            for queryAtomIdx, atomIdx in filterMatch.atomPairs:
                # do something with the substructure matches
                pass
Advanced
FilterCatalogs are fully serializable and can be stored for later use.
To serialize a catalog, use the catalog.Serialize() method:
    std::string pickle = catalog.Serialize();
To deserialize, pass the resulting string to the constructor:
    FilterCatalog catalog(pickle);
The underlying matchers can be arbitrarily complicated and new
ones with more complicated semantics can be created. The default
matching objects are:
  SmartsMatcher - match a SMARTS pattern or query molecule with a minimum
                  and maximum count
  ExclusionList - returns false if any of the supplied matches exist
  And - true only if both matchers are true
  Or - true if any of two matchers are true
  Not - invert the match (note that this can have confusing semantics
        when dealing with substructure matches)
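The way these matchers combine can be sketched with a toy model in plain
Python. This is only an illustration of the boolean semantics listed above,
not the RDKit API: a "molecule" here is just a set of feature labels, and
has_feature, and_, or_, not_ and exclusion_list are hypothetical helpers
invented for this example.

```python
from typing import Callable, FrozenSet, Iterable

# Toy stand-in: a "molecule" is a set of feature labels, and a matcher is
# any function mapping such a set to True (match) or False (no match).
Mol = FrozenSet[str]
Matcher = Callable[[Mol], bool]


def has_feature(label: str) -> Matcher:
    """Hypothetical leaf matcher: true when the label is present."""
    return lambda mol: label in mol


def and_(a: Matcher, b: Matcher) -> Matcher:
    """True only if both matchers are true."""
    return lambda mol: a(mol) and b(mol)


def or_(a: Matcher, b: Matcher) -> Matcher:
    """True if either matcher is true."""
    return lambda mol: a(mol) or b(mol)


def not_(a: Matcher) -> Matcher:
    """Invert the result of a matcher."""
    return lambda mol: not a(mol)


def exclusion_list(patterns: Iterable[Matcher]) -> Matcher:
    """False if any supplied pattern matches, true otherwise."""
    patterns = list(patterns)
    return lambda mol: not any(p(mol) for p in patterns)


nitro = has_feature("nitro")
azide = has_feature("azide")
mol = frozenset({"nitro", "amide"})

print(and_(nitro, azide)(mol))              # False
print(or_(nitro, azide)(mol))               # True
print(not_(azide)(mol))                     # True
print(exclusion_list([nitro, azide])(mol))  # False
```

In this toy model a negated matcher is true exactly when nothing matched, so
there is nothing left to point at in the molecule, which is one way to read
the "confusing semantics" remark for Not.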
Entries can be added at any time to a catalog:
    ExclusionList excludedList;
    excludedList.addPattern(SmartsMatcher("Pattern 1", smarts));
    excludedList.addPattern(SmartsMatcher("Pattern 2", smarts2));
A FilterCatalog supports a few different types of matching. One is
a traditional rejection filter where if a substructure exists in
the target molecule, the molecule is rejected.
These types of queries can indicate the substructure that triggered
the rejection through the FilterCatalogEntry::GetMatch(mol)
function.
The FilterCatalog also supports acceptance filters, which are
designed to indicate which molecules are ok. These have
to be transformed into rejection filters or simply wrapped in a
Not( acceptanceFilter ) when entered into the catalog. For example,
from ZINC:
    carbons [#6] 40
means that we have a maximum of 40 carbon atoms. We can write this by
converting the max count to a min count (i.e. the pattern is triggered
when the molecule has at least minCount matching atoms):
    const unsigned int minCount = 40+1;
    SmartsMatcher( "Too many carbons", "[#6]", minCount );
This can be properly substructure searched.
Or we can wrap this in a Not:
    const unsigned int minCount = 0;
    const unsigned int maxCount = 40;
    Not( SmartsMatcher( "ok number of carbons", "[#6]", minCount, maxCount) );
Note: Wrapping in a Not loses the ability to highlight the rejecting
pattern when visualizing the molecule.
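
The two formulations above should reject exactly the same molecules. A small
arithmetic check in plain Python makes this concrete. It is a sketch that
models only the counting logic: in_count_window is a hypothetical stand-in
for a min/max-count matcher, not an RDKit function, and the carbon count is
passed in directly instead of being computed from a molecule.

```python
MAX_CARBONS = 40  # limit taken from the ZINC rule above


def in_count_window(n_matches, min_count, max_count=None):
    """Toy matcher: true when n_matches lies in [min_count, max_count]."""
    if n_matches < min_count:
        return False
    return max_count is None or n_matches <= max_count


def rejected_by_min_count(n_carbons):
    # Rejection filter: fires at MAX_CARBONS + 1 carbons or more
    return in_count_window(n_carbons, MAX_CARBONS + 1)


def rejected_by_not(n_carbons):
    # Acceptance filter (0..MAX_CARBONS carbons) wrapped in a logical not
    return not in_count_window(n_carbons, 0, MAX_CARBONS)


# Both formulations agree for every carbon count tried here
assert all(rejected_by_min_count(n) == rejected_by_not(n) for n in range(200))

print(rejected_by_min_count(40), rejected_by_not(40))  # False False
print(rejected_by_min_count(41), rejected_by_not(41))  # True True
```

The decisions are identical; the difference is only in what can be reported
afterwards, since the min-count form still has matching atoms to show while
the negated form does not.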