Commit Graph

48 Commits

Author SHA1 Message Date
Greg Landrum
af3bb3e78b Allow partial deserialization of molecules (#4040)
* make pickling/depickling conformers optional

* make de-pickling properties optional

* support the new options in molecule ctors

* update doctest
2021-04-24 07:22:55 +02:00
Paolo Tosco
8030c36e5b Minor tweaks to SubstructLibrary (#3564)
* - added some missing const keywords
- added an addFingerprint overload to allow passing pointers
- added a test

* changes in response to review

* removed print

* added missing shared_ptr declaration

* added PatternNumBitsHolder serialization

* - merged with upstream changes and resolved conflicts
- got rid of PatternNumBitsHolder and leveraged the serialization version to get the PatternHolder to be backwards-compatible

* built substructLibV1.pkl with an older version of boost

* reverted serialization version to 1
only write numBits if != 2048 and only read numBits if it exists in the archive

* bogus commit just to trigger a rebuild
2020-12-09 19:42:38 -05:00
jones-gareth
9a864f4238 Sgroup (#3390)
* Changes to use SubstanceGroups in Java

* Forgot to add SWIG file

* Java test for SubstanceGroup wrappers

* Added RDKit boilerplate
2020-09-09 04:59:08 +02:00
Ric
d54e77e375 Add new CIP labelling algorithm (#3234)
* add port of centres

* Several changes:
    - Added a test based on RDKit issue 2984
        (default RDKit fails it, this gets it right)
    - Use bond directions for bond stereo (label is no longer required)
    - Fix bugs in rules 4b and 5new
    - Fix some mem errors
    - clang-formatted
    - some other minor cleanups

* Several changes and some improvements:
    - Added LGPL license, as well as a mention in the doc.
    - Fix/update/add some comments
    - Fix typo/bug in Mancude calculation
    - Fix bug in rules 4b, 5New
    - Fix Sp2 Bond dir reference
    - Re clang-format
    - other minor changes suggested by Dan

* Another bunch of changes:
  - require integer-order bonds; kekulize when required
  - fix fraction comparison
  - rename sq Cis/Trans e/z
  - replace queues with vectors
  - update copyright notices
  - revert LGPL changes
  - fix Asymmetric typo

* move to separate lib/mod, add python validation test

* Moving away from the original implementation:
    - Rename to CIPLabeler
    - Remove the abstraction layer
    - Remove some stats stuff
    - Push some CIPMol functions down to Node
    - Use RDKit's isotope info

* Another bundle of changes. The most relevant ones:
    - fix parity translation
    - use cis trans as bond reference -- breaks #2984 test
    - kill a lot of unused code
    - use lists for queues
    - store nodes and edges in digraph
    - add prefixes to class data member names
    - update changeRoot() test
    - use fastFindRings() for mancude rings
    - update docs
    - add references to the scientific paper
    - Document the Mancude functions
    - Fix Mancude atom types and their comments
    - remove mol data member from SequenceRule
    - replace Fraction with boost::rational
    - update comments, docstrings and the doc

* fix building the test

* Changes here include:
    - adding bitset overload for the labeling function
    - python wrap of the overload
    - handling trigonal pyramids with implicit H
    - setting bond labels sets stereo atoms, cis/trans
    - nix LEFT/RIGHT/TOGETHER/OPPOSITE constants
    - don't use GLOB in cmake
    - a decent amount of refactoring

* Minor edits to new_CIP_labeling (#6)

* Some changes for clarity

Added some documentation and changed some variable names to match
my understanding. Also a ran clang-tidy to ensure that all blocks
were brace-enclosed.

* Return a reference instead of a copy for performance

This is called many times and showed up after some light
profiling. This change bumped throughput by about 20%

* move out of Graphmol

* move .hpp headers to .h

* update documentation; add label set of atoms test

* Address comments:
    - Added references to centres to CIPLabeler.h and Python Wrap.
    - Update validation test to skip sanitization.
    - Document mancude fractional atomic number calculation.
    - Use unittest assertions in python test.
    - Update mancude docstrings to 'resonance' instad of 'tautomers'.
    - Rename prioritise() to prioritize().
    - Add postcondition to check carriers size in Tetrahedral.cpp.
    - Use getNeighbors() in Tetrahedral.cpp.
    - Move findStereoAtoms to Chirality namespace.
    - Move code back into GraphMol.
    - Fix typos and reformat doc.

* More comments:
    - Mention why we use boost's unordered map rather than the std one.
    - Fix include in Python wrapper.

* Addressed second batch of comments:
    - fix the bug in rule 4b
    - fix docstring for rule 2
    - move atomic mass calculation from rule 2 to node
    - addressed some build warnings
    - simplify sp2bond::label(comp)
    - add start/end atoms to Sp2Bond constructor
    - update system/local includes

Co-authored-by: Dan N <dan.nealschneider@schrodinger.com>
2020-07-07 20:34:33 +02:00
Greg Landrum
b55376f284 Adds more options to adjustQueryProperties (#3235)
* add documentation

* backup

* first pass at 5-rings working

* add a static method to initialize an empty parameter object

* expose static method to python

* additional testing

* support the single bond adjustments

* cleanup

* preserve the symbol used in the query from a CTAB

* support the way the MDL code adjusts five-ring aromaticity in query rings

* in-code documentation

* while we're at it, cleanup the way Q and A atoms are handled in the v3k parser

* changes in response to review

* make this C++14 again.

* change in response to review
2020-06-22 09:17:50 -04:00
Greg Landrum
95613b6279 Allow SubstanceGroups to survive molecule edits (#3170)
* Progress on #3168

* Fixes #3167

* Fixes #3169

* deal with CBONDS too

* test PATOMS

* Fixes #3175

* a bit of code simplification and test updates

still needs more testing

* more testing

* handle s-group hierarchy
also a couple of other changes in response to the review

* add forgotten test file

* changes in response to review
2020-05-19 17:35:08 +02:00
Greg Landrum
ab061d532f fix start/end atoms when wedging bonds (#2861) 2019-12-27 07:21:49 -05:00
Eisuke Kawashima
185ec927ab Unset executable flag 2019-10-10 20:18:43 +09:00
Greg Landrum
069e920645 Fixes #2224 (#2234)
* Fixes #2224

* test the basics
2019-01-21 11:31:02 -05:00
Greg Landrum
d9b06a733b Fixes #1936 (#1945)
* Fixes #1936
doctests of the book still need to be verified

* a fix that is related to #1940

* add test for what was actually reported
2018-07-05 11:53:54 -04:00
Paolo Tosco
503b84995c - make bond stereo detection in rings consistent (#1727) 2018-02-01 04:28:10 +01:00
Greg Landrum
d253aabc86 remove an output file that never should have been checked in 2017-12-05 08:21:35 +01:00
Maciej Wójcikowski
10fbd483bb [MRG] Fix PDB reader + add argument to toggle proximity bonding (#1629)
* Add parameter to skip proximity bonding during PDB reading

* Test proximityBonding flag

* Remove multivalent Hs and bonds to metals in PDB

* Add tests for multivalent Hs and metal unbinding

* Remove covalent bonds to waters

* Test unbinding of HOHs

* Refactor funxtions

* Rename flag for cosistency

* Include flavor in double bond perception

* Add metalorganic test (APW ligand)

* Validate input foe IsBlacklistedPair and minor changes.
2017-11-15 06:53:31 +01:00
Greg Landrum
64399a46f0 Fixes github1497 (#1555)
* move detectBondStereoChemistry() into MolOps

* switch more code over to using the new function

* add an addStereoChemistryFrom3D() function. Needs testing still.

* add some tests

* cleanups and rename
2017-09-11 08:37:32 -04:00
Greg Landrum
9dcef9ac57 Fixes #607 (#1075) 2016-09-23 04:57:07 +02:00
Paolo Tosco
8b5176f8c9 - initial work to put the Trajectory code into a separate object 2016-05-09 19:05:15 +01:00
Greg Landrum
027d231e38 add a test (still fails) 2016-03-09 07:27:11 +01:00
Brian Kelley
c5f210e7e7 RDKit learns how to filter PAINS/BRENK/ZINC/NIH via FilterCatalog
FilterCatalogs give RDKit the ability to screen out or reject 
undesirable molecules based on various criteria.  Supplied 
with RDKIt are the following filter sets:

  * PAINS - Pan assay interference patterns.  
    These are separated into three sets PAINS_A, PAINS_B and PAINS_C.
    Reference: Baell JB, Holloway GA. New Substructure Filters for 
               Removal of Pan Assay Interference Compounds (PAINS) 
               from Screening Libraries and for Their Exclusion in 
               Bioassays.
               J Med Chem 53 (2010) 2719Ð40. doi:10.1021/jm901137j.

  * BRENK - filters unwanted functionality due to potential tox reasons 
            or unfavorable pharmacokinetics.
    Reference: Brenk R et al. Lessons Learnt from Assembling Screening 
               Libraries for Drug Discovery for Neglected Diseases.
               ChemMedChem 3 (2008) 435-444. doi:10.1002/cmdc.200700139.

  * NIH - annotated compounds with problematic functional groups
     Reference: Doveston R, et al. A Unified Lead-oriented Synthesis of 
                over Fifty Molecular Scaffolds. Org Biomol Chem 13 
                (2014) 859Ð65.
                doi:10.1039/C4OB02287D.
     Reference: Jadhav A, et al. Quantitative Analyses of Aggregation, 
                Autofluorescence, and Reactivity Artifacts in a Screen 
                for Inhibitors of a Thiol Protease.
                J Med Chem 53 (2009) 37Ð51. doi:10.1021/jm901070c.

  * ZINC - Filtering based on drug-likeness and unwanted functional 
           groups
    Reference: http://blaster.docking.org/filtering/

The following is C++ and Python examples of how to filter molecules.

[C++]

#include <GraphMol/FilterCatalog.h>
using namespace RDKit;

    SmilesMolSupplier suppl(…);

    // setup the desired catalogs
    FilterCatalogParams params;
    params.addCatalog(FilterCatalogParams::PAINS_A);
    params.addCatalog(FilterCatalogParams::PAINS_B);
    params.addCatalog(FilterCatalogParams::PAINS_C);
    
    // create the catalog
    FilterCatalog catalog(params);

    unique_ptr<ROMol> mol; // automatically cleans up after us    
    int count = 0;
    while(!suppl.atEnd()){
      mol.reset(suppl.next());
      TEST_ASSERT(mol.get());

      // Does a PAINS filter hit?
      if (catalog.hasMatch(*mol)) {
        std::cerr << "Warning: molecule failed filter " << std::endl;
      }
      
      // More detailed data by retrieving the catalog entry
      const FilterCatalogEntry *entry = catalog.getFirstMatch(*mol);
      if (entry) {
        std::cerr << "Warning: molecule failed filter: reason " <<
          entry->getDescription() << std::endl;
        
        // get the matched substructure atoms for visualization
        std::vector<FilterMatch> matches;
        if (entry->getFilterMatches(*mol, matches)) {
          for(std::vector<FilterMatch>::const_iterator it = matches.begin();
              it != matches.end(); ++it) {
            // Get the SmartsMatcherBase that matched
            const FilterMatch & fm = (*it);
            boost::shared_ptr<SmartsMatcherBase> matchingFilter = \
              fm.filterMatch;
            
            // Get the matching atom indices
            const MatchVectType &vect = fm.atomPairs;
            for (MatchVectType::const_iterator it=vect.begin();
                 it != vect.end(); ++it) {
                 int atomIdx = it->second;
            }

          }
        }
      }
      count ++;
    } // end while

Python API

  import sys
  from rdkit.Chem import FilterCatalog

  params = FilterCatalog.FilterCatalogParams()
  params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_A)
  params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_B)
  params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_C)
  catalog = FilterCatalog.FilterCatalog(params)
  
  ...
  for mol in mols:
      if catalog.HasMatch(mol):
         print("Warning: molecule failed filter", file=sys.stderr)
      # more detailed
      entry = catalog.GetFirstMatch(mol)
      if entry:
         print("Warning: molecule failed filter: reason %s"%(
           entry.GetDescription()), file=sys.stderr)
           
         # get to the atoms involved in the substructure
         #  there ma be many matching filters here...
         for filterMatch in entry.getFilterMatches(mol):
             filter = filterMatch.filterMatch
             # get a description of the matching filter
             print(filter)
             for queryAtomIdx, atomIdx in filterMatch.atomPairs:
                 # do something with the substructure matches

Advanced

 FilterCatalogs are fully serializable and can be stored for later use.

  To serialize a catalog, use the catalog.Serialize() method.
     std::string pickle = catalog.Serialize();
     
  To unserialize, send the resulting string into the constructor
     FilterCatalog catalog(pickle);


 The underlying matchers can be arbitrarily complicated and new
  ones with more complicated semantics can be created.  The default
  matching objects are:

  SmartsMatcher - match a smarts pattern or query molecule with a minimum 
                  and maximum count
  ExclusionList - returns false if any of the supplied matches exist

  And - combine two matchers
  Or  - true if any of two matchers are true
  Not - invert the match (note that this can have confusing semantics
          when dealing with substructure matches)

  Entries can be added at any time to a catalog:
  
   ExclusionList excludedList;   

    excludedList.addPattern(SmartsMatcher("Pattern 1", smarts)); 
    excludedList.addPattern(SmartsMatcher("Pattern 2", smarts2)); 
   

  A FilterCatalog supports a few different types of matching.  One is
  a traditional rejection filter where if a substructure exists in
  the target molecule, the molecule is rejected.

  These types of queries can indicate the substructure that triggered
  the rejection through the FilterCatalogEntry::GetMatch(mol)
  function.

  The FilterCatalog also supports acceptance filters, that are
  designed to indicate which molecules are ok.  These have
  to be transformed into rejection filters or simply wrapped in a 
  Not( acceptanceFilter ) when entered into the catalog.  For example, 
   from Zinc:

    carbons [#6] 40

  means that we have a maximum of 40 carbon atoms.  We can write this by
  converting the max count to a min count (i.e. the pattern is triggered
  when the molecule has mincount atoms);

    const unsigned int minCount = 40+1;
    SmartsMatcher( "Too many carbons", "[#6"], minCount );

  This can be properly substructure searched.

  Or we can wrap this in a not:
  
    const unsigned int minCount = 0;
    const unsigned int maxCount = 40;
    Not( SmartsMatcher( "ok number of carbons", "[#6]", minCount, maxCount) );

  Note: Wrapping in a Not loses the ability to highlight the rejecting
    pattern when visualizing the molecule.
2015-07-14 10:31:31 -04:00
Nadine Schneider
0cf0dd37ce Bugfix in SmilesWrite and some additional tests for getMolFrags function 2015-04-16 10:53:20 +02:00
Nadine Schneider
5d963846b8 merge 2015-04-10 09:44:18 +02:00
Greg Landrum
74125f685c Fixes #443 2015-03-05 06:38:38 +01:00
Greg Landrum
ad62f6241a update coords 2015-01-09 09:58:20 +01:00
Greg Landrum
baf26c053c not fixed; still lots of debugging printing; backup commit 2015-01-09 06:33:49 +01:00
Greg Landrum
1f4c2e915c fix a nasty canonicalization problem:
need to be sure to sort neighbors by their ranks in a *decreasing* order
2015-01-07 20:46:08 +01:00
Greg Landrum
23076b1cdb Fixes #298 2014-07-23 05:31:16 +02:00
Greg Landrum
86b9e6b089 Fixes #72 2013-08-25 06:36:10 +02:00
Greg Landrum
40ab2c06e3 Fixes #87 2013-08-21 08:20:19 +02:00
Greg Landrum
294cb24de4 fix and test issue 266 2012-11-17 07:39:39 +00:00
Greg Landrum
f5eb640766 fix and test Issue3549146 2012-07-26 14:34:35 +00:00
Greg Landrum
6157345dde this version passes the ZINC natural-products torture test 2012-06-28 06:45:42 +00:00
Greg Landrum
0f3b84cd28 backup commit; we are not quite there yet 2012-06-26 06:15:08 +00:00
Greg Landrum
0085012701 fix and test issue 3525076 2012-05-10 05:35:20 +00:00
Greg Landrum
813f4863ed fix and test Issue 3480481 2012-02-18 05:34:34 +00:00
Greg Landrum
5ad945b9ce some supplier updates 2012-02-10 16:24:42 +00:00
Greg Landrum
044fda398b further ring-finding algorithmic changes to fix issue 3185548 2011-02-19 04:35:53 +00:00
Greg Landrum
aa1610797e initial fix for Issue3184458, more work should still be done here. 2011-02-18 06:31:31 +00:00
Greg Landrum
db1f25b16c chirality support in the fix for issue 2951221 2010-02-14 06:39:51 +00:00
Greg Landrum
8daba8ff30 fix sf.net issue 2951221: note this doesn't add Hs to chiral centers correctly 2010-02-13 14:53:47 +00:00
Greg Landrum
c425bc8c03 fix and test sf.net Issue2788233 2009-06-01 13:04:33 +00:00
Greg Landrum
b78abe26d6 fix and test sf.net issue 2787221 2009-05-05 12:54:41 +00:00
Greg Landrum
09455ef42e resolves issue 2705543 2009-03-28 06:43:04 +00:00
Greg Landrum
a412664c65 just for backup 2009-03-27 16:12:54 +00:00
Greg Landrum
c0e1ffdc65 fix and test Issue2313979 2008-12-02 08:50:51 +00:00
Greg Landrum
eef831c2bf fix and test issue 2316677; there probably should be a hand-crafted test for this as well, but that is going to be tricky 2008-11-20 05:55:46 +00:00
Greg Landrum
fc6c2b2f9a fixes to issues 2196817 and 2208994 2008-10-30 06:56:39 +00:00
Greg Landrum
023f7b4f0f Merge changes from the iterative chirality branch:
https://rdkit.svn.sourceforge.net/svnroot/rdkit/branches/IterativeChirality_20Aug2008
into the trunk.
This covers revs 798-828.

Dependent chirality should now be correctly handled, but the
handling of ring stereochemistry, i.e. things like:
C[C@H]1CC[C@H](C)CC1
is still not 100% correct.
2008-09-19 09:40:15 +00:00
Greg Landrum
148a3a87c4 add getTotalDegree() method to Atom; support generating stereochem information from 3D coords 2008-05-26 14:54:35 +00:00
Greg Landrum
75a79b6327 initial import 2006-05-06 22:20:08 +00:00