4103 Commits

Author SHA1 Message Date
Greg Landrum
9bcd28a4ff MolEnumerator::enumerate() should call updatePropertyCache() (#3420) 2020-09-23 17:28:45 -04:00
Greg Landrum
d65d47189c Fixes: #3415 (#3416)
* Fixes: #3415

* add another test
2020-09-18 10:01:10 -04:00
Greg Landrum
9a04aea918 Improvements to reaction chirality handling (#3412)
* add tests for the problem

* more testing (still no fix)

* Fixes #2891

Try to be more robust w.r.t. atom reordering in input SMILES

* better handling of differing numbers of bonds between reactants and products

all tests now passing

* update the rdkit book with more details about chirality in reactions

* changes in response to review
2020-09-18 09:33:26 +02:00
Paolo Tosco
4510bec9f3 Allow passing explicit removeHs, sanitize and strict flags to the MDL rxn parser (#3411) 2020-09-18 08:01:58 +02:00
David Cosgrove
cc705a2f55 Scale line width3305 (#3380)
* Fixed expected test results.

* Implements Github 3305.
Line widths don't scale by default.  Line widths are now floats rather than ints.

* Suggestion that testGithub565 be removed.

* Corrected comment.

* Fixed test case.

* Added scaleBondWidth to Python wrappers.

* Fixed expected test values.

* Removed testGithub5.

* Added draw option to scale highlighted bond width.

* Put all the tests back in.

* Tweaked tests to allow for difference in scale between Freetype/No-Freetype images.

* Extra write for CI error.

* Extra write for CI error.

* removed extra write for CI error.

* update expected results
remove debugging output

* Fixed stroke-width to 1 decimal place in SVG files.  Altered tests to cope.

* update expected results for python tests

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2020-09-18 06:44:40 +02:00
Greg Landrum
819e46ed1e Fixes #3413 (#3414) 2020-09-18 05:44:24 +02:00
Paolo Tosco
a9da05570e Fixes a couple of query-related bugs (#3398)
* - fixes a bug with the MDL MolParser (M  ALS queries clobber previously existing queries)
- fixes a regression introduced by #3389 (duplicate formal charge queries may arise)

* changes in response to review

* forgot to commit
2020-09-17 08:16:43 +02:00
Brian Kelley
635cfbfcaf Enable recursive smarts in the rgroup decomposition (#3404) 2020-09-11 16:05:04 +02:00
Brian Kelley
202c863302 [WIP] Deprotection Library (#3294)
* Add deprotection library

* Add copyrights

* Add dll exports

* Fix doc strings, use std::shared_ptr

* Changes for Review

* clang-format

* Fix seg fault and compile issues

* Fix namespace issue

* Add examples to datastructure and use them for testing

* Make the test actually compile and run

* Remove greek character
2020-09-11 05:35:36 +02:00
Brian Kelley
728bb8defe Fixes #3403 (#3405)
* Fixes #3403

* Fix Typo

V is actually a set so we can clear it.

* Fix bug, use self.assertFalse
2020-09-10 08:11:47 +02:00
jones-gareth
9a864f4238 Sgroup (#3390)
* Changes to use SubstanceGroups in Java

* Forgot to add SWIG file

* Java test for SubstanceGroup wrappers

* Added RDKit boilerplate
2020-09-09 04:59:08 +02:00
Greg Landrum
db68b217b9 Fixes #3392 (#3395)
a bit of code modernization too
2020-09-07 08:01:02 +02:00
Greg Landrum
eaeba839a3 Fixes #3393 (#3394) 2020-09-04 12:39:58 -04:00
Greg Landrum
53a1382206 Fixes #3388 (#3389)
* Fixes #3388

* add a test for numradicals too
2020-09-04 12:38:25 -04:00
Greg Landrum
ff50f176c2 expose additional SubstanceGroup data members to Python (#3375)
* support read-only access to cstates from python

* expose GetBrackets

* expose getAttachPoints too

remove vestigial SubstanceGroupCState_VECT
2020-09-04 12:37:40 -04:00
Greg Landrum
8bcb625d1c Fix #3369 (#3370)
* show stereogroup labels when they are present

* Fixes #3369
2020-09-04 12:34:53 -04:00
Greg Landrum
19bdd21de1 Updated code for chirality perception (#3324)
* add new test (it fails, of course)

* isAtomPotentialTetrahedralCenter() there and tested
test cases for molecular stereo written (but failing, of course)
create new_chirality.cpp, we will probably want to change this at some point
new StereoInfo structure

* more infrastructure
- isBondPotentialStereoBond()
- two getStereoInfo() functions
- associated unit tests

* backup

* oops

* backup

* switch to always using four atoms for bonds

* backup

* this now actually works

* doc update

* add a test to demo that ring stereo is not working

* more testing

* add a fun CIP test

* add review note

* debugging

* remove extraneous debugging
turn off tests for ring-double bond stereo

* disable the ring-stereo fix... this breaks a few tests, but we will recover

* works, needs cleanup, chirality code needs re-testing

* nothing works

* Fixes #3322

* Python and C++ tests now pass

* clang-format

* first pass at python wrappers

* improve doctest

* basic optimization...
stop with the copying

* rename

* all tests passing again

* optimization

* fix the sort in the tests

* looks like this might fix the windows-dll build problems

* update tests

* the fun never ends

* comment cleanup

* handle deliberately unspecified atoms/bonds

* add cleanIt option

* add flagPossible

* add option to use the new code to the SMILES parser

* additional testing

* additional testing

* a bit of additional testing never hurts

* changes in response to review

* fixes a bug with potential parastereo not being cleared

other changes in response to review

* update docs
2020-09-02 15:00:29 +02:00
Paolo Tosco
5a8db40181 - fixed #3349 (#3354)
- fixed a few typos
- removed const from operator(), which should be allowed to modify the owning object
- added missing const to some getters
2020-09-02 04:51:20 +02:00
Greg Landrum
0b438197c7 Add MolDraw2DJS (#3376)
* backup, does not work

* backup

* baby steps

* basics are now working

* more progress

* add substructure highlighting

* get the FT stuff working too

* get the FT stuff working too

* empirical corrections to dashed bonds

* enable coordgen support

* change min font size

* support dashed lines

* some cleanup

* support all MolDraw2D options when parsing from JSON

* parse MolDraw2D options from JSON

* show stereogroup labels when they are present

* switch to using the new CIP labels in minilib

* update demo to show controlling options

* move all the JS code into jswrapper.cpp
pass the canvas itself instead of the id to the JS functions
introduce offset

* remove extra emscripten load

* cleanup debugging stuff

* update freetype tests

* update non-freetype tests

* changes in response to review
2020-08-31 17:09:16 -04:00
Paolo Tosco
d372865b67 Fixes a bug in AddHs() involving sp2 centers with degree 1 (#3383)
* Fixes a bug in AddHs() involving sp2 centers with degree 1

* Changes in response to review

* forgot to run clang-format
2020-08-31 13:42:37 +02:00
lummyk
36ab603d9a Replace fill-opacity= with fill-opacity: (#3368) 2020-08-31 09:29:42 +02:00
Greg Landrum
1a3ce773cf Molecule metadata in PNGs (#3316)
* reader stub. Navigates the file successfully

* works

* works

* read out all metadata, not just ours

* first pass at reading out molecules

* support mol blocks too

* add python wrapper for parser

* add direct writer

* get rid of multiple definitions in PNGParser.h

* update from code review

* robustification

* handle reading compressed metadata

* support compressing metadata too

* reorder arguments to make this more consistent

* add writers to python wrappers

* forgotten file

* add pickle support

* explicit zlib dependency

* get windows builds working
at least with conda boost+zlib

* switch to using boost::iostreams to do the compression/decompression

* switch to using a vector of string pairs to store the metadata

need this so that we can contain "duplicate" keys

* add metadata output to MolDraw2DCairo
still need python test

* add a python test

* initial work at reading/writing reactions from PNGs

refactor the ReactionParser.h header a bit

* cleanup debug messages

* reaction PNG support -> python

* ReactionPickler no longer includes all ProductTemplate props

* handle metadata at the MolDraw2D level
Currently only supported by MolDraw2DCairo, it's worth extending this to SVG too

* support reading multiple molecules from a png

* support multiple molecules from files and in python

* stop duplicating tags with multiple molecules

* update to get windll builds working
this change should be propagated to more cmakelists.txt files

* make sure the metadata ends up in the notebook

* make sure PNGs in the notebook also have metadata

efficiency improvements for some notebook bits (i.e., stop going PNG->Image->PNG)

* no need to pretend that we might be using PIL anymore

* documentation

* update docs to show new functionality

* not sure why the doctests failed on linux

* still trying to diagnose those failures

* protect doctests in case python interpreters are being re-used

* switch tags

* the wrapped functions to read png data from files weren't working

* <sigh>... windows dlls
2020-08-25 07:51:18 +02:00
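The commit above mentions reading out all text metadata from a PNG, handling compressed entries, and keeping the result as a vector of string pairs so that "duplicate" keys survive. The following is a minimal Python sketch of that idea using only the standard library. It is not RDKit's parser: the function names and the `demoSMILES` / `demoMOL` keys are hypothetical, and the demo stream deliberately omits the IHDR/IDAT chunks that a real image would contain.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_text_metadata(png_bytes):
    """Return every tEXt/zTXt entry as a list of (key, value) pairs.

    A list of pairs (rather than a dict) keeps repeated keys.
    """
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    pairs = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
        if chunk_type == b"tEXt":
            key, _, value = data.partition(b"\x00")
            pairs.append((key.decode("latin-1"), value.decode("latin-1")))
        elif chunk_type == b"zTXt":
            key, _, rest = data.partition(b"\x00")
            # rest[0] is the compression method byte (0 = zlib/deflate)
            text = zlib.decompress(rest[1:])
            pairs.append((key.decode("latin-1"), text.decode("latin-1")))
        elif chunk_type == b"IEND":
            break
    return pairs


# Hypothetical demo stream: two entries share a key, one entry is compressed.
demo_png = (
    PNG_SIGNATURE
    + make_chunk(b"tEXt", b"demoSMILES\x00CCO")
    + make_chunk(b"tEXt", b"demoSMILES\x00c1ccccc1")
    + make_chunk(b"zTXt", b"demoMOL\x00\x00" + zlib.compress(b"mol block placeholder"))
    + make_chunk(b"IEND", b"")
)
pairs = read_text_metadata(demo_png)
print(pairs)
```

Converting `pairs` to a dict would silently drop the first `demoSMILES` entry, which is the situation the pair-based storage avoids.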
Greg Landrum
ab94eada8b update MinimalLib Dockerfile (#3357)
to get it working again with the recent MAEParser changes
2020-08-24 14:22:56 +02:00
Greg Landrum
c0a62388a2 switch to using target_compile_definitions instead of add_definitions (#3350)
* switch to using target_compile_definitions instead of add_definitions

* missed one
2020-08-21 04:49:07 +02:00
Greg Landrum
28eb951dfd Fixes #3342 (#3343) 2020-08-19 05:17:06 +02:00
Greg Landrum
b558de22c6 Cleanup alignment dependencies (#3317)
* split MolAlign lib into two pieces

* further dependency cleanup

* release notes update

* add a missing dependency to the new library
2020-08-18 07:42:59 +02:00
Greg Landrum
02d76edc09 more bug fixes and cleanups from fuzz testing (#3339)
* ossfuzz #22301

* ossfuzz 22307

* memory leak when failing cxsmiles

* MolPickler things found by ossfuzz

* changes in response to review
2020-08-17 06:51:24 +02:00
Rachel Walker
e1322f73c6 Sped up SSSR by not storing every path back to root (#3333)
* Sped up SSSR by not storing every path back to root

This change speeds up ring performance by not storing
every path back to the root. Instead, it keeps track of
parents and rebuilds paths from the parents once a cycle
is found. It also stops the BFS once the depth of the BFS
is larger than the smallest ring (i.e., we found a path
that is longer than the smallest ring).

Before this commit:

3EOH: 0.72s
2J3N: 0.26s
1NKS: 0.018s

After this commit:
3EOH: 0.35s
2J3N: 0.07s
1NKS: 0.007s

* Fixed ordering of atoms within SSSR rings

Co-authored-by: Rachel Walker <rachel.walker@schrodinger.com>
2020-08-15 06:00:40 +02:00
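The commit message above describes keeping only parent pointers during the BFS, rebuilding the paths once a cycle is found, and stopping once the search depth exceeds what the smallest ring allows. The following is a small Python sketch of that general technique for a single root atom. It is an illustration under assumptions, not the RDKit SSSR code: the adjacency-dict input and all function names are hypothetical.

```python
from collections import deque


def _path_from_root(parent, node):
    """Rebuild the root -> node path by following parent pointers."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path


def smallest_ring_through(adjacency, root):
    """Return the atoms of the smallest ring containing root, or None."""
    parent = {root: None}
    depth = {root: 0}
    branch = {root: None}  # which neighbor of root each atom descends from
    queue = deque([root])
    best = None
    while queue:
        u = queue.popleft()
        # Any ring closed from depth d has at least 2*d atoms, so stop early.
        if best is not None and 2 * depth[u] >= len(best):
            break
        for v in adjacency[u]:
            if v == parent[u]:
                continue
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                branch[v] = v if u == root else branch[u]
                queue.append(v)
            elif branch[u] != branch[v]:
                # Two different branches meet: the paths only share the root.
                ring = _path_from_root(parent, u) + _path_from_root(parent, v)[:0:-1]
                if best is None or len(ring) < len(best):
                    best = ring
    return best


# Root atom 0 sits in a 4-membered ring fused to a 6-membered ring.
fused = {0: [1, 3, 7], 1: [0, 2], 2: [1, 3], 3: [2, 0, 4],
         4: [3, 5], 5: [4, 6], 6: [5, 7], 7: [6, 0]}
ring = smallest_ring_through(fused, 0)
chain = {0: [1], 1: [0, 2], 2: [1]}
no_ring = smallest_ring_through(chain, 0)
print(ring, no_ring)
```

Only one parent per atom is stored, so memory stays linear in the number of atoms visited instead of growing with the number of stored paths.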
Dan N
b5dcb21fef Improve performance of aromaticity detection for large molecules (#3253)
* remove trailing spaces

* 3256: Envelope aromaticity not detected in complex fused system

Removes stopping point in aromaticity detection when all atoms
are "done". This also markedly improves the performance of
aromaticity detection for very large molecules - for example,
aromatization of 3EOH from the PDB was dominated by done atom
checking before this commit.

Some aromatic bonds were missed before this commit in complex fused
systems. This happened if all atoms in the fused system were also
in some smaller aromatic ring and there was at least one fused edge
that was single in the kekule form.

Some example molecules for which envelope aromaticity failed
before this commit:

c1cc2n(c1)c1cccn1c1cccn21
-> became c1cc2n(c1)-c1cccn1-c1cccn1-2 before this commit
c1cc2c3cc[nH]c3n3cccc3n2c1
-> became c1cc2n(c1)-c1cccn1-c1[nH]ccc1-2 before this commit
c1cc2c3cc[nH]c3c3cc[nH]c3n2c1
-> became c1cc2n(c1)-c1[nH]ccc1-c1[nH]ccc1-2 before this commit

Here's a similar example that didn't fail even before this
commit. The central ring only shares double bonds with the
exterior rings.
* c1cc2c([nH]1)c1cc[nH]c1c1cc[nH]c21

Requires updates to some MQN descriptors tests because some
bonds become aromatic (MQN includes counts of single and
double bonds of kekule form).

FWIW, for the molecule that had a change in counts, the counts
were incorrect both before and after this commit, because
MQN uses an approximation (dividing aromatic bonds evenly
between single and double bonds) to avoid kekulization.
This approximation is invalid when there are oodles of
nitrogen lone pairs participating in the aromatic
bonds.

(the failing line was 2558 in aromat_regress.txt: Cc1cc2n(n1)c1cc(C)nn1c1c(C=O)c(C)nn21)

* Detect envelope aromaticity in fused systems

In #3253, we proposed removing doneAtoms for performance, and it was
noted that it also fixed detection of envelope aromaticity in some
fused systems. However, when I completely removed doneAtoms, I saw
hangs in sanitization of things like nanotubes. Using doneBonds
allows envelope aromaticity, while preserving a reasonable break
on runaway work for crazy molecules.

The performance issue was addressed by caching the ring bond
count.

Here are some sanitize timings on proteins from the RCSB PDB:
Before this commit:
* 3eoh 1.21s
* 2j3n 0.77s
* 1nks 0.053s

Afterwards:
* 3eoh 0.42s
* 2j3n 0.15s
* 1nks 0.046s

* Use boost::dynamic_bitset instead of unordered_set

To count ring bonds.
2020-08-13 05:57:16 +02:00
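The commit message above notes that MQN avoids kekulization by dividing aromatic bonds evenly between single and double bonds, and that this breaks down when nitrogen lone pairs take part in the aromatic system. A small arithmetic sketch of why, assuming a simple half-and-half split; the function name is hypothetical, and any rounding the real descriptor code applies is left out:

```python
def approximate_kekule_counts(n_single, n_double, n_aromatic):
    """Split aromatic bonds evenly between single and double bonds."""
    half = n_aromatic / 2
    return n_single + half, n_double + half


# Benzene ring: 6 aromatic bonds; a Kekule form has 3 single and 3 double.
benzene = approximate_kekule_counts(0, 0, 6)

# Pyrrole ring: 5 aromatic bonds; a Kekule form has 3 single and 2 double,
# because the nitrogen lone pair supplies two of the six pi electrons.
pyrrole = approximate_kekule_counts(0, 0, 5)

print(benzene, pyrrole)
```

For benzene the split matches the Kekule counts exactly, while for pyrrole it gives 2.5 and 2.5 against the true 3 and 2, so the error grows with the number of lone-pair donors in the ring system.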
Greg Landrum
daf78eb62c add test for #3325 (#3331) 2020-08-10 13:17:41 -04:00
Greg Landrum
1dd2d75615 Fixes #3314 (#3318)
Cleans up and simplifies (a bit) the code for perceiving chirality
based on a conformer and wedged bonds.
2020-08-10 13:12:01 -04:00
jones-gareth
aa4d5dc22c Fixes for aromatic bond fuzzy queries (#3328)
* C# wrapper for fragmentMolOnBonds

* Fix failing tautomer query test

* Fix ChemTransforms.i

* SmartsWriter fix
2020-08-10 05:00:19 +02:00
Greg Landrum
c7e7614568 Fix #3312 (#3313)
* additional testing: ensure we can delete SubstanceGroups

* fixes #3312

* stupid test mistake

* Fixes #3315
2020-08-01 04:36:34 +02:00
Greg Landrum
f14f8a60de Expanded support for CXSMILES features (#3292)
* move replaceAtomWithQueryAtom() and completeMolQueries() to QueryOps namespace

* support ring bonds from cxsmiles

* add a test that is still failing

* update nonHydrogenDegree query, add SMARTS extension for that

* some cleanup

* unsaturation and substitution count

* fix typo in test

* update expected result

* add linknodes

* add variable attachment points

* improve documentation of supported cxsmiles features

* clarifying the docs

* support leaving out the outer atoms in LN specs
2020-07-25 05:06:08 +02:00
Greg Landrum
938a14ef81 Get PPC builds working (#3285)
* temporarily disable the test

* update test

* debug output

* catch a problem with systems that are *almost* degenerate

* final cleanups
2020-07-25 05:05:22 +02:00
Greg Landrum
967c4bf824 Stop trying to assign hybridization to actinides (#3281)
* Stop trying to assign hybridization to actinides

There's also some cleanup in this commit

* Apply suggestions from code review

Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com>
2020-07-25 05:04:54 +02:00
Paolo Tosco
ebd384347c Use operator() and __call__() consistently across RDKit (#3295)
* Changed the callback() and compare() functions to operator() (C++)
and __call__() (Python) for consistency. The old functions are
deprecated and will be removed in Release 2021.01

* changes in response to review
2020-07-23 14:15:58 +02:00
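The commit above replaces named `callback()` / `compare()` methods with `operator()` in C++ and `__call__()` in Python, keeping the old names as deprecated aliases. A minimal Python sketch of that pattern follows. `ProgressMonitor` is a hypothetical class written for illustration, not part of the RDKit API.

```python
import warnings


class ProgressMonitor:
    """Hypothetical callback object; not an RDKit class."""

    def __init__(self):
        self.calls = []

    def __call__(self, done, total):
        # New-style entry point: the object itself is callable.
        self.calls.append((done, total))
        return done < total  # True means "keep going"

    def callback(self, done, total):
        # Old-style entry point kept as a thin, deprecated alias.
        warnings.warn(
            "callback() is deprecated; call the object directly",
            DeprecationWarning,
            stacklevel=2,
        )
        return self(done, total)


monitor = ProgressMonitor()
keep_going = monitor(1, 3)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = monitor.callback(3, 3)

print(keep_going, legacy_result, len(caught))
```

Routing the deprecated method through `__call__` keeps a single implementation, so removing the alias later is a one-method deletion.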
Greg Landrum
a9010da8a4 Small bug fixes and cleanups from fuzz testing (#3299)
* fix ossfuzz issue 24074

* fix ossfuzz issue 23896

* switch to throw exceptions when reading ints/floats

* remove extraneous benchmarking code

* change type of AH query

* confirm an invariant while finding rings

* no sense in adding these tests to github

* switch to use fail() instead of failbit
switch to acceptSpaces by default
2020-07-22 16:57:31 +02:00
Greg Landrum
ef5ec47b1d Embed default truetype font (#3288)
* Embed default font in source.
Unfortunately there's some reformatting in here too.

* make moldraw2d build correctly with _static and in emscripten

* get MinimalLib working with freetype and emscripten
2020-07-15 08:03:58 +02:00
Greg Landrum
e1fdca2b05 ScaffoldNetwork: add feature to count the number of molecules a scaffold originates from (#3275)
* add feature to count the number of molecules a scaffold originates from

* make this work with SWIG

* update the type of the version constant
2020-07-15 06:59:51 +02:00
Greg Landrum
b55514806d bonds with "either" stereo cannot be read from JSON (#3290)
* do not require stereoAtoms for "either" stereochemistry

* add one more test
2020-07-14 10:06:12 -04:00
Paolo Tosco
c118c416ed Fixes #2597 (#3213)
* - major refactoring (fixes #2597)
- adds support for C++ and Python callbacks to monitor progress
- cumulated bonds should not appear in rings
- UNCONSTRAINED_CATIONS now implies ALLOW_CHARGE_SEPARATION and ALLOW_INCOMPLETE_OCTETS
  as it should already have
- UNCONSTRAINED_ANIONS now implies ALLOW_CHARGE_SEPARATION
  as it should already have

* removed spurious debugging message

* changes in response to review
2020-07-13 15:15:24 +02:00
jones-gareth
0b97153a79 Add ScaffoldNetwork to csharp (#3289) 2020-07-13 14:48:34 +02:00
Paolo Tosco
947144ab6d Avoid really slow Windows conda builds (#3287)
* avoid that if CMAKE_BINARY_DIR==CMAKE_SOURCE_DIR (e.g., conda builds)
export.h is overwritten at each incremental build causing the whole
RDKit to be rebuilt every time

* use existing CMake function so we don't have to maintain our own
2020-07-13 14:47:29 +02:00
Greg Landrum
c548159c6f Decouple coordgen and maeparser integrations (#3286)
* decouple the maeparser and coordgen support

* further untangling of coordgen and maeparser

* more tweaks

* be explicit in the azure devops builds

* forgot one file

* get win64_dll builds working
2020-07-13 14:46:40 +02:00
Brian Kelley
015fed1e67 rgroup speedup (#3279)
* First part of restructuring rgroup decomposition classes

* Add docs

* Cache often calculated values and simplify lookups

* Fix accidentally deleted code

* Merge doc changes

* Remove unused timing code

* Remove unused header

* Remove redundant doc string

* Fix chrono issues

* Response to review

Co-authored-by: Brian Kelley <bkelley@relaytx.com>
2020-07-12 06:56:15 +02:00
Greg Landrum
a951abfac5 Fixes #3267 (#3284)
* make RDStreams work when RDK_USE_BOOST_IOSTREAMS is OFF

* indicate when the tests finish

* works. needs to be edited before final PR

* dockerfile for the stepwise commit

* typo

* Update Dockerfile
2020-07-11 13:42:32 +02:00
Greg Landrum
73d26036de Support enumerating some mol file features into MolBundles (#3257)
* backup

* compiles

* progress, but not there yet

* basics now working

* start towards adding another test

* test having two variation points

* add actual enumeration and the corresponding tests

* docs and cleanup

* cleanups to get the mac build working

* attempt to get win32 dll builds to work

* dlls are fun

* Add FixedMolSizeMolBundle class

* changes in response to review

Also: add warnings for bad input in ParseV3000Array

* a bit of refactoring

* additional testing

* does not work, backup

* LINKNODES work now

* cleanup

* allow silencing reaction validation warnings during initialization

* docs

* fix (and test) handling of empty enumerations

* silence warnings when doing alchemy

* first pass at a Python wrapper for the enumerator

* Add Java wrappers for MolBundle and the MolEnumerator

* cleanup some comment formatting
2020-07-11 12:54:23 +02:00
Brian Kelley
adf19f7517 RGroupDecomposition restructuring (#3270)
* First part of restructuring rgroup decomposition classes

* Add docs

* Move doc strings before structs
2020-07-08 11:47:00 -04:00
Ric
d54e77e375 Add new CIP labelling algorithm (#3234)
* add port of centres

* Several changes:
    - Added a test based on RDKit issue 2984
        (default RDKit fails it, this gets it right)
    - Use bond directions for bond stereo (label is no longer required)
    - Fix bugs in rules 4b and 5new
    - Fix some mem errors
    - clang-formatted
    - some other minor cleanups

* Several changes and some improvements:
    - Added LGPL license, as well as a mention in the doc.
    - Fix/update/add some comments
    - Fix typo/bug in Mancude calculation
    - Fix bug in rules 4b, 5New
    - Fix Sp2 Bond dir reference
    - Re clang-format
    - other minor changes suggested by Dan

* Another bunch of changes:
  - require integer-order bonds; kekulize when required
  - fix fraction comparison
  - rename sq Cis/Trans e/z
  - replace queues with vectors
  - update copyright notices
  - revert LGPL changes
  - fix Asymmetric typo

* move to separate lib/mod, add python validation test

* Moving away from the original implementation:
    - Rename to CIPLabeler
    - Remove the abstraction layer
    - Remove some stats stuff
    - Push some CIPMol functions down to Node
    - Use RDKit's isotope info

* Another bundle of changes. The most relevant ones:
    - fix parity translation
    - use cis trans as bond reference -- breaks #2984 test
    - kill a lot of unused code
    - use lists for queues
    - store nodes and edges in digraph
    - add prefixes to class data member names
    - update changeRoot() test
    - use fastFindRings() for mancude rings
    - update docs
    - add references to the scientific paper
    - Document the Mancude functions
    - Fix Mancude atom types and their comments
    - remove mol data member from SequenceRule
    - replace Fraction with boost::rational
    - update comments, docstrings and the doc

* fix building the test

* Changes here include:
    - adding bitset overload for the labeling function
    - python wrap of the overload
    - handling trigonal pyramids with implicit H
    - setting bond labels sets stereo atoms, cis/trans
    - nix LEFT/RIGHT/TOGETHER/OPPOSITE constants
    - don't use GLOB in cmake
    - a decent amount of refactoring

* Minor edits to new_CIP_labeling (#6)

* Some changes for clarity

Added some documentation and changed some variable names to match
my understanding. Also ran clang-tidy to ensure that all blocks
were brace-enclosed.

* Return a reference instead of a copy for performance

This is called many times and showed up after some light
profiling. This change bumped throughput by about 20%

* move out of Graphmol

* move .hpp headers to .h

* update documentation; add label set of atoms test

* Address comments:
    - Added references to centres to CIPLabeler.h and Python Wrap.
    - Update validation test to skip sanitization.
    - Document mancude fractional atomic number calculation.
    - Use unittest assertions in python test.
    - Update mancude docstrings to 'resonance' instead of 'tautomers'.
    - Rename prioritise() to prioritize().
    - Add postcondition to check carriers size in Tetrahedral.cpp.
    - Use getNeighbors() in Tetrahedral.cpp.
    - Move findStereoAtoms to Chirality namespace.
    - Move code back into GraphMol.
    - Fix typos and reformat doc.

* More comments:
    - Mention why we use boost's unordered map rather than the std one.
    - Fix include in Python wrapper.

* Addressed second batch of comments:
    - fix the bug in rule 4b
    - fix docstring for rule 2
    - move atomic mass calculation from rule 2 to node
    - addressed some build warnings
    - simplify sp2bond::label(comp)
    - add start/end atoms to Sp2Bond constructor
    - update system/local includes

Co-authored-by: Dan N <dan.nealschneider@schrodinger.com>
2020-07-07 20:34:33 +02:00