* - Fix #8029
- avoid unnecessary rounding errors in the JSON writer
- remove a warning when compiling MinimalLib without SubstructLibrary support
* changes in response to review
* changes in response to review
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* First pass at splitting molecule.
* Interim commit. Reading libraries from file in original format.
* Basic search seems to be working.
* Pattern fingerprint screening.
* Connector region heuristic.
* Fixed triazole (aromatic/non-aromatic connectors).
* Fix search with non-split parent query, where query is substructure of a single reagent.
* Remove duplicate hits by reaction/reagents used.
* Implement largest fragment heuristic.
* Extra test files.
* Read/write binary file.
Program for conversion from text format to binary format.
* Remove empty reagent sets on reading, probably due to synthon number counting from 1 rather than 0.
* Tidy SSSearch functions.
* Stash pending major surgery for triazole bug.
* Revert to using unique_ptr.
Correct use of reagent order.
* Function to summarise Hyperspace.
* Delay building hits till end and put cutoff on number.
* Earlier bale-out in getHitReagents.
* Streamline checkConnectorRegions.
* Remove free functions for search.
* Correct name of Python test.
* First stage of Python wrappers.
* Rename namespace.
* Parameters object.
* Mysterious windows export thing.
* Fix bug - not matching number of connectors in fragment and synthon.
* Back like it was. The connector count wasn't the problem.
* Put the substructure results into their own class.
* gcc 14 didn't like my use of std::reduce.
Update expected test results.
* Remove write statement.
* Tidy.
* Tidy.
* Enable random sample of hits.
* Test that complex SMARTS works.
Update Python wrappers.
* Rename Hyperspace to SynthonSpace.
* More renaming.
Python test.
* Enable Python test.
Remove write.
* Plug memory leak.
* Response to Greg's initial look.
* More response to Greg's initial look.
* get the windows DLL builds working
* Do away with mutable.
Purge a few more uses of reagent in favour of synthon.
Remove the c++ exe for converting text to binary databases.
* Better Synthon c'tor.
* More feedback from Greg.
* Tidy the Python wrapper.
* Remove tags from catch tests.
* Don't allow copying of SubstructureResults.
* Revert to allow copying of SubstructureResults. The Python wrapper needs it.
* Refinements based on CLion/clangd suggestions.
* Allow for map numbers in connectors in space file.
* Refactor to make the searcher a separate class from the space.
* Transfer Greg's review suggestions from Hyperspace merge.
* First cut of fingerprint searcher.
* Python wrapper.
Some tidying.
* Better random selection.
* Fix bug in preparing frags for fingerprints.
Re-factor.
* Minor-refactor.
* Sort hits by similarity if available.
* Option for a few different fingerprint types. Pending a better solution.
* Write fingerprints to binary file.
* Use any fingerprint generator for similarity searching. No Python wrapper yet.
* Python wrapper.
* Change random selection to use distribution weighted by number of hits in each reaction.
* Lots of suggestions from CLion/clang.
* Use boost discrete_distribution for cross-platform consistency.
* Tidy test up.
* Try boost rng as well.
* uniform_int_distribution to boost also.
* Small tidy.
* Method to write enumerated library.
* Windows export thing.
* Windows export thing.
* Allow for commas in tab-separated fields.
* win64 dll builds now work
* More aliphatic synthon, aromatic product joy.
* Force ring finding if it hasn't been done.
* Fingerprint hits not being sorted if maxHits reached.
* Remove debugging write. Doh!
* Response to review of SynthonSpace2.
* Missed one.
* Add test file.
* Hand merge Greg's #8050.
* Discard nodiscard.
* Move include of export.h inside include guards.
* Response to review.
* Fix memory leaks.
---------
Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
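One of the commits above changes random hit selection to draw from a distribution weighted by the number of hits in each reaction. A minimal sketch of that idea follows. It is not the SynthonSpace code: the function name and signature are hypothetical, and it uses `std::discrete_distribution` and `std::mt19937` from the standard library as stand-ins, whereas the commits switch to the boost equivalents for cross-platform consistency.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: pick numSamples reaction indices, where reaction i is
// chosen with probability proportional to hitsPerReaction[i].
std::vector<std::size_t> sampleReactionIndices(
    const std::vector<std::size_t> &hitsPerReaction, std::size_t numSamples,
    unsigned int seed) {
  // The weights must not all be zero, otherwise there is nothing to sample.
  assert(std::accumulate(hitsPerReaction.begin(), hitsPerReaction.end(),
                         std::size_t{0}) > 0);
  std::mt19937 rng(seed);
  // Stand-in for the boost distribution mentioned in the commit log.
  std::discrete_distribution<std::size_t> pickReaction(
      hitsPerReaction.begin(), hitsPerReaction.end());
  std::vector<std::size_t> picks;
  picks.reserve(numSamples);
  for (std::size_t i = 0; i < numSamples; ++i) {
    picks.push_back(pickReaction(rng));
  }
  return picks;
}
```

Weighting by hit count means a reaction that contributes most of the hits also contributes most of the sample, rather than every reaction being equally likely regardless of size.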
* Fix for connector regions and missing ringinfo.
* Merge in the fix for comma-separated names in tab-separated space files.
* Response to review.
---------
Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
The explicit default constructor actually blocks the other
defaults. It also means this is no longer a "plain struct",
and so designated initializers aren't available.
I wanna be able to do:
Compute2DCoordParameters params{.useRingTemplates = true};
Or even:
RDDepict::compute2DCoords(*mol, {.useRingTemplates = true});
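The point about designated initializers can be illustrated with a small standalone sketch. The struct and function below are hypothetical stand-ins, not the RDKit types; only the `useRingTemplates` field name comes from the message above. Designated initializers are a C++20 feature (some compilers accept them earlier as an extension).

```cpp
#include <cassert>

// Hypothetical stand-in for a depiction parameters struct. Only
// useRingTemplates is taken from the commit message; the other field is
// made up for illustration.
struct DepictParams {
  bool canonOrient = false;
  bool useRingTemplates = false;
  // Declaring `DepictParams() = default;` here would, under C++20 rules,
  // stop this type from being an aggregate, and the designated
  // initializers used below would no longer compile.
};

// Hypothetical consumer: counts how many options are switched on.
int countEnabledOptions(const DepictParams &params) {
  return static_cast<int>(params.canonOrient) +
         static_cast<int>(params.useRingTemplates);
}

// Usage sketch: a named variable and a temporary built in place.
int designatedInitializerDemo() {
  DepictParams params{.useRingTemplates = true};
  assert(params.useRingTemplates && !params.canonOrient);
  return countEnabledOptions({.useRingTemplates = true});
}
```

Fields that are not named in the initializer keep their default member initializers, which is what makes the one-line call form convenient.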
* [fix] re: issue #7572; added precondition check to prevent setting a root atom when more than one fragment exists
* tests for #7572, precondition rootAtAtom if more than one fragment exists
* test the fix to issue #7572
* [fix] moved the precondition into the block that gets the atom at the given index to prevent unhandled exceptions; MHFP tests pass now
---------
Co-authored-by: Eric Boittier <ericdavid.boittier@unibas.ch>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Catch exceptions in MultithreadedMolSupplier callbacks
* In next(), simply ignore any exceptions from nextCallback.
* In reader(), if readCallback throws, log a warning and proceed using
the unmodified record.
* (The writer() was already handling exceptions from writeCallback.)
* Remove unused parameter names
Hopefully this will placate the warning/error settings used by the Linux
build.
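The callback handling described above (ignore exceptions from the next callback, warn and fall back to the unmodified record when the read callback throws) could be sketched roughly as follows. This is not the MultithreadedMolSupplier code: the helper names, the use of `std::string` for records and items, and the warning text are all assumptions made for illustration.

```cpp
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical helper: run an optional user callback on a raw record. If the
// callback throws, log a warning and carry on with the record as it was read.
std::string applyReadCallback(
    const std::string &record,
    const std::function<std::string(const std::string &)> &readCallback) {
  if (!readCallback) {
    return record;
  }
  try {
    return readCallback(record);
  } catch (const std::exception &e) {
    std::cerr << "WARNING: readCallback threw (" << e.what()
              << "); using the unmodified record\n";
  } catch (...) {
    std::cerr << "WARNING: readCallback threw; using the unmodified record\n";
  }
  return record;
}

// Hypothetical helper: run an optional callback on an item that is about to
// be handed to the caller, silently ignoring anything it throws.
void runNextCallback(std::string &item,
                     const std::function<void(std::string &)> &nextCallback) {
  if (!nextCallback) {
    return;
  }
  try {
    nextCallback(item);
  } catch (...) {
    // Deliberately ignored so a faulty callback cannot stop iteration.
  }
}
```

Keeping the try/catch inside the helper means an exception from user code never escapes into the worker thread that invoked it.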
* throw if close to zero
* fix moldraw2DTestCatch
* Fix testRGroupDecomp
* fix one test in distGeomHelpersCatch
* fix tests in distGeomHelpersCatch
* retry finding a dir vector when adding Hs
* push UFF fixes to calculateCosY
* fix the setTerminalAtomCoords deg 4 patch
* add a test
* reduce zero tolerance
* First pass at splitting molecule.
* Interim commit. Reading libraries from file in original format.
* Basic search seems to be working.
* Pattern fingerprint screening.
* Connector region heuristic.
* Fixed triazole (aromatic/non-aromatic connectors).
* Fix search with non-split parent query, where query is substructure of a single reagent.
* Remove duplicate hits by reaction/reagents used.
* Implement largest fragment heuristic.
* Extra test files.
* Read/write binary file.
Program for conversion from text format to binary format.
* Remove empty reagent sets on reading, probably due to synthon number counting from 1 rather than 0.
* Tidy SSSearch functions.
* Stash pending major surgery for triazole bug.
* Revert to using unique_ptr.
Correct use of reagent order.
* Function to summarise Hyperspace.
* Delay building hits till end and put cutoff on number.
* Earlier bale-out in getHitReagents.
* Streamline checkConnectorRegions.
* Remove free functions for search.
* Correct name of Python test.
* First stage of Python wrappers.
* Rename namespace.
* Parameters object.
* Mysterious windows export thing.
* Fix bug - not matching number of connectors in fragment and synthon.
* Back like it was. The connector count wasn't the problem.
* Put the substructure results into their own class.
* gcc 14 didn't like my use of std::reduce.
Update expected test results.
* Remove write statement.
* Tidy.
* Tidy.
* Enable random sample of hits.
* Test that complex SMARTS works.
Update Python wrappers.
* Rename Hyperspace to SynthonSpace.
* More renaming.
Python test.
* Enable Python test.
Remove write.
* Plug memory leak.
* Response to Greg's initial look.
* More response to Greg's initial look.
* get the windows DLL builds working
* Do away with mutable.
Purge a few more uses of reagent in favour of synthon.
Remove the c++ exe for converting text to binary databases.
* Better Synthon c'tor.
* More feedback from Greg.
* Tidy the Python wrapper.
* Remove tags from catch tests.
* Don't allow copying of SubstructureResults.
* Revert to allow copying of SubstructureResults. The Python wrapper needs it.
* Refinements based on CLion/clangd suggestions.
* Allow for map numbers in connectors in space file.
* Response to review.
* update binary file spec
* Changes after review.
---------
Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Expose tautomer scoring functions to python
* Add more tests/documentation
* Rename getDefaultTautomerSubstructs to getDefaultTautomerScoreSubstructs
* Remove ROMOL_SPTR
* Add full custom scoring function example
* Run clang format
* Use proper BOOST_PYTHON_FUNCTION_OVERLOADS
* Use default copy constructor
* Use new Morgan fingerprint generator.
* Add script to build fragments database and amend score script to use it.
* Remove redundant imports.
* Response to review.
* Clarify interaction of canonical and random in MolToSmiles.
---------
Co-authored-by: Dave Cosgrove <david@cozchemix.co.uk>
* Implements #TODO
This attempts to improve the SMILES parsing procedure to provide more
information about bad inputs by pointing to the general location of the offending token.
The improved error messages currently only apply to syntax errors, inputs with
extra close parentheses, and inputs with too large numbers, but this will
provide a way to extend that in the future.
Some examples of the improved error messages:
| old version | new version |
|----------------------------------------|------------------------|
|syntax error while parsing: c%()ccccc%() | syntax error while parsing input:<br /> `c%()ccccc%()` <br /> `~~~^` |
|syntax error while parsing: c%(100000)ccccc%(100000) | syntax error while parsing input: <br /> `c%(100000)ccccc%(100000)` <br /> `~~~~~~~~^` |
|syntax error while parsing: COc(c1)cccc1C#|syntax error while parsing input: <br /> `COc(c1)cccc1C#` <br /> `~~~~~~~~~~~~~^` |
|extra close parentheses while parsing: C) | extra close parentheses while parsing input: <br /> `C)` <br /> `~^` |
| extra close parentheses while parsing: C1C)foo | extra close parentheses while parsing input: <br /> `C1C)foo` <br /> `~~~^` |
| number too large while parsing: [555555555555555555C] | number too large while parsing input: <br /> `[555555555555555555C]` <br /> `~~~~~~~~~~^` |
This was achieved by extending the lexing and parsing procedure to track
the current token position with a new variable, so the reported position
may not be 100% accurate at all times but should be helpful in reducing
the amount of work done to find the bad location.
Additionally, I updated the build instructions to strip the #line macros
from the generated C++ files, which required modern flex/bison
versions.
* fix build failure on windows
* static vars are not captured implicitly
* copy generated files to fix windows build failure
* fix long input truncation
* copied the wrong files again :(
* Review suggestions
* update error messages for extra open brackets
These are example error messages:
```
[09:54:03] SMILES Parse Error: extra open parentheses while parsing: C1CC1(CC
[09:54:03] SMILES Parse Error: check for mistakes around position 6:
[09:54:03] C1CC1(CC
[09:54:03] ~~~~~^
[09:54:03] SMILES Parse Error: Failed parsing SMILES 'C1CC1(CC' for input: 'C1CC1(CC'
[09:54:03] SMILES Parse Error: extra open parentheses while parsing: C1CC1(CC(CC
[09:54:03] SMILES Parse Error: check for mistakes around position 6:
[09:54:03] C1CC1(CC(CC
[09:54:03] ~~~~~^
[09:54:03] SMILES Parse Error: extra open parentheses while parsing: C1CC1(CC(CC
[09:54:03] SMILES Parse Error: check for mistakes around position 9:
[09:54:03] C1CC1(CC(CC
[09:54:03] ~~~~~~~~^
```
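The marker lines in the example messages above can be reproduced with a few lines of code. The sketch below is not the parser's implementation: the function names are hypothetical, and the position is assumed to be 1-based, which is inferred from the examples (position 6 gives five `~` characters followed by `^`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build the "~~~~^" marker for a 1-based position, so
// that the caret lines up under the character at that position.
std::string makeErrorPointer(std::size_t position) {
  assert(position >= 1);
  return std::string(position - 1, '~') + '^';
}

// Hypothetical helper: assemble the three-line hint (position message, the
// input itself, and the marker line).
std::string formatParseError(const std::string &input, std::size_t position) {
  return "check for mistakes around position " + std::to_string(position) +
         ":\n" + input + "\n" + makeErrorPointer(position);
}
```

Printing the input and the marker on consecutive lines only lines up in a monospaced font, which is why the examples above are shown in a code fence.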
* fix bad merge artefact.
* FingerprintGenerator improvements
1. simplify construction by adding ctors taking FingerprintArguments
2. remove inexplicable boost::noncopyable from FingerprintArguments
* Switch FingerprintType to be an enum class
* Fixes #7521
* dumb mistake
* initialize everything
* get the defaults right
* Update Code/GraphMol/ChemReactions/ReactionFingerprints.cpp
Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com>
---------
Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com>
- add missing Fingerprints dependency to MinimalLib CMakeLists.txt
- remove conditional RGroupDecomposition dependency in MinimalLib CMakeLists.txt, as it is always needed because of relabelMappedDummies()
wip
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* Properly cleanup Dict::Pair when serializing HasPropWithQueryValue
* Make sure pickling doesn't change original molecule
* Fix bad cut and paste
* Add PairHolder utility class for memory management of non Dict Dict::Pairs, fix mem leak in pickler
* Edit comment to force a rebuild
* Ignore PairHolder from Java/Swig builds
* Ignore PairHolder API from swig
* Responses to review
* Add backward incompatible change
* Make release note a bullet point
* Add molecules names into ScaffoldNetwork
- added parameter to ScaffoldNetwork Params to include molecule names in the
ScaffoldNetwork nodes corresponding to input molecules
* Document _Name assumption
- fixed use of a bitwise and where a boolean and was intended
* Forgot has prop check before access
* Misunderstood semantics of CHECK vs. REQUIRE
* fixes a regression introduced in #7582 which made all SWIG enums become type-unsafe
* also fix PipelineStage asserts
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* - moved SMILES and RGroupDecomp JSON parsers to their own translation units
- added missing DLL export decorators that had been previously forgotten
- changed the signatures of MolToCXSmiles and updateCXSmilesFieldsFromJSON
to replace enum parameters with the underlying types
- updated ReleaseNotes.md
* make sure cxSmilesFields is only updated if the JSON string contains keys
belonging to the CXSmilesFields enum
* added missing copyright notices
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* fix leak in CIP labels catch test
* fix leak in Murtagh clustering
* do not leak writers in streambuf
* fix leaks in fingerprintgeneratorwrapper
* remove 'minor leak' comments
* Support writing CX extensions in reactions
* fixed merge conflicts
* wip
* Updated for getCXExtensions
* Refactored and deleted extraneous file.
* Updated function signatures
* Updated some tests
* Removed extraneous include from debugging
* Removed comment in reactionwriter.cpp
* Updated some tests with expected strings
* Updated to add logging for linknodes and substance group hierarchies
* Addressed some issues
* updated tests
* Addressed Greg's comments
* Updated for recommendations
---------
Co-authored-by: Rachel Walker <rachel.walker@schrodinger.com>
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* Improved handling of SP/TB/OH reordering in SMILES/SMARTS.
- add a getMaxNbors(tag) utility function to avoid repeated logic in multiple places
- tweak getChiralPermutation() for handling implicit/missing ligands (uses -1), allow inverse lookup
- use the tweaked chiral ordering in the reading/writing from SMILES
* clang-format run
* Reviewer requested changes.
- curly braces on all if conditions
- null check raw pointers with precondition.
* Correct additional test case introduced in the last year
* oxime tests
* Support allenes in canonicalizing double bonds
* alternate solution to the problem
* expand comment
* reactivate conjugated nitro test
* Fix conjugated nitro tests, have a bondstereo test
* Empty commit to re-proc tests
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>