8097 Commits

Author SHA1 Message Date
David Cosgrove
4ea67c70a5 Support pickling Shape inputs (#8434)
* Stash for transfer.

* Basic serialization.

* Serialization of ShapeInput.

* Default c'tor and operator=.

* Response to review.

* Add cmake to macOS CI image.

* Add make, not cmake.

* Conditional serialization test.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
2025-04-14 18:03:50 +02:00
Filip Chmielewski
37edf5b823 Check for PAINS_C file existence in update_pains.py script (#8431)
Co-authored-by: Filip Chmielewski <f.chmielewski@molecule.one>
2025-04-11 11:16:25 +02:00
Greg Landrum
7a9265b103 Fixes #8420 (#8428)
* Fixes #8420

* Update Code/GraphMol/Chirality.cpp

Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>

---------

Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>
2025-04-10 19:15:34 +02:00
Paolo Tosco
1b9d0921b1 - do not free sslib while patternFpArray is still needed by the test (#8407)
Co-authored-by: ptosco <paolo.tosco@novartis.com>
2025-04-09 13:33:23 +02:00
Ricardo Rodriguez
8c80e2d656 Enable the chiral flag on enumerated isomers (#8410)
* enable chiral flag, update tests

* fix bad merge

* move setting the chiral flag
2025-04-09 11:59:27 +02:00
Greg Landrum
8eb02b8bed switch to C++20 (#8039)
* c++20 builds working

* get MolStandardize building with clang19

* get FMCS building with clang-19

* set cxx version to c++20

* remove a few more compiler warnings

* bump min boost version, CI cleanup

* boost 1.81 is not available from conda-forge

* remove unused constants

* bump linux version for CI

* remove another unused variable

* fix (hopefully) cartridge CI builds

* simplify cartridge environment

* try postgresql14 in CI

* start the postgresql service

* change the columns used in the pandastools nbtest

* remove missed merge conflict artifact

* get github4823 test to pass with numpy 2.2

* remove a compiler warning/error with g++13
2025-04-09 11:57:17 +02:00
Greg Landrum
d3f75a115f bump gcc version in CI (#8419)
* bump gcc version in CI

* Update azure-pipelines.yml
2025-04-09 06:20:24 +02:00
David Cosgrove
d7e1ce7cf4 Rascal Fix 8360 (#8376)
* Use distances on all valid paths rather than just shortest distance.

* Optimise BondPaths.

* Optimise BondPaths.

* Hash coded for the bond paths.

* Faster find all paths.

* Build in gcc working.

* Comment.

* Remove debugging code.

* Update GettingStartedInPython.rst.

* Now need to split the clique and keep the largest fragment.
Lots of warnings about how slow this is.
Split out long tests.

* Back out a lot of changes.  Remove the distance check with singleLargestFrag when building modular product.

* Tidy code.
Update docstrings.
Add explanation to GettingStartedInPython.rst.

* Fix single fragment test.

* Response to review.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
2025-04-08 10:12:58 +02:00
Paolo Tosco
92c63eb98c Fix SynthonSpace build when RDK_USE_BOOST_SERIALIZATION is not defined (#8380)
* get SynthonSpace.cpp to build also when RDK_USE_BOOST_SERIALIZATION is
not defined

* test should not fail when RDK_USE_BOOST_SERIALIZATION is not defined
2025-04-08 09:47:48 +02:00
David Cosgrove
73382748bc Fix indexing of heavy atoms in PubChemShape (#8417)
* Fix indexing of heavy atoms.

* Increment j. Doh!

* Add include guard for the hpp.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
2025-04-08 09:41:46 +02:00
Paolo Tosco
d8dc968eaa Avoid a segfault in CoordGen when a double bond has stereo spec but no stereo atoms (#8415)
* avoid a segfault in CoordGen when a double bond has stereo spec but no stereo atoms

* changes in response to review

---------

Co-authored-by: ptosco <paolo.tosco@novartis.com>
2025-04-08 09:32:59 +02:00
Greg Landrum
5817e52db2 Fixes #3102 (#8413)
2025-04-07 09:01:16 -06:00
Dan Nealschneider
3facd882d6 Speed up GetProp Python keyerrors (#8372)
* Speed up GetProp Python keyerrors

A common pattern _in Python_ for checking for the presence or
absence of a key is:

    try:
        return mol.GetProp('mykey')
    except KeyError:
        return None

Shockingly, this is really slow with boost python objects! I was
recently profiling a workflow and 90% of the time or more was
spent in failed GetProp calls (mostly on bonds, some on atoms
or mols).

I sped up the workflow by protecting the calls using HasProp. But
I think this is a silly trap we've set for our users.

The problem comes because boost::python uses a C++ exception to
indicate that there is already a Python exception set. In C++,
exceptions are slow - they require unwinding the stack. In Python,
exceptions are about the same speed as any other control flow!

This commit speeds up GetProp failures by circumventing the
boost throw_exception_already_set() mechanism.

In my testing, this speeds up failed GetProp substantially:

* Factor of 1000x on Mac
* Factor of 40x on Linux

* Update typed GetXXXProp to bypass boost exceptions

Based on PR #8372

Updates the typed GetIntProp, GetDoubleProp, etc to bypass C++
exceptions in access. This speeds up missing key errors
significantly - for instance, calling mol.GetIntProp with a
missing prop 100,000 times:

Before: 28s
After:  0.05s
2025-04-07 13:55:22 +02:00
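The HasProp guard mentioned in the GetProp commit above can be sketched as follows. This is a minimal illustration, not code from the commit: `FakeMol` and `get_prop_or_none` are hypothetical names, and `FakeMol` only stands in for an RDKit object so the snippet runs without RDKit installed. Only the `HasProp`/`GetProp` method pair is taken from the commit message.

```python
class FakeMol:
    """Hypothetical stand-in exposing the HasProp/GetProp pair named in the commit."""

    def __init__(self, props):
        self._props = dict(props)

    def HasProp(self, key):
        # True when the property is present.
        return key in self._props

    def GetProp(self, key):
        # Raises KeyError for a missing key, like the pattern in the commit message.
        return self._props[key]


def get_prop_or_none(obj, key):
    """Return the property value, or None when it is absent.

    Checking HasProp first means the missing-key case never raises,
    which is the workaround the commit author describes.
    """
    if obj.HasProp(key):
        return obj.GetProp(key)
    return None


if __name__ == "__main__":
    mol = FakeMol({"mykey": "42"})
    print(get_prop_or_none(mol, "mykey"))
    print(get_prop_or_none(mol, "missing"))
```

With the change described in the commit, the plain try/except form should no longer carry the large penalty, so the guard is mainly relevant for older builds.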
David Cosgrove
1604a8b738 Fix off-by-one error in Reduced Graph fingerprint bitstring. (#8385)
* Fix off-by-one error in bitstring.

* Response to review.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
2025-04-07 13:20:19 +02:00
Greg Landrum
71935ecae3 Fixes #8405 (#8406)
2025-04-04 15:13:39 +02:00
Greg Landrum
923483523e prep for next release cycle (#8402)
2025-04-03 05:25:12 +02:00
Paolo Tosco
030bd976de bump MaeParser version to 1.3.2 (enable building without boost::iostreams) (#8404)
Co-authored-by: ptosco <paolo.tosco@novartis.com>
2025-04-02 15:56:14 +02:00
Greg Landrum
e3b1ca015b Fixes github #8205 (#8368)
2025-04-01 19:40:41 +02:00
Greg Landrum
981360f55b add a missed RDKIT_SYNTHONSPACESEARCH_EXPORT (#8399)
this prevented some win64 DLL builds
2025-04-01 12:54:55 +02:00
Greg Landrum
d32c919066 prep for release (#8397)
Release_2025_03_1
2025-03-31 20:11:10 +02:00
Greg Landrum
5c1fa74495 Fixes github #8364 and #8365 (#8366)
* Fixes #8365

* Fixes #8364
2025-03-31 15:58:47 +02:00
Greg Landrum
151254c5f8 [INFRA] bump min linux runtime in CI to ubuntu 22.04 (#8373)
* bump min linux runtime in CI to ubuntu 22.04

bump min boost version

some other minor cleanup

* boost version -> 1.82

* change postgresql install

* Update linux_build_cartridge.yml

* Update linux_build.yml

* Update linux_build.yml

* Update linux_build_cartridge.yml

* Update linux_build.yml

* Update linux_build.yml

* Update azure-pipelines.yml

* trigger CI run

* please build

---------

Co-authored-by: = <=>
2025-03-31 13:39:11 +02:00
Paolo Tosco
71c4103475 Suppress large amounts of 'BOOST_NO_CXX98_FUNCTION_BASE macro redefined' warnings in clang/emcc builds (#7747)
2025-03-29 20:04:58 +01:00
Paolo Tosco
a3cb75b769 Change size_t into std::uint64_t in SearchResults (#8392)
2025-03-29 18:22:31 +01:00
Greg Landrum
32608ae0b4 Atoms bonded to metal atoms should always have their H counts explicit in SMILES (#8318)
* refactor the code to determine whether or not an atom is in brackets

* move the definition of isMetal to QueryOps

* atoms bound to metals in SMILES should always be in square brackets

Implementation and some test updates

needs confirmation that all of tests run

* basic tests pass

* java tests pass

* update js tests

* doc updates

* Update Code/GraphMol/catch_graphmol.cpp

Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>

* Update Code/GraphMol/SmilesParse/test.cpp

Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>

* finish fixing tests

* bump yaehmop version to allow compilation to work

---------

Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>
2025-03-29 07:26:03 +01:00
Ricardo Rodriguez
643b13cba4 switch to std::mt19937 (#8378)
2025-03-23 18:45:57 +01:00
Hussein Faara
6e5f27445a Ignore invalid chirality labels when reading MAE inputs (#8347)
* Ignore invalid chirality labels when reading MAE inputs

For a host of reasons, input chirality labels can be invalid as we try
to read an input MAE file or buffer. This can be annoying for situations
when we have other means of calculating the stereochemistry information,
so we should try to ignore bad labels and let the user deal with the
"bad" stereochemistry.

* remove stereo bond label changes

* Remove debugging code

* restore whitespace

* remove redundant &

* don't clear all stereo when we see invalid chirality labels

* use catch2 generators for test

* replace some requires with check
2025-03-22 05:12:14 +01:00
Hussein Faara
80f41eb262 Fix default wiggly bond writing behaviour in CXSMILES writer (#8348) (#8367)
* Fix default wiggly bond writing behaviour in CXSMILES writer (#8348)

When writing CXSMILES outputs with the RestoreBondDirOptionClear
argument, which is enabled by default, we fail to write wiggly bond
information. I traced the issue to us clearing said information on the relevant
bonds during the pre-processing stage, so I fixed the issue by
removing that logic for wiggly bonds.

* address review comments
2025-03-22 05:07:51 +01:00
Greg Landrum
a9e79d35ad tag beta (#8369)
doing a self merge since this is just a release tag
Release_2025_03_1b1
2025-03-21 17:59:57 +01:00
Rachel Walker
7d02ac82fd Add RascalMCES option to require complete RingInfo rings (#8305)
* Attempt at completeRingsOnly in mces

* changed option name and added a test

* code review

* check aromaticity and bond type matches before checking ring equivalencies

* update if statement

* typo
2025-03-21 13:54:10 +01:00
David Cosgrove
620a16108d Synthon Search Phase 2 (#8338)
* Function for converting text to db file.

* Do search looping first on reactions, then on fragments.

* Add lowMem mode so reactions are only read from database as required.

* Move fragment fingerprint generation out of the inner loop.

* Put positions of SynthonSets directly in DB file so no need to read the file on initialisation.

* Update test binary file.

* Fix SynthonSpace.summarise() for new lowMem mode.

* Extra bits in Python wrapper.

* Correct docstring.

* Compute pattern fingerprints ahead of search.

* Put Synthons into hitsets.

* First stage of re-factoring SynthonSpace.  Synthons are highly duplicated in the SynthonSets, so are held centrally in a pool in SynthonSpace and just the pointers kept in the SynthonSets.  The same Synthon, identified by SMILES string, can have multiple IDs in the SynthonSets so the ID is now held by the SynthonSet not the Synthon.

* Second stage - moved the synthon FPs into the Synthon as well.

* New binary file format.

* Tidying and fix because Synthons are shared across SynthonSets.

* Use shorter fingerprints for synthons.

* Don't exit with a bad file.

* Back out the fingerprint folding which made things worse.
Don't copy the synthon molecules into the hit sets, just take a pointer.
Put the fragments into the corresponding hit set, useful for debugging.

* Change way hit names are made to the manner preferred by Enamine.

* Only generate query connector regions once.

* Do some of the connector region checking by SMILES.

* Move where it gets the connector combinations so it's not done unnecessarily often.

* Fix tests.

* Don't make molecules for the connector combinations, a bitset is plenty.

* Make a pool of fragment fingerprints to reduce the number in total.
Use an upper bound on the Tanimoto Coeff to reduce need for full calculation.

* Fix splitMolecule, which wasn't producing all possible fragments.

* Take out old code.

* Back to using unique_ptr for fragments.
Abolish maxBondSplits option. Use the maximum number of synthons in the space to control the splitting.

* Don't fold the reaction connector region fps into 1.

* Streamline connector combinations in substructure search.

* Re-factor fragment fingerprint generation prior to multi-threading.

* Make checkConnectorRegions return false when it should.
Tweak AllProbeBitsMatch.cpp.

* Fix Python wrapper of text file reader.

* More complex query shenanigans - amino acid this time.

* More complex query shenanigans - amino acid this time.

* Tidy.

* Fix binary DB read bug.
New Idorsia space file.

* Correct/improve function documentation.

* Tidying up.

* Remove stray include.

* Fix CI Tests.

* Plug memory leak.
Revise python timeout test.

* Simplify way synthon searchMols are created.  Previous method gave incorrect results sometimes hence new test.

* Update idorsia space file.

* Update idorsia test result.

* Update idorsia test result.

* Changes after first review.

* Move getFormattedNumProducts to general function.

* Stash working version with maps and mutex.

* Working with sorted vectors rather than maps.
Reading Text DB presumably slow.

* Split out MemoryMappedFileReader.cpp.

* Fix ReadDBFile in Python wrapper.

* Streamline tests.

* Include filesystem.

* Replace many uses of std::map with sorted std::vector.

* Use more auto.

* Threaded build hits.

* Threaded search.

* Don't chunk threaded buildAllHits.

* Allow for different results in random sampling.

* Threaded splitMolecule.
Fix bug - apply removeQueryAtoms to all frags, not just one per unique SMILES.
Do largest fragment heuristic up front so as not to repeat on each thread.

* Streamline Python tests.

* Separate out time-consuming tests.

* Add Rascal similarity searching.

* Add extended queries.

* Make extended queries honour maxHits correctly.

* Extra extended query test.

* Hide really long tests on local files.

* Remove local test.

* Make random tests less strict.
Attempt to fix build issues.

* Attempt to fix build issues.

* Response to review.

* Fix no-threads version.

* Re-move re-formatting.

* Add move semantics to MemoryMappedFileReader.

* Move c'tor needs size as well.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
2025-03-21 13:09:34 +01:00
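The Synthon Search commit above mentions using an upper bound on the Tanimoto coefficient to reduce the need for full calculations. The commit does not say which bound it uses; a common choice, assumed here, is min(a, b) / max(a, b), where a and b are the on-bit counts of the two fingerprints. The sketch below uses Python integers as bit vectors, and all function names are hypothetical rather than part of the RDKit API.

```python
def popcount(bits: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(bits).count("1")


def tanimoto(a: int, b: int) -> float:
    """Full Tanimoto coefficient: |A & B| / |A | B| (0.0 if both are empty)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def tanimoto_upper_bound(count_a: int, count_b: int) -> float:
    """Cheap bound from bit counts alone: min(a, b) / max(a, b)."""
    larger = max(count_a, count_b)
    return min(count_a, count_b) / larger if larger else 0.0


def search(query: int, candidates: list[int], threshold: float) -> list[tuple[int, float]]:
    """Return (fingerprint, similarity) pairs at or above the threshold.

    Candidates whose upper bound is already below the threshold are
    skipped without computing the full coefficient.
    """
    query_count = popcount(query)
    hits = []
    for fp in candidates:
        if tanimoto_upper_bound(query_count, popcount(fp)) < threshold:
            continue  # cannot reach the threshold, skip the full calculation
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((fp, sim))
    return hits


if __name__ == "__main__":
    print(search(0b1111, [0b1111, 0b0001, 0b0111], 0.7))
```

The bound holds because the intersection can be no larger than the smaller fingerprint and the union no smaller than the larger one, so filtering on it never discards a true hit.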
Greg Landrum
f3b7fd0b59 Fixes #8351 (#8363)
2025-03-20 07:03:45 -04:00
Greg Landrum
8ea8ec5e3f Fixes #7983 (#8342)
* Fixes #7983

move the call to cleanupAtropisomerStereoGroups() into assignStereochemistry()

* Additional tests from @susanhleung in #8323

* more testing

* changes in response to review

* changes for review
2025-03-20 07:40:33 +01:00
Greg Landrum
33fda86856 Fixes #8308 (#8335)
* Fixes #8308

* fix typo
2025-03-19 12:00:39 +01:00
David Cosgrove
9b058c263b Github8353 (#8354)
* Add PRECONDITION for attempt to expand empty query.

* Improved test, maybe.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
2025-03-17 19:54:15 +01:00
Greg Landrum
35c8c54a3a cartridge: expose sanitize options to mol_from_ctab (#8326)
* add sanitize and removeHs options to mol_from_ctab

* bump version to 4.7.0
add update script
fix a bug in the 4.4.0 - 4.5.0 update script

* document the new arguments

Should add argument names to all cartridge functions in a future PR

* fix a mistake

* response to review

* response to review

---------

Co-authored-by: Greg Landrum <glandrum@ethz.ch>
2025-03-11 06:14:44 +01:00
Greg Landrum
fe5ccb7d47 Feat/use draw color in drawString() (#8334)
* change exception type for bad user input in the MCH code

* use the current draw color with drawText
2025-03-08 10:46:39 +01:00
Brian Kelley
cad6962a37 Adds linker style zipping (#8289)
* Adds linker style zipping

* Fix dangling dummy atoms, i.e. [*:1].C

* Run clang-format

---------

Co-authored-by: Brian Kelley <bkelley@glysade.com>
2025-03-06 05:44:25 +01:00
Greg Landrum
28db5d706c Revert "Fixes #5134 (#5136)" (#8319)
This reverts commit 8691c58055.
2025-03-05 18:11:24 +01:00
Greg Landrum
b40f99eee3 Fixes #8304 (#8316)
* first pass, does not pass all tests

* add an option to control the new behavior

* add that to the python wrapper too

Fixes #8304

* Update Code/GraphMol/MolOps.h

Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>

* undo some extra comment reformatting

* typo

Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>

---------

Co-authored-by: Ricardo Rodriguez <ricrogz@users.noreply.github.com>
2025-03-04 14:56:08 +01:00
Ricardo Rodriguez
c8d55d41dc amend declaration (#8302)
2025-03-04 14:03:13 +01:00
Greg Landrum
b64aea5082 apply query adjustments when makeAtomsGeneric is enabled (#8315)
* apply query adjustments when makeAtomsGeneric is enabled

* changes in response to review

* I got Python in my C++
2025-03-04 13:50:24 +01:00
Greg Landrum
1727ffd538 Fix the way carbon is handled in cleanupOrganometallics() (#8301)
* allow cleanupOrganometallics to work with carbon

* do not use cleanupOrganometallics with mol2

* fix handling of C atoms in cleanupOrganometallics

* add test for #8312

* Update Code/GraphMol/MolOps.cpp

Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com>

---------

Co-authored-by: Paolo Tosco <paolo.tosco.mail@gmail.com>
2025-03-03 17:30:41 +01:00
Greg Landrum
144c1f29f1 bump to InChI 1.07.3 (#8300)
2025-02-26 15:23:19 +01:00
Ricardo Rodriguez
12929849f5 Make MaeWriter throw on errors. (#8297)
* allow MaeWriter to throw

* update tests

* throw ValueErrorException

* add note on backward incompatible change of behavior
2025-02-25 16:59:40 +01:00
Gareth Jones
aa22040195 Fix memory leaks (#8298)
2025-02-25 16:58:19 +01:00
David Cosgrove
81cc5bc704 Option for standard atom colours with highlighting (#8294)
* Option to use standard atom colours under highlights.

* Update hash codes for catch_tests.cpp.

* Update hash codes for test1.cpp and rxn_test1.cpp.

---------

Co-authored-by: David Cosgrove <david@cozchemix.co.uk>
2025-02-24 17:46:24 +01:00
tadhurst-cdd
2dbf83898d trimethylcyclohexane chirality error (#8272)
* Fix for trimethylcyclohexane error

* removed unused variable

* removed debugging code
2025-02-24 17:41:54 +01:00
Ricardo Rodriguez
55dc5c73e3 fix inchi conversion leak 1 (#8291)
2025-02-24 17:14:38 +01:00
Greg Landrum
cfffa8db4d try adding a CI job to build the minimallib docker image (#8290)
* try adding a CI job to build the minimallib docker image

* update

* try simplification

* path

* path

* re-enable the other ci builds
remove the obsolete .yml file
2025-02-21 09:44:31 +01:00