* run clang-tidy with readability-braces-around-statements
clang-format the results
clean up all the parts that clang-tidy-8 broke
* fix problem on windows
* copy in, get building, add some basic tests
* complete the testing
Except for regiosiomers, which do not work
* regioisomers work now
* backup commit; things work
* remove last of NM macros from hashfunctions.cpp
* remove last of NM macros from hashfunctions.cpp
* remove dependency on the abstraction layer
* typo
* start using namespaces
clang-format
* switch to using enums for the HashFunctions and StripTypes
* Add initial python wrapper (and tests)
* move the new hashing code to the MolHash library
still may want to revise the naming of this
* Setup deprecation of the older hashing code
* better release notes text
* change in response to review
* add the ring decomposer lib (temporarily?)
* simplify makefile
* very basics work
* backup
* basics working
* builds and basic tests pass
* get this building again
* expose the ring families
* add tests on the python side
* make the pywrapper for this optional
* remove some extra bits
* cleanup
* switch to using RDL as an external project
* make sure this still works if we do not use the URF code
* remove BUILD_ALWAYS
* fix linkage of Java wrapper and cartridge (hopefully)
* fix cmake for wrappers (hopefully)
* forgot a semicolon
* try to force URF lib to build first
* improve memory management and interface
* fix dependency specifier
* make pointer initialization explicit
This may not be necessary, but it feels safer.
* not pleasing and needs to be cleaned up
but it builds
* not pleasing and needs to be cleaned up
but it builds
* cleanup in preparation for merging
* cleanup in preparation for merging
* switch to rareylab repo
* fix updated copyright date
* Fix updated copyright date
* switch to a specific library tag
Co-Authored-By: Florian Flachsenberg <flachsenberg@zbh.uni-hamburg.de>
* change in response to review
* first round of cleanups based on PVS-studio suggestions
* a couple more
* a few more cleanups
* another round of cleanups
* undo one of those cleanups
we want the integer rounding behavior here
* add a comment to make that clear
* Fix for filter catalog PRECONDITION redundancy
* Issue #2403: Speed up SSSR symmetrization
For my horrible example molecule (a highly symmetric
nanotube with 2400 atoms and > 1000 rings), this speeds up
symmetrizeSSSR() from 5s to about 0.002s. findSSSR() takes
another .4s or so.
* Refactor after Ricardo's suggestions
* Greg's review comments. use std::vector
* Removes ATOM/BOND_SPTR in boost::graph in favor of raw pointers
* Actually delete atoms and bonds...
* RWMol::clear now calls destroy to handle atom/bond deletion
* Changes broken Atom lookup for windows/gcc
* Adds tests for running with valgrind
* Adds test designed for valgrind and molecule deletions
* Removes RNG, actually tests bond deletions
* update swig wrappers
* deal with most recent changes on the main branch
This actually just causes the molecule processing to fail in a reasonable amount of time; it is not an actual fix to the underlying ring-finding problem
* findSSSR performance improvements for fragments without rings
This makes Chem.SanitizeMol significantly faster when dealing with
molecules with lots of disconnected fragments (like a box of water).
The following is the runtime of Chem.SanitizeMol while adding 10,000
waters with explicit hydrogens when running Chem.SanitizeMol on every
1,000th water added.
Before:
0 add_water = 0.00007s
0 Chem.SanitizeMol = 0.01991s
1000 add_water = 0.00009s
1000 Chem.SanitizeMol = 0.99659s
2000 add_water = 0.00013s
2000 Chem.SanitizeMol = 3.94565s
3000 add_water = 0.00018s
3000 Chem.SanitizeMol = 8.94760s
4000 add_water = 0.00023s
4000 Chem.SanitizeMol = 15.75187s
5000 add_water = 0.00035s
5000 Chem.SanitizeMol = 24.59318s
6000 add_water = 0.00048s
6000 Chem.SanitizeMol = 37.23530s
7000 add_water = 0.00042s
7000 Chem.SanitizeMol = 47.70860s
8000 add_water = 0.00105s
8000 Chem.SanitizeMol = 62.21912s
9000 add_water = 0.00056s
9000 Chem.SanitizeMol = 80.08511s
After:
0 add_water = 0.00003s
0 Chem.SanitizeMol = 0.01219s
1000 add_water = 0.00004s
1000 Chem.SanitizeMol = 0.01004s
2000 add_water = 0.00012s
2000 Chem.SanitizeMol = 0.01058s
3000 add_water = 0.00018s
3000 Chem.SanitizeMol = 0.01158s
4000 add_water = 0.00018s
4000 Chem.SanitizeMol = 0.01530s
5000 add_water = 0.00022s
5000 Chem.SanitizeMol = 0.02010s
6000 add_water = 0.00036s
6000 Chem.SanitizeMol = 0.02397s
7000 add_water = 0.00033s
7000 Chem.SanitizeMol = 0.02978s
8000 add_water = 0.00037s
8000 Chem.SanitizeMol = 0.04446s
9000 add_water = 0.00040s
9000 Chem.SanitizeMol = 0.04419s
* Refactor new_timings.py script a bit to be able to run only the first (reading molecules) test.
* Removing O(N^2) behavior of finding the number of bonds in the fragment during SSSR.
This only improves the case when there are long chains and a small
number of rings in the fragment. Many ring systems are still dominated
by the rest of the SSSR algorithm, and fragments with no ring systems
don't reach this part of the code.
For a test case with a single cyclicpropane and adding carbons while
calling Chem.SanitizeMol every 10,000 carbons added yield the
following improvement in performance:
before:
0 add_carbon = 0.00001s
0 Chem.SanitizeMol = 0.01237s
10000 add_carbon = 0.00017s
10000 Chem.SanitizeMol = 0.04453s
20000 add_carbon = 0.00017s
20000 Chem.SanitizeMol = 0.13038s
30000 add_carbon = 0.00029s
30000 Chem.SanitizeMol = 0.27671s
40000 add_carbon = 0.00063s
40000 Chem.SanitizeMol = 0.44774s
50000 add_carbon = 0.00106s
50000 Chem.SanitizeMol = 0.69433s
60000 add_carbon = 0.00181s
60000 Chem.SanitizeMol = 1.00577s
after:
0 add_carbon = 0.00001s
0 Chem.SanitizeMol = 0.01264s
10000 add_carbon = 0.00013s
10000 Chem.SanitizeMol = 0.01349s
20000 add_carbon = 0.00022s
20000 Chem.SanitizeMol = 0.02724s
30000 add_carbon = 0.00040s
30000 Chem.SanitizeMol = 0.04292s
40000 add_carbon = 0.00076s
40000 Chem.SanitizeMol = 0.06172s
50000 add_carbon = 0.00193s
50000 Chem.SanitizeMol = 0.07658s
60000 add_carbon = 0.00147s
60000 Chem.SanitizeMol = 0.08625s
Note, couldn't actually test a higher number of carbons as it led to a
stack overflow due to recursion in findSSSR.
Story: I have a PDB I want to read into RDKit. It has a disulfide bond
between two cysteines ~400 residues apart. This creates a very large
ring. RDKit throws an error because the number of found rings is less
than the expected number of rings. The ring wasn't found because RDKit
thought all "smallest" rings should be 256 or smaller.
Now, as long as your ring is UINT_MAX aka 4,294,967,295 or smaller, life
is beautiful. I hope no one has a ring bigger than 4 billion atoms.
o rdkit gains a RDKit::common_properties namespace that contains common string value properties
o Dict.h and below gain getPropIfPresent that attempts to retrieve a property and returns
true/false on success or failure. This is used to optimize access.
o rdkit learns how to pass property keys by reference, not value.
A new namespace has been added to RDKit, common_properties
that contains the std::string values for commonly used
properties. This helps to avoid typos in string values
but also avoids a creation of std::strings from character
values. All accessors (has/get/clear and getPropIfPresent) now pass
the key by reference.
Additionally, getPropIfPresent removes the double lookup
of hasProp/getProp which can be a significant speedup
in the smiles and smarts parsers (10-20%)
The original version using products of primes broke down for proteins (or things with
large numbers of atoms) since collisions were quite possible.
Switched to using a hashing scheme; collisions are also possible here, but are even
less likely.