* CIPLabeler performance: Store vector of bonds
CIPLabelling refers to bonds by index over and over again. This
causes a measurable hit in performance in findConfigs() because
we iterate over a bitset of "allowed" bonds. For very large
molecules with many bonds, this can be a rate-limiting step!
This affects many PDB-sized structures.
2J3N goes from 0.7s to 0.25s with this change.
I had another example for which the findBondWithIdx() call was
taking 500ms of a 700ms call (after the performance update
in #9171 was implemented)
* yikes, XXL reserve
thanks, greg
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* CIP labeller: Don't calculate auxiliary descriptors unnecessarily
The first 3 rules (the constitutional rules) are pretty easy
to understand. After rule 3, we need to calculate auxiliary
stereo descriptors to break ties.
However, we _were actually_ calculating auxiliary stereodescriptors
for all centers! We should only need to calculate auxiliary
stereocenters for sites that are needed to break ties.
This cost time - it also caused errors if the auxiliary descriptors
needed a graph expansion, because bonds in the digraph might be
pointed in the wrong direction.
Example case PDB ID 4AXM
Before this commit, errored with "Could not calculate parity! Carrier mismatch"
after 14s. After this commit, completes successfully in 0.036s.
Labelled centers all match (for the centers that had labels in
the failure case).
Includes a test that I can imagine breaking with this optimization.
The reference labels are from before this change
* Ensure all "arms" of stereo bonds and atropisomer bonds are expanded
For tetrahedral centers, ranking using the constitutional rules
always expands as far as is needed (but no further). For SP2bond
and atropisomers, if the first side is not resolvable, the
second side is never visited.
If the constitutional rules don't resolve a side, we need to
label the auxiliary centers. It's important to label all
auxiliary centers that _will_ be visited, so we need to know
what centers will be visited.
This commit updates the label() call in SP2 and Atropisomer
bonds to always attempt to label both sides if using the
constitutional rule set.
The constitutional rules are cheap, and if they fail, we
always go on to the full rule set. It is not a savings to skip
the search on the second side if we're going to keep going
anyway!
Includes a test that reproduces Ricardo's example.
This has no measurable effect on performance relative to the
original solution
* If any parts of the center have been seen, label it.
I couldn't make an example hit this, but Ric is totally
theoretically right
* Greg's ranges suggestion #2
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* any_of for container search
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* add the _CIPNeighborRanks property
* store CIP-ranked chiral neighbors
* store CIP-ranked SP2 bond and atropisomer anchors
* add a test
* boost headers in test
* add Atom::NOATOM
* add NOATOM test
* amend and clarify implicit H in Tetrahedral
* rename property
* rename property to _CIPNeighborOrder
* deprecate Chirality::StereoInfo::NOATOM
* Fixes#8379
* check in some working tests
* test passes
* test passes
* test passes
* test passes
* test passes
* ensure that the invariants flush the streams on failure
* tests pass
* test passes
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* tests pass
* Fixes#8391
* tests pass
* fix a test with legacy
not clear why this was not causing problems before
* make a test work
* Fixes#8396
* gcc builds work
* fingerprint tests pass
* mention backwards incompatible change
* fix a problem with FindMolChiralCenters
* more testing details
* enable the test status output
* Fixes#8432
fix a bug in double-bond stereo handling for template matching
* all depictor tests pass
* use the new-stereo chiral ranks in the depiction code
* always assign new-stereo chiral ranks
* make _ChiralAtomRank a computed property
This is analogous to _CIPRank
* tweak to the way the atom ordering is computed for 2D coordinate generation
* update two expected results
* backup
* response to review
* tests pass
* tests pass
---------
Co-authored-by: = <=>
* fixleak in CIP labels catch test
* fix leak in Murtagh clustering
* do not leak writers in streambuf
* fix leaks in fingerprintgeneratorwrapper
* remove 'minor leak' comments
* atropisomer handling added
* fixed non-used variables, linking directives
* BOOST LIB start/stop fixes, linking fix
* Fixes for RDKIT CI errors
* minimalLib fix
* changed vector<enum> for java builds
* check for extra chars in CIP labeling
* removed wrong deprecated message
* fix ostrstream output error?
* restored _ChiralAtomRank to lowercase first letter
* changes for merged master
* Fixed catch label for new Catch package
* update expected psql results
* get swig wrappers building
* restore MolFileStereochem to FileParsers
* fix java wrapper for reapplyMolBlockWedging
* test changes
* some suggestions
* move a couple functions out of Bond
* Merge branch 'master' into pr/atropisomers2
* merged master
* Renamed setStereoanyFromSquiggleBond
* atropisomers in cdxml, rationalize atrop wedging, stereoGroups in drawMol
* Merge branch 'master' into pr/specialQueries
* changes from previous PR
* Iclude false chiral
* rigorous enhnced stereo canoncalization
* Added more tests and clenup
* removed commented out code
* corrected init of SmilesWriteParams
* added MolFileStereoChem.h to the header files
* Renamed Rxn parser to MrvBlockToChemicalReaction
* To make catch2 work, and match the checksum
* Fixed Structchecker errors
* fix CI for DetermineBonds catch test
* error in catch_test for CI
* Allow custom smileWriteParams in GetMolLayers
* misnamed entry point
* ReactionFromMrvString change name
* remove adding writeParams to GetMolLayers
* make rigorous enhanced stereo the default, and fix tests
* only one abs group no longer needs Rigorous Enhanced treatment
* changed string_view to string in catch test
* Canonicalize Enhnaced Stereo only resturne unique smiles
* Now allows or and and groups together
* internal routines inside detail scope
* fix test error
* changed string back to string_view and fixed a CHECK
* Fixes for PR review tests
* Fix RDKit_Book.rst failure on build test
* fix xqm sql test
* updated expected files for cxsmiles_test
* Fixed removal of atom attrs
* Fixed tests after merge of master
* More efficient version of Stereo Groups Canonicalization
* Fixes for ctests
* removed debug code
* readded cipLabel test
* fix generalizedSubstruct/catch_tests.cpp error
* hueristics to improve speed
* Rationaized control of abs groups
* removed unused routine
* added rigorous stereo group treatment to test
* some suggested changes
* Changes per PR review and removed some changes to smiles
* Fixed CI errors
* changes per PR review
* more PR review vhanges and cleanup
* Fixed PSql PKL change
* changes as per PR review
* Restored error type for bad mols for canonicalizeStereoGroups and added a test
* Merge master and fix test in MolDraw2D
* Fix for randomize test error and other PR review comments
* Removed unsued variable to fix mac CI
* do not force aromatization in canonicalizeStereoGroups
* changes as per PR review
---------
Co-authored-by: greg landrum <greg.landrum@gmail.com>
For example, PDB ID 2ZP8 goes from 133s to 53s on my M3 laptop.
Boost graph edges (our bonds) cannot be accessed by index, they
must be searched for - linearly. This protein only has 27,000
bonds (after addition of explicit hydrogens), but that's still
a lot.
* Speed up CIPLabeler
std::list is almost never the data structure that you want. vector
uses fewer allocations, and the data has better locality. This
speeds up the CIPLabeler on some protein examples by about 2x.
* Greg's recommendation to use std::vector
It's just as fast. For my example, this made queue sizes of like
1,186,390. Reserving ahead doesn't affect performance.
* changed string_view to string in catch test
* fix for ring-stereo atropisomers
* removed debug cout
* removed change to build in debug mode
* change catch arg to a rerference
* changes as per PR review
* removed string_view in favor of string for catch test
* add parsing and generation of atropisomers when coords not present
* changed string_view to string in catch test
* more docs
* reformulation of the docs
* make an error message a little bit more useful
* small optimization
clang-format
* add `BondWedgingParameters` to new function
* changes for CIP test errors
* Updated internal doc to match what it does
* changes per PR review
* removed cout statements in tests
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* change a define to a "using"
* basic atropiosomer support for confgen
currently only supports bonds with two exo substituents on each end
* support systems with two exo bonds
* cleanup
* add test for #7109Fixes#7109
* clean the tests up a bit
* New tests for speical query atoms and atropisomers
* fixed error, and used unique_ptrs
* Removed test that makes GraphMol depend on GenericGroups
* More to remove GraphMol dependency on GenericGroups
* atropisomer handling added
* fixed non-used variables, linking directives
* BOOST LIB start/stop fixes, linking fix
* Fixes for RDKIT CI errors
* minimalLib fix
* changed vector<enum> for java builds
* check for extra chars in CIP labeling
* removed wrong deprecated message
* fix ostrstream output error?
* restored _ChiralAtomRank to lowercase first letter
* changes for merged master
* Fixed catch label for new Catch package
* update expected psql results
* get swig wrappers building
* restore MolFileStereochem to FileParsers
* fix java wrapper for reapplyMolBlockWedging
* some suggestions
* move a couple functions out of Bond
* Merge branch 'master' into pr/atropisomers2
* merged master
* Renamed setStereoanyFromSquiggleBond
* atropisomers in cdxml, rationalize atrop wedging, stereoGroups in drawMol
* fix for CI build
* attempt to fix java build in CI
* attempt to fix java build in CI #2
* New routine to remove non-explicit 3D-geneated chirality
* changed to use pair for atrop atoms and related bonds
* Changes as per PR reviews
* PR review respnses
* PR review reponse - more
* Fix merge from master
* fixing java ci after merge
* Updated the help doc for atripisomers
* update the atropisomer docs
* improve the images
* add the source CXSMILES
---------
Co-authored-by: greg landrum <greg.landrum@gmail.com>
* Fixes#4996
also switches to using the GraphMol version of catch_main.cpp so builds are faster
* Fixes#4998
we should probably discuss this one
* compare with previous results
* add new test (it fails, of course)
* isAtomPotentialTetrahedralCenter() there and tested
tests cases for molecular stereo written (but failing, of course)
create new_chirality.cpp, we will probably want to change this at some point
new StereoInfo structure
* more infrastructure
- isBondPotentialStereoBond()
- two getStereoInfo() functions
- associated unit tests
* backup
* oops
* backup
* switch to always using four atoms for bonds
* backup
* add new test (it fails, of course)
* isAtomPotentialTetrahedralCenter() there and tested
tests cases for molecular stereo written (but failing, of course)
create new_chirality.cpp, we will probably want to change this at some point
new StereoInfo structure
* more infrastructure
- isBondPotentialStereoBond()
- two getStereoInfo() functions
- associated unit tests
* backup
* oops
* backup
* switch to always using four atoms for bonds
* backup
* this now actually works
* doc update
* add a test to demo that ring stereo is not working
* more testing
* add a fun CIP test
* add review note
* debugging
* remove extraneous debugging
turn off tests for ring-double bond stereo
* disable the ring-stereo fix... this breaks a few tests, but we will recover
* works, needs cleanup, chirality code needs re-testing
* nothing works
* Fixes#3322
* Python and C++ tests now pass
* clang-format
* first pass at python wrappers
* improve doctest
* basic optimization...
stop with the copying
* rename
* all tests passing again
* optimization
* fix the sort in the tests
* looks like this might fix the windows-dll build problems
* update tests
* the fun never ends
* comment cleanup
* handle deliberately unspecified atoms/bonds
* add cleanIt option
* add flagPossible
* add option to use the new code to the SMILES parser
* additional testing
* additional testing
* a bit of additional testing never hurts
* changes in response to review
* fixes a bug with potential parastereo not being cleared
other changes in response to review
* update docs
* add port of centres
* Several changes:
- Added a test based on RDKit issue 2984
(default RDKit fails it, this gets it right)
- Use bond directions for bond stereo (label is no longer required)
- Fix bugs in rules 4b and 5new
- Fix some mem errors
- clang-formatted
- some other minor cleanups
* Several changes and some improvements:
- Added LGPL license, as well as a mention in the doc.
- Fix/update/add some comments
- Fix typo/bug in Mancude calculation
- Fix bug in rules 4b, 5New
- Fix Sp2 Bond dir reference
- Re clang-format
- other minor changes suggested by Dan
* Another bunch of changes:
- require integer-order bonds; kekulize when required
- fix fraction comparison
- rename sq Cis/Trans e/z
- replace queues with vectors
- update copyright notices
- revert LGPL changes
- fix Asymmetric typo
* move to separate lib/mod, add python validation test
* Moving away from the original implementation:
- Rename to CIPLabeler
- Remove the abstraction layer
- Remove some stats stuff
- Push some CIPMol functions down to Node
- Use RDKit's isotope info
* Another bundle of changes. The most relevant ones:
- fix parity translation
- use cis trans as bond reference -- breaks #2984 test
- kill a lot of unused code
- use lists for queues
- store nodes and edges in digraph
- add prefixes to class data member names
- update changeRoot() test
- use fastFindRings() for mancude rings
- update docs
- add references to the scientific paper
- Document the Mancude functions
- Fix Mancude atom types and their comments
- remove mol data member from SequenceRule
- replace Fraction with boost::rational
- update comments, docstrings and the doc
* fix building the test
* Changes here include:
- adding bitset overload for the labeling function
- python wrap of the overload
- handling trigonal pyramids with implicit H
- setting bond labels sets stereo atoms, cis/trans
- nix LEFT/RIGHT/TOGETHER/OPPOSITE constants
- don't use GLOB in cmake
- a decent amount of refactoring
* Minor edits to new_CIP_labeling (#6)
* Some changes for clarity
Added some documentation and changed some variable names to match
my understanding. Also a ran clang-tidy to ensure that all blocks
were brace-enclosed.
* Return a reference instead of a copy for performance
This is called many times and showed up after some light
profiling. This change bumped throughput by about 20%
* move out of Graphmol
* move .hpp headers to .h
* update documentation; add label set of atoms test
* Address comments:
- Added references to centres to CIPLabeler.h and Python Wrap.
- Update validation test to skip sanitization.
- Document mancude fractional atomic number calculation.
- Use unittest assertions in python test.
- Update mancude docstrings to 'resonance' instad of 'tautomers'.
- Rename prioritise() to prioritize().
- Add postcondition to check carriers size in Tetrahedral.cpp.
- Use getNeighbors() in Tetrahedral.cpp.
- Move findStereoAtoms to Chirality namespace.
- Move code back into GraphMol.
- Fix typos and reformat doc.
* More comments:
- Mention why we use boost's unordered map rather than the std one.
- Fix include in Python wrapper.
* Addressed second batch of comments:
- fix the bug in rule 4b
- fix docstring for rule 2
- move atomic mass calculation from rule 2 to node
- addressed some build warnings
- simplify sp2bond::label(comp)
- add start/end atoms to Sp2Bond constructor
- update system/local includes
Co-authored-by: Dan N <dan.nealschneider@schrodinger.com>