* Defer numpy initialization to first use in rdchem, rdmolops, cDataStructs
`from rdkit import Chem` unconditionally bootstrapped numpy (~120ms) via
import_array()/boost::python::numpy::initialize() in module init functions,
even when no numpy-dependent APIs were called. This is costly in cold-start
environments like AWS Lambda.
Move numpy initialization behind lazy guards (static bool + first-call init)
in rdchem.so, rdmolops.so, and cDataStructs.so. Numpy now loads only when
an API that actually needs it is invoked (GetDistanceMatrix, GetPositions,
SetPositions, GetAdjacencyMatrix, ConvertToNumpyArray, etc.).
Also change Conformer::SetPos to accept python::object instead of
np::ndarray to prevent Boost.Python from requiring numpy type conversion
before the lazy guard runs.
Adds test_lazy_numpy.py with subprocess-based tests verifying:
- `from rdkit import Chem` does not load numpy
- SmilesToMol/MolToSmiles work without numpy
- numpy loads on demand when array APIs are called
* skip inchi tests if not available
* switch to threadsafe once_flag, like elsewhere
* finish ifdef style
* switch to magic static style
* Revert "switch to magic static style"
This reverts commit 7300188db7.
* Trim spaces from RDProp strings to simulate reading from SDFiles
* Update documentation
* Use the correct doc strings
---------
Co-authored-by: Brian Kelley <bkelley@glysade.com>
* Speed up GetProp Python keyerrors
A common pattern _in Python_ for checking for the presence or
absence of a key is:
try:
return mol.GetProp('mykey')
except KeyError:
return None
Shockingly, this is really slow with boost python objects! I was
recently profiling a workflow and 90% of the time or more was
spent in failed GetProp calls (mostly on bonds, some on atoms
or mols).
I sped up the workflow by protecting the calls using HasProp. But
I think this is a silly trap we've set for our users.
The problem comes because boost::python uses a C++ exception to
indicate that there is already a Python exception set. In C++,
exceptions are slow - they require unrolling a stack. In Python,
exceptions are about the same speed as any other control flow!
This commit speeds up GetProp failures by circumventing the
boost throw_exception_already_set() mechanism.
In my testing, this speeds up failed GetProp substantially:
* Factor of 1000x on Mac
* Factor of 40x on Linux
* Update typed GetXXXProp to bypass boost exceptions
Based on PR #8372
Updates the typed GetIntProp, GetDoubleProp, etc to bypass C++
exceptions in access. This speeds up missing key errors
significantly - for instance, calling mol. GetIntProp with a
missing prop 100,000 times:
Before: 28s
After: 0.05s
* fix SetPositions when using strided numpy array
previously SetPositions assumed that the provided numpy array used contiguous-C stride patterns
* cast to a const pointer to avoid compiler warning
with cmake --build . --target stubs (also make stubs on *NIX)
- improved the patching script to do a better assignment of
overloaded constructor parameters, whcih results in a number
of docstring fixes
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* - added gen_rdkit_stubs Python module to generate rdkit-stubs
- added patch_rdkit_docstrings Python module to patch existing C++ sources to fix docstrings missing self parameter and add named parameters taken from C++ signatures where possible
- added rdkit-stubs/CMakeLists.txt to build rdkit-stubs as part of the RDKit build
- added an option to CMakeLists.txt to enable building rdkit-stubs as part of the RDKit build (defaults to OFF)
* fixed CMakeLists.txt, rdkit-stubs/CMakeLists.txt and a doctest
* - added missing cmp_func parameter
- fixed case with overloads with optional parameters
- do not trim params if expected_param_count == -1
- add dummy parameter names if we could not find any
- keep into account member functions when making up parameter names
- address __init__ and make_constructor __init__ functions
- fix incorrectly assigned staticmethods
* patched sources
* address residual few remarks
---------
Co-authored-by: ptosco <paolo.tosco@novartis.com>
* Optimize GetPropsFromDict, use tags for conversion, not the try and fail technique
* Autoconvert strings to ints and bools if possible
* Add autoConvert option to GetPropsAsDict default=true
* Update Code/GraphMol/Wrap/props.hpp
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
* change autoConvert to autoConvertStrings, add failed datatype conversion notices
* Fix Invariant usage
* Fix namespace for string
* Add GetProp(key, autoConvert) to allow for converting only what you want
* Make TestSetProps private
* Get _TestSetProps from rdmolops
---------
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
Co-authored-by: Brian Kelley <bkelley@relaytx.com>
* run clang-tidy with readability-braces-around-statements
clang-format the results
clean up all the parts that clang-tidy-8 broke
* fix problem on windows
* Fixes#2450
adds `hasOwningMol()` to the public API for Bond and Atom
* add to other classes;
add to python wrappers
* get the smiles generation partially working
* changes in response to review
* Conformer GetPos returns a numpy array rather than a tuple of tuples
* Add missing method binding to GetPositions for a conformer
* Minor fixes
- avoid copying the coordinates,
- add a test