mirror of
https://github.com/pybind/pybind11.git
synced 2026-06-04 13:34:24 +08:00
* feat(subinterpreter): add opt-in TLS-cached thread state mode subinterpreter_scoped_activate previously created and destroyed a fresh PyThreadState on every activation when the calling OS thread was not already running the target interpreter. Workloads that repeatedly re-enter the same sub-interpreter from the same thread therefore churn thread states and lose per-thread interpreter state between activations (see pybind/pybind11#6040). Add an opt-in subinterpreter_thread_state::cached policy: on first use a PyThreadState is created and stored in OS-thread-local storage keyed by the target interpreter; subsequent activations on that thread only swap it in/out and never destroy it. The default stays transient, so existing behavior is unchanged. Since pybind11 does not control thread lifetime, cleanup is explicit: subinterpreter::release_cached_thread_state() releases the calling thread's cached state for one interpreter, and the static release_all_cached_thread_states() releases all of the calling thread's cached states as an end-of-thread hook. The TLS map's destructor only frees its own nodes and never touches the Python C API, so an unreleased state leaks rather than crashing at thread exit. Includes test coverage and embedding docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * style: pre-commit fixes * refactor(subinterpreter): replace cached enum/TLS with subinterpreter_thread_state RAII Address review feedback on the original "cached" mode by switching to an explicit two-RAII design suggested by @b-pass: "Create a class ... to RAII-manage the PyThreadState but start its lifetime in an already released state. You could create another class (or modify scoped_activate) to scoped/RAII activate the inactive threadstate." Removed - enum subinterpreter_thread_state { transient, cached } and the defaulted ctor parameter on subinterpreter_scoped_activate. - detail::subinterpreter_thread_state_cache thread_local map. - subinterpreter::release_cached_thread_state() and subinterpreter::release_all_cached_thread_states(). This eliminates: the hidden per-thread map, the "release_all" footgun across pybind11 modules (the cache was module-local), and the implicit "must not be active when called" contract on the release functions. Added - Public class subinterpreter_thread_state that owns one PyThreadState for a given subinterpreter on its constructing OS thread, created in a released state (not current, no GIL). Non-copyable, non-movable (PyThreadState is bound to its creating OS thread). - subinterpreter_scoped_activate(subinterpreter_thread_state &) overload: swaps the owned PyThreadState in on entry, swaps it out on exit, does not touch its lifetime. Behavior - The existing subinterpreter_scoped_activate(subinterpreter const &) overload is unchanged (still transient: New on entry, Delete on exit). All previously-working code keeps working. - With subinterpreter_thread_state, one OS thread can alternate between multiple subinterpreters and each PyThreadState is preserved across activations -- the use case that gil_scoped_release/acquire + a long-lived scoped_activate cannot solve alone (the per-thread internals.tstate slot holds only one inactive tstate). - The dtor of subinterpreter_thread_state guards against the "destroyed-while-active" contract violation: if Swap reveals the cached tstate was current, do not Swap back to a now-deleted pointer (the safe-when-active fix b-pass requested for the old release_* functions, applied at the natural location instead). Lifetime contract is enforced by ordinary C++ scope: typical placement is `thread_local`. No new release/cleanup APIs are required. Tests cover (a) tstate identity preserved across activations on a thread, (b) transient and reusing modes do not share state, (c) different OS threads get distinct PyThreadStates, and (d) the multi-subinterpreter alternation case. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(subinterpreter): address review on #6073 (same-thread checks, test scoping) Per @b-pass's review: - ~subinterpreter_thread_state(): add a PYBIND11_DETAILED_ERROR_MESSAGES- guarded check that destruction happens on the OS thread that created the PyThreadState (same PyThread_get_thread_native_id pattern as ~subinterpreter), failing with pybind11_fail otherwise. - subinterpreter_scoped_activate(subinterpreter_thread_state &): add the matching DETAILED_ERROR_MESSAGES check that activation happens on the creating OS thread, enforcing the newly documented rule. - docs: document that activating a subinterpreter_thread_state on another OS thread is illegal. - tests: keep each subinterpreter (and its subinterpreter_thread_state) in an enclosing scope so destruction order is thread-state -> subinterpreter -> unsafe_reset_internals_for_single_interpreter(). The previous top-level declarations ran the reset while the subinterpreters were still alive, which is the likely cause of the CI crashes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: fix codespell (re-used -> reused) in embedding.rst Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>