pybind11/include/pybind11
ymwang78 46ebf5031b feat(subinterpreter): reusable PyThreadState via subinterpreter_thread_state (#6073)
* feat(subinterpreter): add opt-in TLS-cached thread state mode

subinterpreter_scoped_activate previously created and destroyed a fresh
PyThreadState on every activation when the calling OS thread was not
already running the target interpreter. Workloads that repeatedly
re-enter the same subinterpreter from the same thread therefore churn
thread states and lose per-thread interpreter state between activations
(see pybind/pybind11#6040).

Add an opt-in subinterpreter_thread_state::cached policy: on first use a
PyThreadState is created and stored in OS-thread-local storage keyed by
the target interpreter; subsequent activations on that thread only swap
it in/out and never destroy it. The default stays transient, so existing
behavior is unchanged.

Since pybind11 does not control thread lifetime, cleanup is explicit:
subinterpreter::release_cached_thread_state() releases the calling
thread's cached state for one interpreter, and the static
release_all_cached_thread_states() releases all of the calling thread's
cached states as an end-of-thread hook. The TLS map's destructor only
frees its own nodes and never touches the Python C API, so an
unreleased state leaks rather than crashing at thread exit.
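The cached mode described above, which the RAII design later in this message replaced, can be modelled in plain C++. This is a hypothetical sketch: `FakeInterpreter`, `FakeThreadState`, `tls_cache`, `get_or_create_cached` and `release_cached` are illustrative stand-ins, not pybind11 or CPython code. It shows a thread-local map keyed by interpreter that creates a state on first use, reuses it afterwards, and whose destructor frees only its own nodes. As a result, an unreleased state leaks instead of being torn down at thread exit.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for an interpreter and a PyThreadState.
struct FakeInterpreter {
    int id;
};
struct FakeThreadState {
    FakeInterpreter *interp;
    int activations;  // per-thread state that should survive re-entry
};

// One cached thread state per interpreter, per OS thread. The map stores raw
// pointers, so its destructor frees only the map nodes and never the states
// themselves: an unreleased state leaks rather than being torn down at exit.
thread_local std::map<FakeInterpreter *, FakeThreadState *> tls_cache;

// First use on this thread creates the state; later calls reuse it.
FakeThreadState *get_or_create_cached(FakeInterpreter &interp) {
    auto it = tls_cache.find(&interp);
    if (it != tls_cache.end()) {
        return it->second;
    }
    auto *ts = new FakeThreadState{&interp, 0};
    tls_cache.emplace(&interp, ts);
    return ts;
}

// Explicit cleanup for one interpreter, in the spirit of
// release_cached_thread_state().
void release_cached(FakeInterpreter &interp) {
    auto it = tls_cache.find(&interp);
    if (it == tls_cache.end()) {
        return;
    }
    delete it->second;
    tls_cache.erase(it);
}
```

Because nothing in the map's destructor touches the state objects, forgetting `release_cached` costs memory, not a crash, which mirrors the trade-off stated above.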

Includes test coverage and embedding docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style: pre-commit fixes

* refactor(subinterpreter): replace cached enum/TLS with subinterpreter_thread_state RAII

Address review feedback on the original "cached" mode by switching to an
explicit two-RAII design suggested by @b-pass:

  "Create a class ... to RAII-manage the PyThreadState but start its
   lifetime in an already released state. You could create another
   class (or modify scoped_activate) to scoped/RAII activate the
   inactive threadstate."

Removed
  - enum subinterpreter_thread_state { transient, cached } and the
    defaulted ctor parameter on subinterpreter_scoped_activate.
  - detail::subinterpreter_thread_state_cache thread_local map.
  - subinterpreter::release_cached_thread_state() and
    subinterpreter::release_all_cached_thread_states().

This eliminates the hidden per-thread map, the "release_all" footgun
across pybind11 modules (the cache was module-local), and the implicit
"must not be active when called" contract on the release functions.

Added
  - Public class subinterpreter_thread_state that owns one PyThreadState
    for a given subinterpreter on its constructing OS thread, created in
    a released state (not current, no GIL). Non-copyable, non-movable
    (PyThreadState is bound to its creating OS thread).
  - subinterpreter_scoped_activate(subinterpreter_thread_state &)
    overload: swaps the owned PyThreadState in on entry, swaps it out
    on exit, does not touch its lifetime.
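A minimal model of this two-RAII split, written in plain C++ with hypothetical names: `thread_state_owner` stands in for subinterpreter_thread_state, `scoped_activate_model` for the new activate overload, and `swap_tstate` for a PyThreadState_Swap-like call that installs a state and returns the previously current one. None of these are pybind11 API. The owner creates its state released (never current); the activator only swaps it in and out and never frees it.

```cpp
#include <cassert>

// Hypothetical stand-ins for an interpreter and a PyThreadState.
struct FakeInterpreter {
    int id;
};
struct FakeThreadState {
    FakeInterpreter *interp;
    int counter;  // survives across activations
};

// Per-OS-thread "current" slot, swapped like PyThreadState_Swap:
// install `ts` and return whatever was current before.
thread_local FakeThreadState *current_tstate = nullptr;
FakeThreadState *swap_tstate(FakeThreadState *ts) {
    FakeThreadState *prev = current_tstate;
    current_tstate = ts;
    return prev;
}

// Owns one state, created released (never made current here).
class thread_state_owner {
public:
    explicit thread_state_owner(FakeInterpreter &interp)
        : ts_(new FakeThreadState{&interp, 0}) {}
    ~thread_state_owner() { delete ts_; }
    // Deleting the copy operations also suppresses the implicit move
    // operations, so the owner is neither copyable nor movable.
    thread_state_owner(const thread_state_owner &) = delete;
    thread_state_owner &operator=(const thread_state_owner &) = delete;

    FakeThreadState *get() const { return ts_; }

private:
    FakeThreadState *ts_;
};

// Swaps the owned state in on entry and the previous one back on exit;
// it never creates or destroys the state.
class scoped_activate_model {
public:
    explicit scoped_activate_model(thread_state_owner &owner)
        : prev_(swap_tstate(owner.get())) {}
    ~scoped_activate_model() { swap_tstate(prev_); }
    scoped_activate_model(const scoped_activate_model &) = delete;
    scoped_activate_model &operator=(const scoped_activate_model &) = delete;

private:
    FakeThreadState *prev_;
};
```

Keeping lifetime and activation in separate types means an activation can be repeated any number of times without ever recreating the state, which is the property the transient overload lacks.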

Behavior
  - The existing subinterpreter_scoped_activate(subinterpreter const &)
    overload is unchanged (still transient: New on entry, Delete on
    exit). All previously-working code keeps working.
  - With subinterpreter_thread_state, one OS thread can alternate
    between multiple subinterpreters and each PyThreadState is preserved
    across activations -- the use case that gil_scoped_release/acquire
    + a long-lived scoped_activate cannot solve alone (the per-thread
    internals.tstate slot holds only one inactive tstate).
  - The dtor of subinterpreter_thread_state guards against the
    "destroyed-while-active" contract violation: if the Swap reveals that
    the owned tstate was already current, it does not Swap back to the
    now-deleted pointer (the safe-when-active fix b-pass requested for the
    old release_* functions, applied at the natural location instead).
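The alternation case and the destructor guard can be sketched with the same kind of hypothetical model. The names `thread_state_owner`, `scoped_activate_model` and `swap_tstate` are illustrative, not pybind11 API, and `swap_tstate(nullptr)` is an assumed stand-in for deleting the current thread state. Two owners on one thread keep independent state across interleaved activations, and the destructor swaps back to the previous state only when that state is not the one being deleted.

```cpp
#include <cassert>

// Hypothetical stand-ins for an interpreter and a PyThreadState.
struct FakeInterpreter {
    int id;
};
struct FakeThreadState {
    FakeInterpreter *interp;
    int counter;
};

// Install `ts` as current and return the previously current state.
thread_local FakeThreadState *current_tstate = nullptr;
FakeThreadState *swap_tstate(FakeThreadState *ts) {
    FakeThreadState *prev = current_tstate;
    current_tstate = ts;
    return prev;
}

class thread_state_owner {
public:
    explicit thread_state_owner(FakeInterpreter &interp)
        : ts_(new FakeThreadState{&interp, 0}) {}
    thread_state_owner(const thread_state_owner &) = delete;
    thread_state_owner &operator=(const thread_state_owner &) = delete;

    ~thread_state_owner() {
        // Make the owned state current for teardown, remembering the old one.
        FakeThreadState *prev = swap_tstate(ts_);
        swap_tstate(nullptr);  // teardown leaves nothing current
        delete ts_;
        // If prev == ts_, the owner was destroyed while still active.
        // Swapping back would install a dangling pointer, so skip it.
        if (prev != ts_) {
            swap_tstate(prev);
        }
    }

    FakeThreadState *get() const { return ts_; }

private:
    FakeThreadState *ts_;
};

// Swaps the owned state in on entry and the previous one back on exit.
class scoped_activate_model {
public:
    explicit scoped_activate_model(thread_state_owner &owner)
        : prev_(swap_tstate(owner.get())) {}
    ~scoped_activate_model() { swap_tstate(prev_); }
    scoped_activate_model(const scoped_activate_model &) = delete;
    scoped_activate_model &operator=(const scoped_activate_model &) = delete;

private:
    FakeThreadState *prev_;
};
```

In the violation case the thread is left with no current state instead of a dangling one, which is the safer of the two failure modes.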

The lifetime contract is enforced by ordinary C++ scoping; the typical
placement is `thread_local`. No new release/cleanup APIs are required.
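With `thread_local` placement, each OS thread constructs its own owner on first use and destroys it when the thread exits, so no explicit cleanup call is needed. The sketch below models this with hypothetical names (`thread_state_owner`, `state_for_this_thread`, `g_interp`) that are not pybind11 API: the same thread always sees the same state, while another thread gets a distinct one.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-ins for an interpreter and a PyThreadState.
struct FakeInterpreter {
    int id;
};
struct FakeThreadState {
    FakeInterpreter *interp;
};

// Owns one state for the lifetime of the object; non-copyable, non-movable.
class thread_state_owner {
public:
    explicit thread_state_owner(FakeInterpreter &interp)
        : ts_(new FakeThreadState{&interp}) {}
    ~thread_state_owner() { delete ts_; }
    thread_state_owner(const thread_state_owner &) = delete;
    thread_state_owner &operator=(const thread_state_owner &) = delete;
    FakeThreadState *get() const { return ts_; }

private:
    FakeThreadState *ts_;
};

FakeInterpreter g_interp{1};

// One owner per OS thread: constructed on this thread's first call and
// destroyed automatically when the thread exits.
FakeThreadState *state_for_this_thread() {
    thread_local thread_state_owner owner(g_interp);
    return owner.get();
}
```

Scope, not an API call, decides when each state is destroyed, which is what "enforced by ordinary C++ scoping" means in practice.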

Tests cover (a) tstate identity preserved across activations on a
thread, (b) transient and reusing modes do not share state, (c)
different OS threads get distinct PyThreadStates, and (d) the
multi-subinterpreter alternation case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(subinterpreter): address review on #6073 (same-thread checks, test scoping)

Per @b-pass's review:

- ~subinterpreter_thread_state(): add a PYBIND11_DETAILED_ERROR_MESSAGES-
  guarded check that destruction happens on the OS thread that created the
  PyThreadState (same PyThread_get_thread_native_id pattern as ~subinterpreter),
  failing with pybind11_fail otherwise.
- subinterpreter_scoped_activate(subinterpreter_thread_state &): add the
  matching DETAILED_ERROR_MESSAGES check that activation happens on the
  creating OS thread, enforcing the newly documented rule.
- docs: document that activating a subinterpreter_thread_state on another OS
  thread is illegal.
- tests: keep each subinterpreter (and its subinterpreter_thread_state) in an
  enclosing scope so destruction order is thread-state -> subinterpreter ->
  unsafe_reset_internals_for_single_interpreter(). The previous top-level
  declarations ran the reset while the subinterpreters were still alive, which
  is the likely cause of the CI crashes.
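The same-thread rule can be modelled by recording the creating thread's id and checking it on activation. In this hypothetical sketch, `std::this_thread::get_id()` stands in for PyThread_get_thread_native_id and throwing `std::runtime_error` stands in for pybind11_fail; the class and function names are illustrative only. The destructor check would follow the same pattern, but a throwing destructor terminates the program, so only the activation check is shown.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a PyThreadState.
struct FakeThreadState {
    int counter;
};

// Install `ts` as current and return the previously current state.
thread_local FakeThreadState *current_tstate = nullptr;
FakeThreadState *swap_tstate(FakeThreadState *ts) {
    FakeThreadState *prev = current_tstate;
    current_tstate = ts;
    return prev;
}

class thread_state_owner {
public:
    thread_state_owner()
        : ts_(new FakeThreadState{0}), creator_(std::this_thread::get_id()) {}
    ~thread_state_owner() { delete ts_; }
    thread_state_owner(const thread_state_owner &) = delete;
    thread_state_owner &operator=(const thread_state_owner &) = delete;

    // Fails unless called on the OS thread that created the state.
    void require_creating_thread() const {
        if (std::this_thread::get_id() != creator_) {
            throw std::runtime_error("thread state used on a different OS thread");
        }
    }
    FakeThreadState *get() const { return ts_; }

private:
    FakeThreadState *ts_;
    std::thread::id creator_;
};

class scoped_activate_model {
public:
    explicit scoped_activate_model(thread_state_owner &owner) : prev_(nullptr) {
        owner.require_creating_thread();  // check before touching any state
        prev_ = swap_tstate(owner.get());
    }
    ~scoped_activate_model() { swap_tstate(prev_); }
    scoped_activate_model(const scoped_activate_model &) = delete;
    scoped_activate_model &operator=(const scoped_activate_model &) = delete;

private:
    FakeThreadState *prev_;
};
```

Because the check runs before the swap, a rejected activation leaves the calling thread's current state untouched.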

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: fix codespell (re-used -> reused) in embedding.rst

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-05-25 09:31:14 -04:00