Free-threaded Python¶
Free-threading is an experimental new Python feature that replaces the Global Interpreter Lock (GIL) with a fine-grained locking scheme to better leverage multi-core parallelism. The resulting benefits do not come for free: extensions must explicitly opt in and generally require careful modifications to ensure correctness.
Nanobind has supported free-threaded Python since version 2.2.0. This page explains how to target it and discusses a few caveats. In addition, review py-free-threading.github.io for a more comprehensive discussion of free-threaded Python. PEP 703 explains the nitty-gritty details.
Opting in¶
To opt into free-threaded Python, pass the FREE_THREADED parameter to the nanobind_add_module() CMake target command. For other build systems, refer to their respective documentation pages.
nanobind_add_module(
  my_ext         # Target name
  FREE_THREADED  # Opt into free-threading
  my_ext.h       # Source code files below
  my_ext.cpp)
nanobind ignores the FREE_THREADED parameter when the detected Python version does not support free-threading.
Note
Stable ABI: There is currently no stable ABI for free-threaded Python, hence the STABLE_ABI parameter is ignored in free-threaded extension builds. It is valid to combine the STABLE_ABI and FREE_THREADED arguments: the build system will choose between the two depending on the detected Python version.
Warning
Loading a Python extension that does not support free-threading disables free-threading globally. In larger binding projects with multiple extensions, all of them must be adapted.
If free-threading was requested and is available, the build system will set the NB_FREE_THREADED preprocessor flag. This can be helpful to specialize binding code with #ifdef blocks, e.g.:
#if !defined(NB_FREE_THREADED)
... // simple GIL-protected code
#else
... // more complex thread-aware code
#endif
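As a rough sketch of such a specialization, the snippet below switches a statistics counter between a plain integer and std::atomic depending on whether NB_FREE_THREADED is defined. The names hit_count_t, g_hits, record_hit(), and hit_count() are hypothetical and exist only for illustration; the code compiles in either configuration.

```cpp
#include <atomic>
#include <cstdint>

#if !defined(NB_FREE_THREADED)
// GIL-protected build: calls arriving from Python are serialized by the GIL,
// so a plain integer is sufficient
using hit_count_t = std::uint64_t;
#else
// Free-threaded build: several threads may update the counter at once
using hit_count_t = std::atomic<std::uint64_t>;
#endif

// Hypothetical module-level statistic updated from binding code
static hit_count_t g_hits{0};

// The increment is atomic only in the free-threaded configuration
void record_hit() { ++g_hits; }

// Works for both the plain and the atomic representation
std::uint64_t hit_count() { return g_hits; }
```

Keeping the switch inside a type alias limits the number of #if blocks: the functions that use the counter are identical in both builds.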
Caveats¶
Free-threading can violate implicit assumptions made by extension developers when previously serial operations suddenly run concurrently, producing undefined behavior (race conditions, crashes, etc.).
Let’s consider a concrete example: the binding code below defines a Counter class with an increment operation.
struct Counter {
    int value = 0;
    void inc() { value++; }
};

nb::class_<Counter>(m, "Counter")
    .def("inc", &Counter::inc)
    .def_ro("value", &Counter::value);
If multiple threads call the inc() method of a single Counter, the final count will generally be incorrect, as the increment operation value++ does not execute atomically.
To fix this, we could modify the C++ type so that it protects its value member from concurrent modification, for example using an atomic number type (e.g., std::atomic<int>) or a critical section (e.g., based on std::mutex).
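Both options can be sketched in plain C++ without any binding code. The AtomicCounter and MutexCounter names and the hammer() helper are hypothetical and only serve this illustration; hammer() calls inc() from several threads and returns the final count.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Variant 1: atomic member, each increment is an indivisible read-modify-write
struct AtomicCounter {
    std::atomic<int> value{0};
    void inc() { value.fetch_add(1, std::memory_order_relaxed); }
};

// Variant 2: critical section guarded by a mutex
struct MutexCounter {
    int value = 0;
    std::mutex mutex;
    void inc() {
        std::lock_guard<std::mutex> guard(mutex);
        value++;
    }
};

// Calls inc() `per_thread` times from each of `n_threads` threads and
// returns the final count once all threads have finished
template <typename CounterT>
int hammer(int n_threads, int per_thread) {
    CounterT counter;
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([&counter, per_thread] {
            for (int j = 0; j < per_thread; ++j)
                counter.inc();
        });
    for (std::thread &t : threads)
        t.join();
    return counter.value;
}
```

With either variant, hammer<...>(4, 10000) yields 40000, whereas the unprotected Counter may lose increments. Note that both std::atomic and std::mutex members make the enclosing type non-copyable, which may matter for types that are passed by value.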
The race condition in the above example is relatively benign. However, in more complex projects, combinations of concurrency and unsafe memory accesses could introduce non-deterministic data corruption and crashes.
Another common source of problems is global variables that undergo concurrent modification once they are no longer protected by the GIL. They likewise require supplemental locking. The next section explains a Python-specific locking primitive that can be used in binding code in addition to the solutions mentioned above.
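As a minimal sketch of the global-variable case, the snippet below guards a module-level table with a std::mutex. The g_registry table and the intern_name() function are hypothetical names chosen for this example.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical module-level state shared by all bindings
static std::unordered_map<std::string, int> g_registry;
static std::mutex g_registry_mutex;  // guards every access to g_registry

// Registers `name` on first use and returns its identifier; repeated calls
// with the same name return the same identifier
int intern_name(const std::string &name) {
    std::lock_guard<std::mutex> guard(g_registry_mutex);
    auto it = g_registry.find(name);
    if (it == g_registry.end())
        it = g_registry.emplace(name, static_cast<int>(g_registry.size())).first;
    return it->second;
}
```

The lookup and the insertion happen inside one critical section, so two threads that register the same name at the same time cannot both insert it.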
Python locks¶
Nanobind provides convenience functionality encapsulating the mutex implementation that is part of Python ("PyMutex"). It is slightly more efficient than OS/language-provided synchronization primitives and generally preferable within Python extensions.
The class ft_mutex is analogous to std::mutex, and ft_lock_guard is analogous to std::lock_guard. Note that they only exist to add supplemental critical sections needed in free-threaded Python, while becoming inactive (no-ops) when targeting regular GIL-protected Python.
With these abstractions, the previous Counter implementation could be rewritten as:
struct Counter {
    int value = 0;
    nb::ft_mutex mutex;

    void inc() {
        nb::ft_lock_guard guard(mutex);
        value++;
    }
};
These locks are very compact (sizeof(nb::ft_mutex) == 1), though this is a Python implementation detail that could change in the future.
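The idea of a lock that becomes a no-op in GIL-protected builds can be illustrated with standard C++ alone. The sketch below is not nanobind's implementation: SKETCH_FREE_THREADED is a hypothetical stand-in for NB_FREE_THREADED, and sketch_mutex is a hypothetical type that uses std::mutex where nanobind would rely on PyMutex.

```cpp
#include <mutex>

// Hypothetical stand-in for NB_FREE_THREADED; remove to get the no-op variant
#define SKETCH_FREE_THREADED 1

#if defined(SKETCH_FREE_THREADED)
// Free-threaded configuration: a real lock
using sketch_mutex = std::mutex;
#else
// GIL-protected configuration: an empty type whose lock()/unlock() do nothing
struct sketch_mutex {
    void lock() {}
    void unlock() {}
};
#endif

struct Counter {
    int value = 0;
    sketch_mutex mutex;

    void inc() {
        // std::lock_guard accepts any type providing lock() and unlock()
        std::lock_guard<sketch_mutex> guard(mutex);
        value++;
    }
};
```

Because the guard is written once against a common interface, the body of inc() stays the same in both configurations and the compiler can drop the empty calls entirely.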
Argument locking¶
Modifying class and function definitions as shown above may not always be possible. As an alternative, nanobind also provides a way to retrofit supplemental locking onto existing code. The idea is to lock individual arguments of a function before it is invoked. A built-in mutex present in every Python object enables this.
To do so, call the .lock() member of nb::arg() annotations to indicate that an argument must be locked, e.g.:
nb::arg("my_parameter").lock()
"my_parameter"_a.lock() (short-hand form)
In method bindings, pass nb::lock_self() to lock the implicit self argument. Note that at most two arguments can be locked per function, which is a limitation of the Python locking API.
The example below shows how this functionality can be used to protect inc() and a new merge() function that acquires two simultaneous locks.
struct Counter {
    int value = 0;

    void inc() { value++; }

    void merge(Counter &other) {
        value += other.value;
        other.value = 0;
    }
};

nb::class_<Counter>(m, "Counter")
    .def("inc", &Counter::inc, nb::lock_self())
    .def("merge", &Counter::merge, nb::lock_self(), "other"_a.lock())
    .def_ro("value", &Counter::value);
The above solution has an obvious drawback: it only protects bindings (i.e., transitions from Python to C++). For example, if some other part of a C++ codebase calls merge() directly, the binding layer won’t be involved, and no locking takes place. If such behavior can introduce race conditions, a larger-scale redesign of your project may be in order.
Note
Adding locking annotations indiscriminately is inadvisable because locked calls are more costly than unlocked ones. The .lock() and nb::lock_self() annotations are ignored in GIL-protected builds, hence this added cost only applies to free-threaded extensions.
Furthermore, when adding locking annotations to a function, consider keeping the arguments unnamed (i.e., nb::arg().lock() instead of nb::arg("name").lock()) if the function will never be called with keyword arguments. Processing named arguments causes small binding overheads that may be undesirable if a function that does very little is called at a very high rate.
Note
Python API and locking: When the lock-protected function performs Python API calls (e.g., using wrappers like nb::dict), Python may temporarily release locks to avoid deadlocks. Here, even basic reference counting such as a nb::object variable expiring at the end of a scope counts as an API call.
These locks will be reacquired following the Python API call. This behavior resembles ordinary (GIL-protected) Python code, where operations like Py_DECREF() can cause arbitrary Python code to execute. The semantics of this kind of relaxed critical section are described in the Python documentation.
Miscellaneous notes¶
API¶
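The API specific to free-threading includes the following, all described in the sections above: the NB_FREE_THREADED preprocessor flag, the nb::ft_mutex and nb::ft_lock_guard classes, the .lock() member of nb::arg(), and nb::lock_self().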
API stability¶
The interface explained in this section is excluded from the project’s semantic versioning policy. Free-threading is still experimental, and API breaks may be necessary based on future experience and changes in Python itself.
Wrappers¶
Wrapper types like nb::list may be used in multi-threaded code. Operations like nb::list::append() internally acquire locks and behave just like their ordinary Python counterparts. This means that race conditions can still occur without larger-scale synchronization, but such races won’t jeopardize the memory safety of the program.
GIL scope guards¶
Prior to free-threaded Python, the nanobind scope guards gil_scoped_acquire and gil_scoped_release would normally be used to acquire/release the GIL and enable parallel regions.
These remain useful and should not be removed from existing code: while no longer blocking operations, they set and unset the current Python thread context and inform the garbage collector.
The gil_scoped_release RAII scope guard class plays a special role in free-threaded builds, since it releases all argument locks held by the current thread.
Immortalization¶
Python relies on a technique called reference counting to determine when an object is no longer needed. This approach can become a bottleneck in multi-threaded programs, since increasing and decreasing reference counts requires coordination among multiple processor cores. Python type and function objects are especially sensitive, since their reference counts change at a very high rate.
Similar to free-threaded Python itself, nanobind avoids this bottleneck by immortalizing functions (nanobind.nb_func, nanobind.nb_method) and type bindings. Immortal objects don’t require reference counting and therefore cannot cause the bottleneck mentioned above. The main downside of this approach is that these objects leak when the interpreter shuts down. Free-threaded nanobind extensions disable the internal leak checker, since it would produce many warning messages caused by immortal objects.
Internal data structures¶
Nanobind maintains various internal data structures that store information about instances and function/type bindings. These data structures also play an important role in exchanging type/instance data in larger projects that are split across several independent extension modules.
The layout of these data structures differs between ordinary and free-threaded extensions, therefore nanobind isolates them from each other by assigning a different ABI version tag. This means that multi-module projects will need to consistently compile either free-threaded or non-free-threaded modules.
Free-threaded nanobind uses thread-local and sharded data structures to avoid lock and atomic contention on the internal data structures, which would otherwise become a bottleneck in multi-threaded Python programs.
Thread sanitizers¶
The thread sanitizer (TSAN) offers an effective way of tracking down undefined behavior in multithreaded applications.
To use TSAN with nanobind extensions, you must also create a custom Python build that has TSAN enabled. This is because nanobind internally builds on Python locks. If the implementation of the locks is not instrumented by TSAN, the tool will report a large volume of false positives.
To make a TSAN-instrumented Python build, download a Python source release and pass the following options to its configure script:
$ ./configure --disable-gil --with-thread-sanitizer <.. other options ..>