Why another binding library?

I started the pybind11 project back in 2015 to generate better C++/Python bindings for a project I had been working on. Thanks to many amazing contributions by others, pybind11 has since become a core dependency of software used across the world including flagship projects like PyTorch and TensorFlow. Every day, it is downloaded over 400’000 times. Hundreds of contributed extensions and generalizations address use cases of this diverse audience. However, all of this success also came with costs: the complexity of the library grew tremendously, which had a negative impact on efficiency.

Curiously, the situation now is reminiscent of 2015: binding generation with existing tools (Boost.Python, pybind11) is slow and produces enormous binaries with overheads on runtime performance. At the same time, key improvements in C++17 and Python 3.8 provide opportunities for drastic simplifications. Therefore, I am starting another binding project. This time, the scope is intentionally limited so that this doesn’t turn into an endless cycle.

So what is different?

nanobind is highly related to pybind11 and inherits most of its conventions and syntax. The main difference is a change in philosophy: pybind11 must deal with all of C++ to bind legacy codebases, while nanobind targets a smaller C++ subset. The codebase has to adapt to the binding tool and not the other way around, which allows nanobind to be simpler and faster. Pull requests with extensions and generalizations to handle subtle fringe cases were welcomed in pybind11, but they will likely be rejected in this project.

An overview of removed features is provided in a separate section. Besides feature removal, the rewrite was also an opportunity to address long-standing performance issues and add a number of major quality-of-life improvements and smaller features.

Performance improvements

The benchmark section evaluates the impact of the following performance improvements:

  • Compact objects: C++ objects are now co-located with the Python object whenever possible (less pointer chasing compared to pybind11). The per-instance overhead for wrapping a C++ type into a Python object shrinks by 2.3x (pybind11: 56 bytes, nanobind: 24 bytes).

  • Compact functions: C++ function binding information is now co-located with the Python function object (less pointer chasing).

  • Compact types: C++ type binding information is now co-located with the Python type object (less pointer chasing, fewer hashtable lookups).

  • Fast hash table: nanobind upgrades several important internal associative data structures that previously used std::unordered_map to a more efficient alternative (tsl::robin_map, which is included as a git submodule).

  • Vector calls: function calls from/to Python are realized using PEP 590 vector calls, which gives a nice speed boost. The main function dispatch loop no longer allocates heap memory.

  • Library component: pybind11 was designed as a header-only library, which is generally a good thing because it simplifies the compilation workflow. However, one major downside of this is that a large amount of redundant code has to be compiled in each binding file (e.g., the function dispatch loop and all of the related internal data structures). nanobind compiles a separate shared or static support library ("libnanobind") and links it against the binding code to avoid redundant compilation. The CMake interface nanobind_add_module() fully automates these extra steps.

  • Smaller headers: #include <pybind11/pybind11.h> pulls in a large portion of the STL (about 2.1 MiB of headers with Clang and libc++). nanobind minimizes STL usage to avoid this problem. Type casters even for basic types like std::string require an explicit opt-in by including an extra header file (e.g. #include <nanobind/stl/string.h>); a brief example follows after this list.

  • Simpler compilation: pybind11 was dependent on link time optimization (LTO) to produce reasonably-sized bindings, which made linking a build-time bottleneck. With nanobind’s split into a precompiled library and minimal template metaprogramming, LTO is no longer crucial and can be skipped.

  • Free-threading: Python 3.13+ supports a free-threaded mode that removes the Global Interpreter Lock (GIL). Both pybind11 and nanobind have recently gained free-threading support. When comparing the two, nanobind provides better multi-core scaling thanks to a localized locking scheme. In pybind11, lock contention on a central internals data structure used in every binding operation becomes a bottleneck in practice.

  • Lifetime management: nanobind maintains efficient internal data structures for lifetime management (needed for nb::keep_alive, nb::rv_policy::reference_internal, the std::shared_ptr interface, etc.). With these changes, bound types no longer need to be weak-referenceable, which saves a pointer per instance.
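
As a brief illustration of the header opt-in mentioned in the "Smaller headers" item above, the following sketch binds a function that passes a std::string across the language boundary (the module and function names are placeholders):

    #include <nanobind/nanobind.h>
    #include <nanobind/stl/string.h> // explicit opt-in: std::string type caster

    namespace nb = nanobind;

    NB_MODULE(my_ext, m) {
        // Without the <nanobind/stl/string.h> include above, this binding
        // would fail to compile because the std::string caster is missing.
        m.def("greet", [](const std::string &name) {
            return "Hello, " + name + "!";
        });
    }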

Major additions

nanobind includes a number of quality-of-life improvements for developers:

  • N-dimensional arrays: nanobind can exchange data with modern array programming frameworks. It uses either DLPack or the buffer protocol to achieve zero-copy CPU/GPU array exchange with frameworks like NumPy, PyTorch, TensorFlow, JAX, etc. A brief example follows after this list; see the section on n-dimensional arrays for details.

  • Stable ABI: nanobind can target Python’s stable ABI interface starting with Python 3.12. This means that extension modules will be compatible with future versions of Python without having to compile separate binaries per interpreter. That vision is still relatively far out, however: it will require Python 3.12+ to be widely deployed.

  • Stub generation: nanobind ships with a custom stub generator and CMake integration to automatically create high-quality stubs as part of the build process. Stubs make compiled extension code compatible with visual autocomplete in editors like Visual Studio Code and static type checkers like MyPy, PyRight, and PyType.

  • Smart pointers, ownership, etc.: corner cases in pybind11 related to smart/unique pointers and callbacks could lead to undefined behavior. A later pybind11 redesign (smart_holder) was able to address these problems, but this came at the cost of further increased runtime overheads. The object ownership model of nanobind avoids this undefined behavior without penalizing runtime performance.

  • Leak warnings: When the Python interpreter shuts down, nanobind reports instance, type, and function leaks related to bindings, which is useful for tracking down reference counting issues. If these warnings are undesired, call nb::set_leak_warnings(false). nanobind also fully deletes its internal data structures when the Python interpreter terminates, which avoids memory leak reports in tools like valgrind.

  • Better docstrings: pybind11 pre-renders docstrings while the binding code runs. In other words, every call to .def(...) to bind a function immediately creates the underlying docstring. When a function takes a parameter of a C++ type that is not yet registered in pybind11, the docstring will include a C++ type name (e.g. std::vector<int, std::allocator<int>>), which can look rather ugly. pybind11 binding declarations must be carefully arranged to work around this issue.

    nanobind avoids the issue altogether by not pre-rendering docstrings: they are created on the fly when queried. nanobind also has improved out-of-the-box compatibility with documentation generation tools like Sphinx.

  • Low-level API: nanobind exposes an optional low-level API offering fine-grained control over diverse aspects including instance creation, type creation, and the storage of supplemental data in types. This API provides a useful escape hatch to pursue advanced projects that were not foreseen in the design of this library.
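
To make the array exchange concrete, the following sketch (module and function names are invented for illustration) accepts any two-dimensional float32 CPU array (e.g. from NumPy or PyTorch) without copying and sums its entries:

    #include <nanobind/nanobind.h>
    #include <nanobind/ndarray.h>

    namespace nb = nanobind;

    NB_MODULE(my_ext, m) {
        // The parameter accepts any object that exposes a 2D float32 CPU
        // array via DLPack or the buffer protocol; the data is accessed
        // in place (zero-copy).
        m.def("sum_2d", [](nb::ndarray<float, nb::ndim<2>, nb::device::cpu> a) {
            float total = 0.f;
            for (size_t i = 0; i < a.shape(0); ++i)
                for (size_t j = 0; j < a.shape(1); ++j)
                    total += a(i, j);
            return total;
        });
    }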

Minor additions

The following lists minor-but-useful additions relative to pybind11.

  • Finding Python objects associated with a C++ instance: In addition to all of the return value policies supported by pybind11, nanobind provides one additional policy named nb::rv_policy::none that only succeeds when the return value is already a known/registered Python object. In other words, this policy will never attempt to move, copy, or reference a C++ instance by constructing a new Python object.

    The new nb::find() function encapsulates this behavior. It resembles nb::cast() in the sense that it returns the Python object associated with a C++ instance. But while nb::cast() will create that Python object if it doesn’t yet exist, nb::find() will instead return a null object. This function is useful to interface with Python’s cyclic garbage collector; a sketch follows at the end of this list.

  • Parameterized wrappers: The nb::handle_t<T> type behaves just like the nb::handle class and wraps a PyObject * pointer. However, when binding a function that takes such an argument, nanobind will only call the associated function overload when the underlying Python object wraps a C++ instance of type T.

    Similarly, the nb::type_object_t<T> type behaves just like the nb::type_object class and wraps a PyTypeObject * pointer. However, when binding a function that takes such an argument, nanobind will only call the associated function overload when the underlying Python type object is a subtype of the C++ type T.

    Finally, the nb::typed<T, Ts...> annotation can parameterize any other type. The feature exists to improve the expressiveness of type signatures (e.g., to turn list into list[int]). Note, however, that nanobind does not perform additional runtime checks in this case. A sketch of these parameterized wrappers follows at the end of this list; please see the section on parameterizing generics for further details.

  • Signature overrides: it may sometimes be necessary to tweak the type signature of a class or function to provide richer type information to static type checkers like MyPy or PyRight. In such cases, specify the nb::sig attribute to override the default nanobind-provided signature.

    For example, the following function signature annotation creates an overload that should only be called with a 1-valued integer literal. While the function also includes a runtime check, a static type checker can now ensure that this error condition cannot possibly be triggered by a given piece of code.

    m.def("f",
          [](int arg) {
              if (arg != 1)
                 nb::raise("invalid input");
              return arg;
          },
          nb::sig("def f(arg: typing.Literal[1], /) -> int"));
    

    Please see the section on customizing function signatures and class signatures for further details.
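
The nb::find() behavior can be illustrated with a small sketch; the Pet class and the inspect() function below are hypothetical and only serve to demonstrate the call:

    #include <nanobind/nanobind.h>

    namespace nb = nanobind;

    struct Pet { int age = 0; }; // assumed to be bound elsewhere via nb::class_<Pet>

    void inspect(Pet &pet) {
        // Returns the existing Python object wrapping 'pet', or a null object
        // if no such wrapper exists; no new Python object is ever created.
        nb::object obj = nb::find(pet);
        if (obj.is_valid()) {
            // e.g., report the wrapper to Python's cyclic garbage collector
        }
    }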
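
Likewise, here is a brief sketch of the parameterized wrappers (the Vehicle class, module, and function names are made up for illustration):

    #include <nanobind/nanobind.h>

    namespace nb = nanobind;

    struct Vehicle { int wheels = 4; };

    NB_MODULE(my_ext, m) {
        nb::class_<Vehicle>(m, "Vehicle")
            .def(nb::init<>());

        // Only called when the Python argument wraps a C++ 'Vehicle' instance
        m.def("describe", [](nb::handle_t<Vehicle> h) {
            return "a Vehicle instance";
        });

        // Only called for Python type objects that subtype 'Vehicle'
        m.def("make", [](nb::type_object_t<Vehicle> tp) {
            return tp(); // instantiate the given subtype
        });

        // nb::typed<> only refines the signature seen by static type checkers
        // (list -> list[int]); no additional runtime check is performed
        m.def("count", [](nb::typed<nb::list, int> l) {
            return l.size();
        });
    }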

TLDR

My recommendation is that current pybind11 users look into migrating to nanobind. Fixing all the long-standing issues in pybind11 (see above list) would require a substantial redesign and years of careful work by a team of C++ metaprogramming experts. At the same time, changing anything in pybind11 is extremely hard because of the large number of downstream users and their requirements on API/ABI stability. I personally don’t have the time and energy to fix pybind11 and have moved my focus to this project.