The nb::ndarray<..> class

nanobind can exchange n-dimensional arrays (henceforth “nd-arrays”) with popular array programming frameworks including NumPy, PyTorch, TensorFlow, JAX, and CuPy. It supports zero-copy exchange using two protocols: the classic buffer protocol and DLPack, a GPU-compatible generalization of it.

nanobind knows how to talk to each framework and takes care of all the nitty-gritty details.

To use this feature, you must add the include directive

#include <nanobind/ndarray.h>

to your code. Following this, you can bind functions with nb::ndarray<...>-typed parameters and return values.

Array input arguments

A function that accepts an nb::ndarray<>-typed parameter (i.e., without template parameters) can be called with any writable array from any framework regardless of the device on which it is stored. The following example binding declaration uses this functionality to inspect the properties of an arbitrary input array:

m.def("inspect", [](const nb::ndarray<>& a) {
    printf("Array data pointer : %p\n", a.data());
    printf("Array dimension : %zu\n", a.ndim());
    for (size_t i = 0; i < a.ndim(); ++i) {
        printf("Array dimension [%zu] : %zu\n", i, a.shape(i));
        printf("Array stride    [%zu] : %zd\n", i, a.stride(i));
    }
    printf("Device ID = %u (cpu=%i, cuda=%i)\n", a.device_id(),
        int(a.device_type() == nb::device::cpu::value),
        int(a.device_type() == nb::device::cuda::value)
    );
    printf("Array dtype: int16=%i, uint32=%i, float32=%i\n",
        a.dtype() == nb::dtype<int16_t>(),
        a.dtype() == nb::dtype<uint32_t>(),
        a.dtype() == nb::dtype<float>()
    );
});

Below is an example of what this function does when called with a NumPy array:

>>> my_module.inspect(np.array([[1,2,3], [3,4,5]], dtype=np.float32))
Array data pointer : 0x1c30f60
Array dimension : 2
Array dimension [0] : 2
Array stride    [0] : 3
Array dimension [1] : 3
Array stride    [1] : 1
Device ID = 0 (cpu=1, cuda=0)
Array dtype: int16=0, uint32=0, float32=1
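The strides reported above count elements rather than bytes. As a sanity check (pure NumPy, shown for illustration only), the same values can be recovered from NumPy's byte-based strides by dividing by the item size:

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5]], dtype=np.float32)

# NumPy reports strides in bytes; nanobind reports them in elements.
element_strides = tuple(s // a.itemsize for s in a.strides)
print(element_strides)  # -> (3, 1), matching the output above
```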

Array constraints

In practice, it can often be useful to constrain what kinds of arrays constitute valid inputs to a function. For example, a function expecting CPU storage would likely crash if given a pointer to GPU memory, and nanobind should therefore prevent such undefined behavior. The nb::ndarray<...> class accepts template arguments to specify such constraints. For example, the binding below guarantees that the implementation can only be called with CPU-resident arrays with shape (·,·,3) containing 8-bit unsigned integers.

using RGBImage = nb::ndarray<uint8_t, nb::shape<-1, -1, 3>, nb::device::cpu>;

m.def("process", [](RGBImage data) {
    // Double brightness of the MxNx3 RGB image
    for (size_t y = 0; y < data.shape(0); ++y)
        for (size_t x = 0; x < data.shape(1); ++x)
            for (size_t ch = 0; ch < 3; ++ch)
                data(y, x, ch) = (uint8_t) std::min(255, data(y, x, ch) * 2);
});

The above example also demonstrates the use of operator(), which provides direct read/write access to the array contents assuming that they are reachable through the CPU’s virtual address space.
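The effect of the loop in process() can be sketched in NumPy terms. The following is a plain-Python analogue of the binding's semantics, not nanobind code; the widening to uint16 avoids overflow before clamping:

```python
import numpy as np

# An 8-bit RGB image with shape (M, N, 3), as process() would accept.
img = np.array([[[10, 200, 255]]], dtype=np.uint8)

# Equivalent of the element-wise loop: double each channel, clamp to 255.
doubled = np.minimum(img.astype(np.uint16) * 2, 255).astype(np.uint8)
print(doubled[0, 0].tolist())  # -> [20, 255, 255]
```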

Overview

Overall, the following kinds of constraints are available:

  • Data type: a type annotation like float, uint8_t, etc., constrains the numerical representation of the nd-array. Complex arrays (i.e., std::complex<float> or std::complex<double>) are also supported.

  • Constant arrays: further annotating the data type with const makes it possible to call the function with constant arrays that do not permit write access. Without the annotation, calling the binding would fail with a TypeError.

    You can alternatively accept constant arrays of any type by not specifying a data type at all and instead passing the nb::ro annotation.

  • Shape: The nb::shape annotation (as in nb::shape<-1, 3>) simultaneously constrains the number of array dimensions and the size per dimension. A value of -1 leaves the size of the associated dimension unconstrained.

    nb::ndim<N> is shorter when only the number of dimensions should be constrained. For example, nb::ndim<3> is equivalent to nb::shape<-1, -1, -1>.

  • Device tags: annotations like nb::device::cpu or nb::device::cuda constrain the source device and address space.

  • Memory order: two ordering tags nb::c_contig and nb::f_contig enforce contiguous storage in either C or Fortran style.

    In the case of matrices, C-contiguous implies row-major and F-contiguous implies column-major storage. Without this tag, arbitrary non-contiguous representations (e.g. produced by slicing operations) and other unusual layouts are permitted.

    This tag is mainly useful when your code directly accesses the array contents via nb::ndarray<...>::data(), while assuming a particular layout.

    A third order tag named nb::any_contig accepts both F- and C-contiguous arrays while rejecting non-contiguous ones.
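The contiguity tags correspond directly to NumPy's layout flags. A quick check (NumPy shown for illustration) demonstrates how common operations move an array between the three categories:

```python
import numpy as np

a = np.zeros((4, 4), dtype=np.float32)  # C-contiguous by default
f = np.asfortranarray(a)                # F-contiguous copy
s = a[:, ::2]                           # slicing yields a non-contiguous view

assert a.flags["C_CONTIGUOUS"]
assert f.flags["F_CONTIGUOUS"]
# A sliced view is neither C- nor F-contiguous; only nb::ndarray<> without
# an ordering tag would accept it without an implicit conversion.
assert not s.flags["C_CONTIGUOUS"] and not s.flags["F_CONTIGUOUS"]
```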

Type signatures

nanobind displays array constraints in docstrings and error messages. For example, suppose that we now call the process() function with an invalid input. This produces the following error message:

>>> my_module.process(np.zeros(1))

TypeError: process(): incompatible function arguments. The following argument types are supported:
1. process(arg: ndarray[dtype=uint8, shape=(*, *, 3), device='cpu'], /) -> None

Invoked with types: numpy.ndarray

Note that these type annotations are intended for humans; they will not currently work with automatic type checking tools like MyPy (which, at least for the time being, don’t provide a portable or sufficiently flexible annotation of n-dimensional arrays).

Overload resolution

A function binding can declare multiple overloads with different nd-array constraints (e.g., a CPU and a GPU implementation), in which case nanobind will call the first matching overload. When no perfect match can be found, nanobind will try each overload once more while performing basic implicit conversions: it will convert strided arrays into C- or F-contiguous arrays (if requested) and perform type conversion. This, e.g., makes it possible to call a function expecting a float32 array with float64 data. Implicit conversions create temporary nd-arrays containing a copy of the data, which can be undesirable. To suppress them, add an nb::arg("my_array_arg").noconvert() or "my_array_arg"_a.noconvert() argument annotation.
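The temporary created by such an implicit conversion behaves like the copy produced by NumPy's astype() (shown here purely as an analogy): the data lands in fresh storage, so writes to the converted array never propagate back to the caller's original array.

```python
import numpy as np

a64 = np.linspace(0.0, 1.0, 4)   # float64 input, as a caller might pass
a32 = a64.astype(np.float32)     # the kind of temporary an implicit
                                 # dtype conversion would create

assert a32.dtype == np.float32
# Separate storage: mutating a32 would not affect a64.
assert not np.shares_memory(a64, a32)
```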

Passing arrays within C++ code

You can think of the nb::ndarray class as a reference-counted pointer resembling std::shared_ptr<T> that can be freely moved or copied. This means that there isn’t a big difference between a function taking ndarray by value versus taking a constant reference const ndarray & (i.e., the former does not create an additional copy of the underlying data).

Copies of the nb::ndarray wrapper will point to the same underlying buffer and increase the reference count until they go out of scope. You may freely call nb::ndarray<...> methods from multithreaded code even when the GIL is not held, for example to examine the layout of an array and access the underlying storage.

There are two exceptions to this: creating a new nd-array object from C++ (discussed later) and casting it to Python via the ndarray::cast() function both involve Python API calls that require that the GIL is held.

Returning arrays from C++ to Python

Passing an nd-array across the C++ → Python language barrier is a two-step process:

  1. Creating an nb::ndarray<...> instance, which only stores metadata, e.g.:

    • Where is the data located in memory? (pointer address and device)

    • What is its type and shape?

    • Who owns this data?

    An actual Python object is not yet constructed at this stage.

  2. Converting the nb::ndarray<...> into a Python object of the desired type (e.g. numpy.ndarray).

Normally, step 1 is your responsibility, while step 2 is taken care of by the binding layer. To understand this separation, let’s look at an example. The .view() function binding below creates a 4×4 column-major NumPy array view into a Matrix4f instance.

struct Matrix4f { float m[4][4] { }; };

using Array = nb::ndarray<float, nb::numpy, nb::shape<4, 4>, nb::f_contig>;

nb::class_<Matrix4f>(m, "Matrix4f")
    .def(nb::init<>())
    .def("view",
         [](Matrix4f &m){ return Array(m.m); },
         nb::rv_policy::reference_internal);

In this case:

  • step 1 is the Array constructor call in the lambda function.

  • step 2 occurs outside of the lambda function when the nd-array nb::ndarray<...> type caster constructs a NumPy array from the metadata.

Data ownership is an important aspect of this two-step process: because the NumPy array points directly into the storage of another object, nanobind must keep the Matrix4f instance alive as long as the NumPy array exists, which the reference_internal return value policy signals to nanobind. More generally, wrapping an existing memory region without copying requires that this memory region remains valid throughout the lifetime of the created array (more on this point shortly).
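NumPy solves the same keep-alive problem on its side with the base attribute: a view holds a reference to the array that owns the storage. The reference_internal policy plays an analogous role for nanobind bindings. A small NumPy illustration:

```python
import numpy as np

owner = np.arange(16, dtype=np.float32)
view = owner.reshape(4, 4)[1:3]   # a view into the owner's storage

# The view keeps its parent alive via a reference, so the underlying
# buffer cannot be freed while the view exists.
assert view.base is not None
assert np.shares_memory(view, owner)
```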

Recall the discussion of the nd-array constraint template parameters. For the return path, you will generally want to add a framework template parameter to the nd-array parameters that indicates the desired Python type.

  • nb::numpy: create a numpy.ndarray.

  • nb::pytorch: create a torch.Tensor.

  • nb::tensorflow: create a tensorflow.python.framework.ops.EagerTensor.

  • nb::jax: create a jaxlib.xla_extension.DeviceArray.

  • nb::cupy: create a cupy.ndarray.

  • No framework annotation. In this case, nanobind will create a raw Python dltensor capsule representing the DLPack metadata.

This annotation also affects the auto-generated docstring of the function, which in this case becomes:

view(self) -> numpy.ndarray[float32, shape=(4, 4), order='F']

Note that the framework annotation only plays a role when passing arrays from C++ to Python. It does not constrain the reverse direction: for example, a PyTorch array would still be accepted by a function taking the Array alias defined above as input. For this reason, you may want to add an nb::device::cpu device annotation.

Dynamic array configurations

The previous example was rather simple because all the array configuration was fully known at compile time and specified via the nb::ndarray<...> template parameters. In general, there are often dynamic aspects of the configuration that must be explicitly passed to the constructor. Its signature (with some simplifications) is given below. See the ndarray::ndarray() documentation for a more detailed specification and another variant of the constructor.

ndarray(void *data,
        std::initializer_list<size_t> shape = { },
        handle owner = { },
        std::initializer_list<int64_t> strides = { },
        dlpack::dtype dtype = ...,
        int device_type = ...,
        int device_id = 0,
        char order = ...) { .. }

The parameters have the following role:

  • data: CPU/GPU/.. memory address of the data.

  • shape: number of dimensions and size along each axis.

  • owner: a Python object owning the storage, which must be kept alive while the array object exists.

  • strides: specifies the data layout in memory. You only need to specify this parameter if it has a non-standard order (e.g., if it is non-contiguous). Note that the strides count elements, not bytes.

  • dtype: data type (floating point, signed/unsigned integer) and bit depth.

  • device_type and device_id: device type and number, e.g., for multi-GPU setups.

  • order: coefficient memory order. Default: 'C' (C-style) ordering, specify 'F' for Fortran-style ordering.

The parameters generally have inferred defaults based on the array’s compile-time template parameters. Passing them explicitly overrides these defaults with information available at runtime.
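For instance, when strides is omitted and order is 'C' (the default), the constructor infers standard C-order element strides from the shape. A hypothetical Python sketch of that inference, with the helper name c_order_strides chosen here for illustration:

```python
def c_order_strides(shape):
    """Default C-order element strides for a given shape, as the ndarray
    constructor would infer them when 'strides' is omitted (sketch)."""
    strides = [1] * len(shape)
    # Each stride is the product of all sizes to its right.
    for i in reversed(range(len(shape) - 1)):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

print(c_order_strides([2, 3, 4]))  # -> [12, 4, 1]
```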

Data ownership

Let’s look at a fancier example that uses the constructor arguments explained above to return a dynamically sized 2D array. This example also shows another mechanism to express data ownership:

m.def("create_2d",
      [](size_t rows, size_t cols) {
          // Allocate a memory region and initialize it
          float *data = new float[rows * cols];
          for (size_t i = 0; i < rows * cols; ++i)
              data[i] = (float) i;

          // Delete 'data' when the 'owner' capsule expires
          nb::capsule owner(data, [](void *p) noexcept {
             delete[] (float *) p;
          });

          return nb::ndarray<nb::numpy, float, nb::ndim<2>>(
              /* data = */ data,
              /* shape = */ { rows, cols },
              /* owner = */ owner
          );
});

The owner parameter should specify a Python object whose continued existence keeps the underlying memory region alive. Nanobind will temporarily increase the owner reference count in the ndarray::ndarray() constructor and then decrease it again when the created NumPy array expires.

The above example binding returns a new memory region that should be deleted when it is no longer in use. This is done by creating a nb::capsule, an opaque pointer with a destructor callback that runs at that point and takes care of cleaning things up.

If there is already an existing Python object whose existence guarantees that it is safe to access the provided storage region, then you may alternatively pass this object as the owner—nanobind will make sure that this object isn’t deleted as long as the created array exists. If the owner is a C++ object with an associated Python instance, you may use nb::find() to look up the associated Python object. When binding methods, you can use the reference_internal return value policy to specify the implicit self argument as the owner upon return, which was done in the earlier Matrix4f example.

Warning

If you do not specify an owner and use a return value policy like rv_policy::reference (see also the section on nd-array return value policies), nanobind will assume that the array storage remains valid forever.

This is one of the most frequent issues reported on the nanobind GitHub repository: users forget to think about data ownership and run into data corruption.

If there isn’t anything keeping the array storage alive, it will likely be released and reused at some point, while stale arrays still point to the associated memory region (i.e., a classic “use-after-free” bug).

In more advanced situations, it may be helpful to have a capsule that manages the lifetime of data structures containing multiple storage regions. The same capsule can be referenced from different nd-arrays and will call the deleter when all of them have expired:

m.def("return_multiple", []() {
    struct Temp {
        std::vector<float> vec_1;
        std::vector<float> vec_2;
    };

    Temp *temp = new Temp();
    temp->vec_1 = std::move(...);
    temp->vec_2 = std::move(...);

    nb::capsule deleter(temp, [](void *p) noexcept {
        delete (Temp *) p;
    });

    size_t size_1 = temp->vec_1.size();
    size_t size_2 = temp->vec_2.size();

    return std::make_pair(
        nb::ndarray<nb::pytorch, float>(temp->vec_1.data(), { size_1 }, deleter),
        nb::ndarray<nb::pytorch, float>(temp->vec_2.data(), { size_2 }, deleter)
    );
});

Return value policies

Function bindings that return nd-arrays can specify return value policy annotations to determine whether or not a copy should be made.

Returning temporaries

Returning nd-arrays from temporaries (e.g. stack-allocated memory) requires extra precautions.

using Vector3f = nb::ndarray<float, nb::numpy, nb::shape<3>>;
m.def("return_vec3", []{
    float data[] { 1, 2, 3 };
    // !!! BAD don't do this !!!
    return Vector3f(data);
});

Recall the discussion at the beginning of this subsection. The nb::ndarray<...> constructor only creates metadata describing this array, with the actual array creation happening after the function call. That isn’t safe in this case because data is a temporary on the stack that is no longer valid once the function has returned. To fix this, we could use the nb::cast() function to force the array creation in the body of the function:

using Vector3f = nb::ndarray<float, nb::numpy, nb::shape<3>>;
m.def("return_vec3", []{
    float data[] { 1, 2, 3 };
    // OK.
    return nb::cast(Vector3f(data));
});

While safe, one unfortunate aspect of this change is that the function now has a rather non-informative docstring return_vec3() -> object, which is a consequence of nb::cast() returning a generic nb::object.

To fix this, you can use the nd-array .cast() method, which is like nb::cast() except that it preserves the type signature:

using Vector3f = nb::ndarray<float, nb::numpy, nb::shape<3>>;
m.def("return_vec3", []{
    float data[] { 1, 2, 3 };
    // Perfect.
    return Vector3f(data).cast();
});

Nonstandard arithmetic types

Low or extended-precision arithmetic types (e.g., int128, float16, bfloat16) are sometimes used but don’t have standardized C++ equivalents. If you wish to exchange arrays based on such types, you must register a template specialization of nanobind::detail::dtype_traits to inform nanobind about them.

You are expressly allowed to create such specializations despite the class being in the nanobind::detail namespace.

For example, the following snippet makes __fp16 (half-precision type on aarch64) available by providing

  1. value, a DLPack nanobind::dlpack::dtype type descriptor, and

  2. name, a type name for use in docstrings and error messages.

namespace nanobind::detail {
    template <> struct dtype_traits<__fp16> {
        static constexpr dlpack::dtype value {
            (uint8_t) dlpack::dtype_code::Float, // type code
            16, // size in bits
            1   // lanes (simd), usually set to 1
        };
        static constexpr auto name = const_name("float16");
    };
}
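For comparison, the descriptor above (Float type code, 16 bits, 1 lane) matches NumPy's half-precision dtype, which can be checked from Python:

```python
import numpy as np

half = np.dtype(np.float16)
assert half.itemsize * 8 == 16   # 16 bits, as in the dlpack::dtype above
assert half.kind == "f"          # floating-point type code
```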

Fast array views

The following advice applies to performance-sensitive CPU code that reads and writes arrays using loops that invoke nb::ndarray<...>::operator(). It does not apply to GPU arrays because they are usually not accessed in this way.

Consider the following snippet, which fills a 2D array with data:

void fill(nb::ndarray<float, nb::ndim<2>, nb::c_contig, nb::device::cpu> arg) {
    for (size_t i = 0; i < arg.shape(0); ++i)
        for (size_t j = 0; j < arg.shape(1); ++j)
            arg(i, j) = /* ... */;
}

While functional, this code is not perfect. The problem is that to compute the address of an entry, operator() accesses the DLPack array descriptor. This indirection can break certain compiler optimizations.

nanobind provides the method ndarray<...>::view() to fix this. It creates a tiny data structure that provides all information needed to access the array contents, and which can be held within CPU registers. All relevant compile-time information (nb::ndim, nb::shape, nb::c_contig, nb::f_contig) is materialized in this view, which enables constant propagation, auto-vectorization, and loop unrolling.

An improved version of the example using such a view is shown below:

void fill(nb::ndarray<float, nb::ndim<2>, nb::c_contig, nb::device::cpu> arg) {
    auto v = arg.view(); // <-- new!

    for (size_t i = 0; i < v.shape(0); ++i) // Important; use 'v' instead of 'arg' everywhere in loop
        for (size_t j = 0; j < v.shape(1); ++j)
            v(i, j) = /* ... */;
}

Note that the view performs no reference counting. You may not store it in a way that exceeds the lifetime of the original array.

When using OpenMP to parallelize expensive array operations, pass a firstprivate(view_1, view_2, ...) clause so that each worker thread can copy the view into its register file.

auto v = arg.view();
#pragma omp parallel for schedule(static) firstprivate(v)
for (...) { /* parallel loop */ }

Specializing views at runtime

As mentioned earlier, element access via operator() only works when both the array’s scalar type and its dimension are specified within the type (i.e., when they are known at compile time); the same is also true for array views. However, it is sometimes useful for a function to be callable with different array types.

You may use the ndarray<...>::view() method to create specialized views if a run-time check determines that it is safe to do so. For example, the function below accepts contiguous CPU arrays and performs a loop over a specialized 2D float view when the array is of this type.

void fill(nb::ndarray<nb::c_contig, nb::device::cpu> arg) {
    if (arg.dtype() == nb::dtype<float>() && arg.ndim() == 2) {
        auto v = arg.view<float, nb::ndim<2>>(); // <-- new!

        for (size_t i = 0; i < v.shape(0); ++i) {
            for (size_t j = 0; j < v.shape(1); ++j) {
                v(i, j) = /* ... */;
            }
        }
     } else { /* ... */ }
}

Array libraries

The Python array API standard defines a common interface and interchange protocol for nd-array libraries. In particular, to support inter-framework data exchange, custom array types should implement the __dlpack__ and __dlpack_device__ methods. This is easy thanks to the nd-array integration in nanobind. An example is shown below:

nb::class_<MyArray>(m, "MyArray")
   // ...
   .def("__dlpack__", [](MyArray &self, nb::kwargs kwargs) {
       return nb::ndarray<>( /* ... */);
   })
   .def("__dlpack_device__", [](MyArray &) {
       return std::make_pair(nb::device::cpu::value, 0);
   });

Returning a raw nb::ndarray without framework annotation will produce a DLPack capsule, which is what the interface expects.

The kwargs argument can be used to provide additional parameters (for example, to request a copy); please see the DLPack documentation for details. Note that nanobind does not yet implement the versioned DLPack protocol; the version number should be ignored for now.
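To see the protocol from the consumer's side, here is a minimal pure-Python analogue of such a class (not nanobind code; it simply delegates to an underlying NumPy array), which np.from_dlpack can import zero-copy:

```python
import numpy as np

class MyArray:
    """Minimal array type supporting the DLPack interchange protocol
    by delegating to an underlying NumPy array."""
    def __init__(self, data):
        self._data = np.asarray(data, dtype=np.float32)

    def __dlpack__(self, **kwargs):
        # Ignore optional parameters (stream, max_version, ...) for simplicity.
        return self._data.__dlpack__()

    def __dlpack_device__(self):
        return self._data.__dlpack_device__()  # (1, 0) = (kDLCPU, device 0)

a = MyArray([1.0, 2.0, 3.0])
b = np.from_dlpack(a)               # zero-copy import via DLPack
assert np.shares_memory(b, a._data)
```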

Frequently asked questions

Why does my returned nd-array contain corrupt data?

If your nd-array bindings lead to undefined behavior (data corruption or crashes), then this is usually an ownership issue. Please review the section on data ownership for details.

Why does nanobind not accept my NumPy array?

When binding a function that takes an nb::ndarray<T, ...> as input, nanobind will by default require that array to be writable. This means that the function cannot be called using NumPy arrays that are marked as constant.

If you wish your function to be callable with constant input, either change the parameter to nb::ndarray<const T, ...> (if the array is parameterized by type), or write nb::ndarray<nb::ro> to accept a read-only array of any type.