At the time of writing, automatic pickle support in Cython is still a pending feature request. To support pickling of cdef classes, you must implement the pickle protocol, which means defining the `__getstate__` and `__setstate__` methods. Although the official documentation is quite clear, it lacks a simple example. It also lacks instructions for handling objects that can't be pickled directly.
A minimal example is given below for a `Person` class, which stores a `name` (string) and an `age` (integer).
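```cython
cdef class Person:
    cdef public str name
    cdef public int age

    def __init__(self, name, age):
        # The print shows that __init__ is *not* called during unpickling
        print('Person.__init__')
        self.name = name
        self.age = age

    def __getstate__(self):
        # Return a picklable object describing the instance's state
        return (self.name, self.age)

    def __setstate__(self, state):
        # Restore the state produced by __getstate__
        name, age = state
        self.name = name
        self.age = age
```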
The `__getstate__` method returns an object (in this case, a tuple) that represents the state of the instance. This object is pickled instead of the contents of the instance's `__dict__`, which a cdef class does not have. The `__setstate__` method receives the state object and applies it to the instance. Note that the `__init__` method of the instance is not called during unpickling.
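The same protocol works for ordinary Python classes, which makes it easy to experiment with outside Cython. The sketch below is a pure-Python analogue, not the Cython class itself. It uses `__slots__` as a rough stand-in for a cdef class's lack of `__dict__`, and the `init_calls` counter is hypothetical, added only to show that unpickling skips `__init__`.

```python
import pickle


class Person:
    # __slots__ removes the per-instance __dict__, loosely mimicking a cdef class
    __slots__ = ("name", "age")
    init_calls = 0  # hypothetical counter, used only to show __init__ is skipped

    def __init__(self, name, age):
        Person.init_calls += 1
        self.name = name
        self.age = age

    def __getstate__(self):
        # This tuple is pickled instead of the instance's attributes
        return (self.name, self.age)

    def __setstate__(self, state):
        # Called on a fresh instance that was created without running __init__
        self.name, self.age = state


dave = Person("Dave", 30)
clone = pickle.loads(pickle.dumps(dave))
print(clone.name, clone.age, Person.init_calls)  # Dave 30 1
```

The counter stays at 1 after the round trip. With the default pickle protocol, the object is recreated via `Person.__new__` and then handed to `__setstate__`.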
The example above is simple because the string and integer objects in the state tuple can be serialized automatically. But what about more complex structures, such as a `malloc`'ed array of structs?
The next example handles this by serializing the array to a Python `bytes` object, which can be pickled. The array pointer is cast to `char*` (a free operation) and then converted to `bytes` (which calls `PyBytes_FromStringAndSize` in the C generated by Cython). Deserialization casts the `bytes` object back to `char*` (`PyObject_AsString` in the generated C) and then uses `memcpy` to copy the data into a newly allocated array.
The array is exposed to Python as a list of (gears, price) tuples using
a Cython property.
```cython
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from libc.string cimport memcpy

cdef struct Bicycle:
    int gears
    double price

cdef class MyClass:
    cdef Bicycle *_data
    cdef long size

    def __init__(self):
        print('MyClass.__init__')

    cpdef bytes get_data(self):
        """Serializes the array to a bytes object"""
        if self._data == NULL:
            return None
        # Cast to char* (free), then slice out exactly `size` structs' worth of bytes
        return <bytes>(<char *>self._data)[:sizeof(Bicycle) * self.size]

    cpdef set_data(self, bytes data, long size):
        """Deserializes a bytes object to an array"""
        # Release any existing array first
        PyMem_Free(self._data)
        self._data = NULL
        self.size = 0
        if data is None:
            return
        if len(data) != sizeof(Bicycle) * size:
            raise ValueError("data does not match the array size")
        self._data = <Bicycle *>PyMem_Malloc(sizeof(Bicycle) * size)
        if not self._data:
            raise MemoryError()
        memcpy(self._data, <char *>data, sizeof(Bicycle) * size)
        self.size = size

    @property
    def data(self):
        """Python interface to the array, as a list of (gears, price) tuples"""
        return [(self._data[i].gears, self._data[i].price)
                for i in range(self.size)]

    @data.setter
    def data(self, values):
        cdef long n = len(values)
        # Free the previous array so that reassigning does not leak memory
        PyMem_Free(self._data)
        self._data = NULL
        self.size = 0
        self._data = <Bicycle *>PyMem_Malloc(sizeof(Bicycle) * n)
        if not self._data:
            raise MemoryError()
        for i, (gears, price) in enumerate(values):
            self._data[i].gears = gears
            self._data[i].price = price
        self.size = n

    def __getstate__(self):
        return (self.get_data(), self.size)

    def __setstate__(self, state):
        self.set_data(*state)

    def __dealloc__(self):
        PyMem_Free(self._data)
```
`PyMem_Malloc` and `PyMem_Free` are used instead of `malloc` and `free`, as recommended in the Cython documentation on memory allocation.
We need to keep track of the array's size, because it can't be retrieved from the bare pointer later.
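The bytes round trip that `get_data` and `set_data` perform is not specific to Cython. As a rough pure-Python sketch (an illustration, not part of the original example), `ctypes` can play the same role. The `Bicycle` structure below is a hypothetical `ctypes` mirror of the Cython struct, and `to_bytes`/`from_bytes` are hypothetical helper names. `bytes(array)` copies the raw struct memory, much like the `char*` slice. `from_buffer_copy` copies it back into a fresh array, much like `PyMem_Malloc` followed by `memcpy`.

```python
import ctypes
import pickle


class Bicycle(ctypes.Structure):
    # Hypothetical ctypes mirror of the Cython struct: int gears; double price
    _fields_ = [("gears", ctypes.c_int), ("price", ctypes.c_double)]


def to_bytes(array):
    # Like (<char *>data)[:sizeof(Bicycle) * size]: copy the raw memory into bytes
    return bytes(array)


def from_bytes(data, size):
    # Like malloc + memcpy: build a new array of `size` structs from the raw bytes
    if len(data) != ctypes.sizeof(Bicycle) * size:
        raise ValueError("data does not match the array size")
    return (Bicycle * size).from_buffer_copy(data)


bikes = (Bicycle * 2)(Bicycle(1, 50.0), Bicycle(7, 199.0))
# The pickled state is the same (bytes, size) pair that MyClass.__getstate__ returns
state = pickle.loads(pickle.dumps((to_bytes(bikes), len(bikes))))
restored = from_bytes(*state)
print([(b.gears, b.price) for b in restored])  # [(1, 50.0), (7, 199.0)]
```

As in the Cython version, the size travels alongside the bytes. The length check guards against copying more memory than the `bytes` object holds.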
The class can be subclassed without any difficulty. The `MySubclass` class below adds a new method (`get_average_price`) and an `owner` attribute, which stores an instance of `Person`. The `owner` attribute needs to be added to the instance's state. This is done by appending it to the state tuple returned by the superclass. Because the `Person` class already implements the pickle protocol for itself, it can be added to the state tuple directly.
```cython
cdef class MySubclass(MyClass):
    cdef public Person owner

    cpdef get_average_price(self):
        """Returns the mean price, or None if the array is empty"""
        cdef double total = 0
        cdef long i
        if not self.size:
            return None
        for i in range(self.size):
            total += self._data[i].price
        return total / self.size

    def __getstate__(self):
        # Append the owner to the state tuple returned by MyClass
        state = super(MySubclass, self).__getstate__()
        return state + (self.owner,)

    def __setstate__(self, state):
        # The last item is the owner; the rest is MyClass's state
        self.owner = state[-1]
        super(MySubclass, self).__setstate__(state[:-1])
```
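The state-chaining convention does not depend on Cython either. Below is a pure-Python sketch with hypothetical `Base` and `Child` classes standing in for `MyClass` and `MySubclass`. The subclass appends its own field to the superclass's tuple and strips it off again in `__setstate__`.

```python
import pickle


class Base:
    # Hypothetical stand-in for MyClass: its state is a tuple
    def __init__(self, size):
        self.size = size

    def __getstate__(self):
        return (self.size,)

    def __setstate__(self, state):
        (self.size,) = state


class Child(Base):
    # Hypothetical stand-in for MySubclass: adds one attribute to the state
    def __init__(self, size, owner):
        super().__init__(size)
        self.owner = owner

    def __getstate__(self):
        # Superclass state first, the new attribute appended at the end
        return super().__getstate__() + (self.owner,)

    def __setstate__(self, state):
        # Peel off the last item and hand the rest back to the superclass
        self.owner = state[-1]
        super().__setstate__(state[:-1])


c = Child(3, "Dave")
print(c.__getstate__())  # (3, 'Dave')
restored = pickle.loads(pickle.dumps(c))
print(restored.size, restored.owner)  # 3 Dave
```

Appending at the end means the subclass never needs to know how many items the superclass stores. The superclass can therefore change its own state layout without breaking the subclass, as long as the subclass only ever reads the last item.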
The code below demonstrates the creation, pickling and unpickling of the classes, assuming the Cython code above is compiled into a module named `example`.
```python
import pickle

from example import Person, MySubclass

# create a new instance of Person
dave = Person(name="Dave", age=30)

# pickle the person
d = pickle.dumps(dave)
del dave

# unpickle the person
dave = pickle.loads(d)
assert dave.name == "Dave"
assert dave.age == 30

# create a new instance of MySubclass
c = MySubclass()
data = [(1, 50.0), (7, 199.0), (21, 399.0)]
c.data = data
c.owner = dave
assert c.data == data

# pickle the instance
d = pickle.dumps(c)
del c

# unpickle the instance
c = pickle.loads(d)
assert type(c) is MySubclass
assert c.data == data
assert c.owner.name == "Dave"
assert c.get_average_price() == 216.0
```
Also note that for arrays of standard C types (doubles, integers, etc.), NumPy arrays already support the pickle protocol. In fact, they use essentially the same approach of copying the raw data into a bytes object.
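As a quick illustration, a NumPy structured array can hold the same records as the `Bicycle` array and be pickled with no extra code. This sketch assumes NumPy is installed. It also assumes the `bicycle` dtype is a faithful mirror of the Cython struct: `np.int32` for C `int` holds on common platforms but is not guaranteed, and `align=True` requests C-style padding.

```python
import pickle

import numpy as np

# Structured dtype assumed to mirror the Cython struct: int gears; double price
bicycle = np.dtype([("gears", np.int32), ("price", np.float64)], align=True)
bikes = np.array([(1, 50.0), (7, 199.0), (21, 399.0)], dtype=bicycle)

# Pickling works out of the box; no __getstate__/__setstate__ needed
restored = pickle.loads(pickle.dumps(bikes))
print(restored.tolist())  # [(1, 50.0), (7, 199.0), (21, 399.0)]
print(restored["price"].mean())  # 216.0

# The explicit raw-bytes round trip, analogous to get_data/set_data
copy = np.frombuffer(bikes.tobytes(), dtype=bicycle)
print(copy.tolist() == bikes.tolist())  # True
```

Plain one-dimensional arrays such as `np.array([50.0, 199.0, 399.0])` pickle the same way.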