At the time of writing, automatic pickle support in Cython is still a
pending feature. In order to support pickling of cdef classes you must
implement the pickle protocol. This is done by implementing the
__getstate__ and __setstate__ methods. Although the official
documentation is quite clear, it lacks a simple example and also
instructions on handling objects that can’t be directly pickled.
A minimal example is given below for the Person class which stores a
name (string) and age (integer).
cdef class Person:
    cdef public str name
    cdef public int age

    def __init__(self):
        print('Person.__init__')

    def __getstate__(self):
        return (self.name, self.age,)

    def __setstate__(self, state):
        name, age = state
        self.name = name
        self.age = age
The __getstate__ method returns an object – in this case, a tuple –
which represents the state of the instance and is pickled instead of the
contents of the instance’s __dict__ (which is not defined in this
class). The __setstate__ method receives the state object and applies
it to the instance. Note that the __init__ method of the instance is
not called during unpickling.
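The behaviour described above can be checked without compiling anything. The sketch below is a plain-Python analogue of Person, not the Cython class itself: the name PyPerson and the init_calls counter are illustrative additions, and __slots__ is used only so that, like a cdef class, instances have no __dict__.

```python
import pickle


class PyPerson:
    """Plain-Python stand-in for the Cython Person class (hypothetical name)."""

    __slots__ = ("name", "age")  # no per-instance __dict__, like a cdef class
    init_calls = 0               # class-level counter of __init__ invocations

    def __init__(self, name, age):
        PyPerson.init_calls += 1
        self.name = name
        self.age = age

    def __getstate__(self):
        # The returned tuple is what actually gets pickled.
        return (self.name, self.age)

    def __setstate__(self, state):
        # Runs on a bare instance created via __new__; __init__ is skipped.
        self.name, self.age = state


dave = PyPerson("Dave", 30)
restored = pickle.loads(pickle.dumps(dave))

print(restored.name, restored.age)  # Dave 30
print(PyPerson.init_calls)          # 1 (only the explicit construction)
```

The counter stays at 1 after the round trip, which is why any setup that __init__ would normally perform has to be repeated in __setstate__ if the restored instance depends on it.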
The example above is simple because the string and integer objects in
the state tuple can be serialized automatically. But what about more
complex structures, such as a malloc’ed array of structs with a
variable length?
The next example achieves this by serializing the array to a Python
bytes object which can be pickled. This is done by casting the _data
variable to char* (a free operation) and then slicing it to the array’s
length in bytes, which produces a bytes object (this invokes
PyBytes_FromStringAndSize in the C generated by Cython).
Deserialization is done by casting the bytes object back to char*
(which yields a pointer to the object’s internal buffer rather than a
copy) and then using memcpy to copy the data into a newly allocated
array.

The array is exposed to Python as a list of (gears, price) tuples using
a Cython property.
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from libc.string cimport memcpy

cdef struct Bicycle:
    int gears
    double price

cdef class MyClass:
    cdef Bicycle *_data
    cdef long size

    def __init__(self):
        print('MyClass.__init__')

    cpdef bytes get_data(self):
        """Serializes array to a bytes object"""
        if self._data == NULL:
            return None
        return <bytes>(<char *>self._data)[:sizeof(Bicycle) * self.size]

    cpdef set_data(self, bytes data, long size):
        """Deserializes a bytes object to an array"""
        # Declared without "void" so that exceptions propagate to the caller.
        PyMem_Free(self._data)
        self._data = NULL
        self.size = 0
        if data is None:
            return  # the pickled instance had no array
        if len(data) != sizeof(Bicycle) * size:
            raise ValueError("data length does not match size")
        self._data = <Bicycle*>PyMem_Malloc(sizeof(Bicycle) * size)
        if not self._data:
            raise MemoryError()
        memcpy(self._data, <char *>data, sizeof(Bicycle) * size)
        self.size = size

    property data:
        """Python interface to array"""
        def __get__(self):
            return [(self._data[i].gears, self._data[i].price)
                    for i in range(0, self.size)]

        def __set__(self, values):
            # Release any previous array so that it is not leaked.
            PyMem_Free(self._data)
            self.size = 0
            self._data = <Bicycle*>PyMem_Malloc(sizeof(Bicycle) * len(values))
            if not self._data:
                raise MemoryError()
            self.size = len(values)
            for i, (gears, price) in enumerate(values):
                self._data[i].gears = gears
                self._data[i].price = price

    def __getstate__(self):
        return (self.get_data(), self.size)

    def __setstate__(self, state):
        self.set_data(*state)

    def __dealloc__(self):
        PyMem_Free(self._data)
PyMem_Malloc and PyMem_Free are used instead of malloc and free as per
the recommendation in the Cython documentation on memory allocation.

We need to keep track of the array’s size as it’s not possible to
retrieve it from the array later [reference].
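The byte-level round trip can also be tried in plain Python. The following is a minimal sketch rather than a translation of MyClass: the class name PyBikes is hypothetical, and the struct format "@id" (native int followed by native double) is an assumption chosen to mirror the Bicycle struct.

```python
import pickle
import struct

# Assumed layout mirroring `struct Bicycle { int gears; double price; }`,
# using native size and alignment ("@" is the struct module's default mode).
RECORD = struct.Struct("@id")


class PyBikes:
    """Plain-Python stand-in for MyClass (hypothetical name)."""

    def __init__(self):
        self.data = []

    def __getstate__(self):
        # Pack every (gears, price) record into one contiguous bytes object.
        raw = b"".join(RECORD.pack(g, p) for g, p in self.data)
        return (raw, len(self.data))

    def __setstate__(self, state):
        raw, size = state
        # Guard against a buffer that does not match the recorded length.
        if len(raw) != RECORD.size * size:
            raise ValueError("state buffer has unexpected length")
        self.data = list(RECORD.iter_unpack(raw))


bikes = PyBikes()
bikes.data = [(1, 50.0), (7, 199.0), (21, 399.0)]
restored = pickle.loads(pickle.dumps(bikes))
print(restored.data == bikes.data)  # True
```

As with the Cython version, the pickled bytes are a raw copy of native memory, so a pickle produced this way is only safe to load on a machine with the same byte order and struct layout.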
The class can be subclassed without any difficulty. The MySubclass
class below adds a new method (get_average_price) and the owner
attribute which stores an instance of Person. The owner attribute
needs to be added to the instance’s state. This is done by concatenating
it with the state tuple returned by the superclass. As the Person
class already implements the pickle protocol for itself it can be added
directly to the state tuple.
cdef class MySubclass(MyClass):
    cdef public Person owner

    cpdef get_average_price(self):
        # Returns a Python object so that None can signal an empty array.
        cdef double total = 0
        cdef long i
        if not self.size:
            return None
        for i in range(0, self.size):
            total += self._data[i].price
        return total / self.size

    def __getstate__(self):
        state = super(MySubclass, self).__getstate__()
        state = state + (self.owner,)
        return state

    def __setstate__(self, state):
        self.owner = state[-1]
        super(MySubclass, self).__setstate__(state[:-1])
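The same append-and-peel pattern works for ordinary Python classes, which makes it easy to experiment with. The sketch below uses hypothetical Base and Child classes (not the Cython classes above) and a plain string in place of a Person instance.

```python
import pickle


class Base:
    def __init__(self):
        self.values = []

    def __getstate__(self):
        return (tuple(self.values),)

    def __setstate__(self, state):
        (values,) = state
        self.values = list(values)


class Child(Base):
    def __init__(self):
        super().__init__()
        self.owner = None

    def __getstate__(self):
        # Append the subclass field to whatever the base class returns.
        return super().__getstate__() + (self.owner,)

    def __setstate__(self, state):
        # Peel off the subclass field and pass the remainder up unchanged.
        self.owner = state[-1]
        super().__setstate__(state[:-1])


child = Child()
child.values = [1, 2, 3]
child.owner = "Dave"
restored = pickle.loads(pickle.dumps(child))
print(restored.values, restored.owner)  # [1, 2, 3] Dave
```

Because the subclass always adds its field at the end and removes it before delegating, the base class never needs to know how many fields its subclasses contribute.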
The code below demonstrates the creation, pickling and unpickling of the classes.
import pickle
from example import Person, MySubclass

# create a new instance of Person
# (Person.__init__ takes no arguments, so the attributes are set afterwards)
dave = Person()
dave.name = "Dave"
dave.age = 30

# pickle the person
d = pickle.dumps(dave)
del dave

# unpickle the person
dave = pickle.loads(d)
assert dave.name == "Dave"
assert dave.age == 30

# create a new instance of MySubclass
c = MySubclass()
data = [(1, 50.0), (7, 199.0), (21, 399.0)]
c.data = data
c.owner = dave
assert c.data == data

# pickle the instance
d = pickle.dumps(c)
del c

# unpickle the instance
c = pickle.loads(d)
assert type(c) is MySubclass
assert c.data == data
assert c.owner.name == "Dave"
assert c.get_average_price() == 216.0
Also note that for arrays of standard C types (doubles, integers, etc.) NumPy arrays support the pickle protocol – in fact, they use the same approach of copying the data into a bytes object.