Pickling Cython classes
Sat 16 April 2016
At the time of writing, automatic pickle support in Cython is still a pending feature. To support pickling of cdef classes you must implement the pickle protocol yourself by defining the __getstate__ and __setstate__ methods. Although the official documentation is quite clear, it lacks a simple example and does not explain how to handle objects that can't be pickled directly.
A minimal example is given below for the Person class which stores a name (string) and age (integer).
    cdef class Person:
        cdef public str name
        cdef public int age

        def __init__(self):
            print('Person.__init__')

        def __getstate__(self):
            return (self.name, self.age,)

        def __setstate__(self, state):
            name, age = state
            self.name = name
            self.age = age
The __getstate__ method returns an object – in this case, a tuple – which represents the state of the instance and is pickled instead of the contents of the instance's __dict__ (which is not defined in this class). The __setstate__ method receives the state object and applies it to the instance. Note that the __init__ method of the instance is not called during unpickling.
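The same behaviour can be observed in plain Python. The sketch below is only an analogy, not the Cython class itself: it uses __slots__ to remove the instance __dict__ (loosely mimicking a cdef class), and the init_calls counter is a hypothetical addition used to show that __init__ is not run again during unpickling.

```python
import pickle


class Person:
    # __slots__ removes the per-instance __dict__, loosely mimicking a cdef class
    __slots__ = ("name", "age")

    # Hypothetical class-level counter, used only to observe calls to __init__
    init_calls = 0

    def __init__(self, name, age):
        Person.init_calls += 1
        self.name = name
        self.age = age

    def __getstate__(self):
        # The returned tuple is what actually gets pickled
        return (self.name, self.age)

    def __setstate__(self, state):
        # Called on a fresh instance created without running __init__
        self.name, self.age = state


dave = Person("Dave", 30)
clone = pickle.loads(pickle.dumps(dave))
print(clone.name, clone.age, Person.init_calls)
```

The counter stays at 1 after the round trip because unpickling creates the instance through __new__ and then hands the state to __setstate__.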
The example above is simple because the string and integer objects in the state tuple can be serialized automatically. But what about more complex structures, such as a malloc'ed array of structs with a variable length?
The next example achieves this by serializing the array to a Python bytes object, which can be pickled. This is done by casting the _data variable to char* (which costs nothing at runtime) and then slicing it to bytes (which invokes PyBytes_FromStringAndSize in the C generated by Cython). Deserialization is done by casting the bytes object back to char* (which obtains a pointer to the object's internal buffer, in the manner of PyBytes_AsString) and then using memcpy to copy the data into a newly allocated array. The array is exposed to Python as a list of (gears, price) tuples using a Cython property.
    from cpython.mem cimport PyMem_Malloc, PyMem_Free
    from libc.string cimport memcpy

    cdef struct Bicycle:
        int gears
        double price

    cdef class MyClass:
        cdef Bicycle *_data
        cdef long size

        def __init__(self):
            print('MyClass.__init__')

        cpdef bytes get_data(self):
            """Serializes array to a bytes object"""
            if self._data == NULL:
                return None
            # Slicing a char* copies that many bytes into a new bytes object
            return <bytes>(<char *>self._data)[:sizeof(Bicycle) * self.size]

        cpdef set_data(self, bytes data, long size):
            """Deserializes a bytes object to an array"""
            # No void return type here, so that exceptions reach the caller
            PyMem_Free(self._data)
            self._data = NULL
            self.size = 0
            if data is None:
                # get_data() returns None for an empty instance
                return
            if len(data) != <Py_ssize_t>sizeof(Bicycle) * size:
                raise ValueError('data length does not match size')
            self._data = <Bicycle*>PyMem_Malloc(sizeof(Bicycle) * size)
            if not self._data:
                raise MemoryError()
            self.size = size
            memcpy(self._data, <char *>data, sizeof(Bicycle) * size)

        property data:
            """Python interface to array"""
            def __get__(self):
                return [(self._data[i].gears, self._data[i].price) for i in range(0, self.size)]

            def __set__(self, values):
                # Free any previous array first, otherwise it would leak
                PyMem_Free(self._data)
                self.size = 0
                self._data = <Bicycle*>PyMem_Malloc(sizeof(Bicycle) * len(values))
                if not self._data:
                    raise MemoryError()
                self.size = len(values)
                for i, (gears, price) in enumerate(values):
                    self._data[i].gears = gears
                    self._data[i].price = price

        def __getstate__(self):
            return (self.get_data(), self.size)

        def __setstate__(self, state):
            self.set_data(*state)

        def __dealloc__(self):
            PyMem_Free(self._data)
PyMem_Malloc and PyMem_Free are used instead of malloc and free as per the recommendation in the Cython documentation on memory allocation. We need to keep track of the array's size as it's not possible to retrieve it from the array later [reference].
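The bytes round trip can also be tried without compiling anything. The sketch below is a rough pure-Python analogue built on ctypes, not the Cython code above: the helper names array_to_bytes and bytes_to_array are hypothetical, and ctypes.memmove stands in for memcpy. As in the Cython version, the element count has to travel alongside the raw bytes.

```python
import ctypes


class Bicycle(ctypes.Structure):
    # Same layout as the Cython struct: an int followed by a double
    _fields_ = [("gears", ctypes.c_int), ("price", ctypes.c_double)]


def array_to_bytes(array):
    """Copy the raw memory of a ctypes array into a bytes object."""
    return ctypes.string_at(ctypes.addressof(array), ctypes.sizeof(array))


def bytes_to_array(data, size):
    """Allocate a new array of `size` structs and copy the bytes into it."""
    array = (Bicycle * size)()
    nbytes = ctypes.sizeof(array)
    if len(data) != nbytes:
        raise ValueError("data length does not match size")
    ctypes.memmove(array, data, nbytes)
    return array


bikes = (Bicycle * 3)(Bicycle(1, 50.0), Bicycle(7, 199.0), Bicycle(21, 399.0))
raw = array_to_bytes(bikes)
restored = bytes_to_array(raw, len(bikes))
print([(b.gears, b.price) for b in restored])
```

The raw bytes include any padding the compiler places between struct members, so they are only meaningful on a platform with the same layout and byte order.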
The class can be subclassed without any difficulty. The MySubclass class below adds a new method (get_average_price) and the owner attribute which stores an instance of Person. The owner attribute needs to be added to the instance's state. This is done by concatenating it with the state tuple returned by the superclass. As the Person class already implements the pickle protocol for itself it can be added directly to the state tuple.
    cdef class MySubclass(MyClass):
        cdef public Person owner

        cpdef get_average_price(self):
            # No C return type, so that None can be returned for an empty array
            cdef double total = 0
            cdef long i
            if not self.size:
                return None
            for i in range(0, self.size):
                total += self._data[i].price
            return total / self.size

        def __getstate__(self):
            state = super(MySubclass, self).__getstate__()
            state = state + (self.owner,)
            return state

        def __setstate__(self, state):
            self.owner = state[-1]
            super(MySubclass, self).__setstate__(state[:-1])
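The pattern of appending to the superclass state and peeling it off again is independent of Cython. A minimal pure-Python sketch, with hypothetical Base and Child classes standing in for MyClass and MySubclass:

```python
import pickle


class Base:
    def __init__(self, data):
        self.data = list(data)

    def __getstate__(self):
        # State is a tuple, mirroring (bytes, size) in the Cython class
        return (self.data, len(self.data))

    def __setstate__(self, state):
        data, size = state
        if len(data) != size:
            raise ValueError("state is inconsistent")
        self.data = data


class Child(Base):
    def __init__(self, data, owner):
        super().__init__(data)
        self.owner = owner

    def __getstate__(self):
        # Append the subclass's own attribute to the parent's state tuple
        return super().__getstate__() + (self.owner,)

    def __setstate__(self, state):
        # Take our attribute off the end and pass the rest to the parent
        self.owner = state[-1]
        super().__setstate__(state[:-1])


child = Child([(1, 50.0), (7, 199.0)], owner="Dave")
copy = pickle.loads(pickle.dumps(child))
print(type(copy).__name__, copy.data, copy.owner)
```

Because the subclass always appends at the end, the parent never needs to know how many extra items its subclasses add.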
The code below demonstrates the creation, pickling and unpickling of the classes.
    import pickle
    from example import Person, MySubclass

    # create a new instance of Person
    # (Person.__init__ takes no arguments, so set the public attributes directly)
    dave = Person()
    dave.name = "Dave"
    dave.age = 30

    # pickle the person
    d = pickle.dumps(dave)
    del dave

    # unpickle the person
    dave = pickle.loads(d)
    assert dave.name == "Dave"
    assert dave.age == 30

    # create a new instance of MySubclass
    c = MySubclass()
    data = [(1, 50.0), (7, 199.0), (21, 399.0),]
    c.data = data
    c.owner = dave
    assert c.data == data

    # pickle the instance
    d = pickle.dumps(c)
    del c

    # unpickle the instance
    c = pickle.loads(d)
    assert type(c) is MySubclass
    assert c.data == data
    assert c.owner.name == "Dave"
    assert c.get_average_price() == 216.0
Also note that for arrays of standard C types (doubles, integers, etc.) NumPy arrays support the pickle protocol – in fact, they use the same approach of copying the data into a bytes object.
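As a dependency-free illustration of the same idea, the sketch below uses the standard library's array module as a stand-in for NumPy (an assumption made only to keep the example self-contained; it says nothing about NumPy's internals). It round-trips an array of C doubles through pickle and, separately, through a raw bytes object.

```python
import pickle
from array import array

# "d" is the type code for C doubles
prices = array("d", [50.0, 199.0, 399.0])

# array objects support the pickle protocol directly
restored = pickle.loads(pickle.dumps(prices))

# The manual equivalent: copy the buffer to bytes, then back into a new array
raw = prices.tobytes()
rebuilt = array("d")
rebuilt.frombytes(raw)

print(restored.tolist(), rebuilt.tolist(), len(raw))
```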