# Sorted Open Data with Shapely and SVG

Sat 18 February 2017I recently came across the Sorted Cities project by Hans Hack. He uses building footprint data extracted from OpenStreetMap to create beautiful posters, showing all of the buildings in a city sorted by their area.

As soon as I saw this, the hacker in me thought "how would I go about creating this myself?"

Shapely has a fantastic feature that converts geometries into an SVG representation, which is used to display the geometry in Jupyter. For example:

```
from shapely.geometry import Point
from shapely.ops import cascaded_union
cascaded_union([Point(0,0).buffer(2), Point(2,0).buffer(2)])
```

The method that does this, `BaseGeometry._repr_svg_`, needs a little tweaking to ensure all of the geometries are displayed at the same scale, rather than with a defined height. But the hard work is already done. We just need to sort the geometries and rely on HTML for the layout.

GeoPandas makes quick work of loading data from a shapefile and sorting by area:

```
df = GeoDataFrame.from_file(filename)
df["AREA"] = df["geometry"].area
df = df.sort_values(by="AREA", ascending=False)
```

I considered using the same OSM building dataset, but instead decided to try something novel. I work as a hydrologist and spend a lot of time using data related to rivers, lakes and reservoirs. Lakes and reservoirs seemed an obvious choice as they have such distinctive shapes. Fortunately the Environment Agency produce some open data for this: WFD Lake Waterbodies.

The result is shown below. Hovering over a lake will show the name and calculated area.

Contains public sector information licensed under the Open Government Licence v3.0.

The first attempt produced quite a large file (2.1MB) as it includes all of the detail of the original data. The following steps were taken to reduce the size of the output.

- Simplify the geometry with a threshold of 25 metres (selected after some trial and error) using
`geometry.simplify`. - Translate the geometry so that the lower left corner is at x=1, y=1. This results in smaller number (and therefore less characters) and is acceptable as the coordinates only need to be relative to each other, not those of another geometry. This is a technique borrowed from the encoding of OSM PBF data.
- Round the coordinates to the nearest 1 metre.
- Strip the redundant decimal from the output of Shapely's SVG representation.
- Style the paths using a class in CSS rather than inline.

These changes bring the final size down to a more acceptable 388KB (102KB gzipped).

The full source code is given below. It could certainly be more elegant, but I was trying for the least effort (a true hack!).

```
from geopandas import GeoDataFrame
from shapely.ops import transform
from shapely.affinity import translate
import re
def repr_svg(geometry, base_height=200, scale=None, title=None):
"""SVG representation of a Shapely geometry
adapted from Shapely's BaseGeometry._repr_svg_ method
https://github.com/Toblerity/Shapely/blob/master/shapely/geometry/base.py
"""
svg_attrs = {}
xmin, ymin, xmax, ymax = geometry.bounds
expand = 0.04
widest_part = max([xmax - xmin, ymax - ymin])
expand_amount = widest_part * expand
xmin -= expand_amount
ymin -= expand_amount
xmax += expand_amount
ymax += expand_amount
dx = xmax - xmin
dy = ymax - ymin
if scale is None:
# scale to a given height
height = base_height
width = (dx / dy) * base_height
scale = height / dy
else:
# use previously calculated scale factor
height = dy * scale
width = dx * scale
try:
scale_factor = max([dx, dy]) / max([width, height])
except ZeroDivisionError:
scale_factor = 1.0
view_box = "{xmin} {ymin} {dx} {dy}".format(xmin=xmin, ymin=ymin, dx=dx, dy=dy)
transform = "matrix(1,0,0,-1,0,{0})".format(ymax + ymin)
svg_attrs.update({
"width": width,
"height": height,
"viewBox": view_box,
"preserveAspectRatio": "xMinYMin meet",
})
g = """<g transform="{t}">{x}</g>""".format(t=transform, x=geometry.svg(scale_factor))
# title for image (shown on mouseover)
if title:
t = '<title>{t}</title>'.format(t=title)
else:
t = ''
# create svg object
svg = "<svg {a}>{t}{g}</svg>".format(a=" ".join(["{k}=\"{v}\"".format(k=k,v=v) for k,v in svg_attrs.items()]), t=t, g=g)
# change style
svg = re.sub('fill=.*? opacity="0.6"', 'class="lake" stroke-width="{}"'.format(1*scale_factor), svg)
# remove redundant decimals
svg = re.sub(' ([0-9]{1,})\.0,([0-9]{1,}).0', r' \g<1>,\g<2>', svg)
return svg, scale
def round_coord(x, y, z=None):
"""Transform function to round coordinates"""
x = round(x, 0)
y = round(y, 0)
return tuple(filter(None, [x, y, z]))
def main():
# load and sort the data
filename = "WFD - Lake Waterbodies.shp"
df = GeoDataFrame.from_file(filename)
df["AREA"] = df["geometry"].area
df = df.sort_values(by="AREA", ascending=False)
print('<div style="margin-bottom: 30px;">')
print('<style>.lake {fill: #75CFF0; stroke: #000; stroke-opacity: 0.4;}</style>')
scale = None
for r, feature in df.iterrows():
geometry = feature["geometry"]
# simplify the geometry
geometry = geometry.simplify(25)
xmin, ymin, xmax, ymax = geometry.bounds
geometry = translate(geometry, xoff=-xmin+1, yoff=-ymin+1)
geometry = transform(round_coord, geometry)
# render
title = "{} ({:.2f} km²)".format(feature["name"], feature["AREA"]/1000**2)
svg, _scale = repr_svg(geometry, scale=scale, title=title)
if scale is None:
scale = _scale
print(svg)
print("</div>")
print("<p>Contains public sector information licensed under the <a href=\"https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/\">Open Government Licence v3.0.</a></p>")
if __name__ == "__main__":
main()
```

## License

The `repr_svg` function is based on the BaseGeometry._repr_svg method is Shapely and is licensed under the simplified BSD license, as below:

Copyright (c) 2007, Sean C. Gillies All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of Sean C. Gillies nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.