Masking Rasterio Layers with Vector Features

Posted on Nov 9, 2014

This post shows how to extract data from a raster only where it intersects a vector feature using Rasterio. Fiona is used to read the vector data, and Shapely is used only as a convenience to calculate the feature’s bounding box.

GDAL provides a function to “burn” vector shapes into rasters (i.e., to rasterize the geometries). This functionality is accessed from Rasterio using the rasterize function. This tool can be used to create a raster mask for another raster layer.

The example below uses two layers: national_parks.shp, a vector layer containing polygon features representing national park boundaries; and elevation.tif, a raster layer representing the elevation above sea level in metres. For each feature in the vector layer, the raster data is extracted using a window, then vector geometry is rasterized and used to mask the raster data. The result is a numpy masked array. SciPy provides the scipy.stats.mstats module which contains various statistical functions that work with masked arrays, but for this example only the min and max methods of the array itself have been demonstrated.

Contains Ordnance Survey data © Crown copyright and database right 2014

from __future__ import print_function

import numpy
import numpy.stats.mstats
import fiona
import rasterio
import rasterio.features
from affine import Affine
from shapely.geometry import shape

with fiona.open('national_parks.shp', 'r') as vector, \
        rasterio.open('elevation.tif', 'r') as raster:
        
    for feature in vector:
        # create a shapely geometry
        # this is done for the convenience for the .bounds property only
        geometry = shape(feature['geometry'])
        
        # get pixel coordinates of the geometry's bounding box
        ul = raster.index(*geometry.bounds[0:2])
        lr = raster.index(*geometry.bounds[2:4])
        
        # read the subset of the data into a numpy array
        window = ((lr[0], ul[0]+1), (ul[1], lr[1]+1))
        data = raster.read_band(1, window=window)
        
        # create an affine transform for the subset data
        t = raster.affine
        shifted_affine = Affine(t.a, t.b, t.c+ul[1]*t.a, t.d, t.e, t.f+lr[0]*t.e)
        
        # rasterize the geometry
        mask = rasterio.features.rasterize(
            [(geometry, 0)],
            out_shape=data.shape,
            transform=shifted_affine,
            fill=1,
            all_touched=True,
            dtype=numpy.uint8)
        
        # create a masked numpy array
        masked_data = numpy.ma.array(data=data, mask=mask.astype(bool))
        
        # display some statistics
        name = feature['properties']['NAME'].title()
        print('{}\nRange: {:.1f} to {:.1f}'.format(
            name,
            masked_data.min(),
            masked_data.max()))

The result looks like this:

New Forest
Range: -1.4 to 131.8
North York Moors
Range: -1.6 to 453.2
...
The Broads
Range: -3.3 to 38.2

It is assumed that the layers are in the same projection. Otherwise the vector geometries would need to be reprojected (see reprojecting Shapely geometries) before rasterization. It’s also assumed the geospatial transform used by the raster has no rotation (i.e. the b and d parameters of the affine transform are 0), which is true for most all geographic raster data.