Accessing CEH Gridded Estimates of Areal Rainfall (GEAR) with Python

Posted on

The Centre for Ecology & Hydrology (CEH) produce a dataset which provides an estimate of daily rainfall on a 1km grid for Great Britain and Northern Ireland from all the way back to 1890 onward to 2014.

The data is hosted in NetCDF format hosted on an OPeNDAP server. At the time of writing only data to 2012 is available via OPenDAP.

The script below demonstrates how to access this dataset from Python using the netCDF4 module.

"""
Using the netCDF4 Python module to access daily areal rainfall data for a
single location defined by an easting and northing coordinate.

Keller et al. (2015). CEH-GEAR: 1 km resolution daily and monthly areal
    rainfall estimates for the UK for hydrological and other applications.
    Earth Syst. Sci. Data, 7, 143-155, 2015. doi:10.5194/essd-7-143-2015
"""

import netCDF4
import numpy as np
import pandas

# URL for THREDDS server which hosts the CEH GEAR dataset
URL_TEMPLATE = "http://thredds-water.ceh.ac.uk/thredds/dodsC/CEHGEARGBDailyRainfallDetail/CEH_GEAR_daily_GB_{year:d}.nc"

# Easting and northing coordinates to access
x = 450900
y = 207200

# Date range to access (inclusive)
start_year = 2010
end_year = 2012

# Convert easting and northing into array indices.
j = x // 1000
i = (1250000 - y) // 1000

# Create a date index to use in the DataFrame later
dates = pandas.date_range('{}-01-01'.format(start_year), '{}-12-31'.format(end_year), freq='D')

# Allocate memory to store rainfall data
rainfall = np.empty([len(dates)], np.float64)

pos = 0
for year in range(start_year, end_year+1):
    # Download timeseries for a single year
    url = URL_TEMPLATE.format(year=year)
    dataset = netCDF4.Dataset(url)
    data = dataset['rainfall_amount'][:,i,j]
    # Copy data into rainfall array
    rainfall[pos:pos+len(data)] = data
    pos += len(data)

assert(pos == len(rainfall))  # if this isn't true something went wrong

# Wrap data up in a DataFrame
df = pandas.DataFrame({
    'Date': dates,
    'Rainfall': rainfall,
})
df.set_index('Date', inplace=True)

# Write output to a CSV
df.to_csv('rainfall.csv')