Generate missing C-grid coordinates

Climate model output is usually provided with geometric information like the position of the cell center and boundaries. Gridded observational datasets often lack these informations and provide only the position of either the gridcell center or cell boundary.

This makes the calculation of common vector calculus operators like the gradient difficult, and results depend on the particular method used.

xgcm can infer the cell boundary or cell center location depending on the geometry of the gridded source dataset. This enables consistent and easily reproducible calculations across model and observational datasets.

The autogenerate module enables the user to apply the same methods to both model output and observational products, which enables better comparison and a unified workflow using different sources of data.

In this case xgcm can infer the missing coordinates to enable the creation of a grid object. Below we will illustrate how to infer coordinates for several example datasets (nonperiodic and periodic) and show how the resulting dataset can be used to perform common calculations like gradients and distance/area weighted averages on observational data.

Non periodic 1D example: Ocean temperature profile

First let’s import xarray and xgcm

[ ]:
import numpy as np
import xarray as xr
from xgcm import Grid
from xgcm.autogenerate import generate_grid_ds
import matplotlib.pyplot as plt
%matplotlib inline

Below we will create a synthetic temperature profile with decreasing temperature at depth, with unevenly spaced depth intervals (commonly found in hydrographic data).

[ ]:
# create a synthetic ocean temperature profile with uneven spacing in depth
z = np.hstack([np.arange(1,10, 1), np.arange(10,200, 10), np.arange(200,700, 20)])
# Create synthetic temperature profile with maximum temperature gradient at mid depth (e.g. the thermocline)
temp = ((np.cos(np.pi*z/700) + 1) + np.exp(-z/350) / 2) * 10

# Convert to xarray.Dataset
ds = xr.DataArray(temp, dims=['depth'], coords={'depth':z}).to_dataset(name='temperature')

ds.temperature.plot(y='depth', yincrease=False)

Infer the cell boundaries using xgcm.autogenerate

generate_grid_ds can infer the missing cell positions based on the given position (defaults to cell center) and the axis, which is defined by passing a dictionary with the physical axis as key and the dataset dimensions belonging to that axis as values.

[ ]:
# Generate 'full' dataset, which includes additional coordinate `depth_left` and appropriate attributes.
ds_full = generate_grid_ds(ds, {'Z':'depth'})

We see now that a new dimension depth_left was created, with cell boundaries shifted towards the surface

The default behaviour of generate_grid_ds is to extrapolate the grid position to the ‘left’ (e.g. towards the surface for a depth profile), assuming that the spacing in the two cells closest to the boundary (here: the first and second cell) is equal. Particular geometries might require adjustments of the boundary treatment, by specifying e.g. pad=0 to ensure the topmost cell boundary is located at the sea surface.

Finally we can create the xgcm.Grid object like we would from model output (see for example here)

[ ]:
# Create grid object
grid = Grid(ds_full, periodic=False)

Calculate vertical gradient

Now we have all the tools we need to calculate the vertical gradient just like with numerical model output

[ ]:
# Calculate vertical distances located on the cellboundary
ds_full.coords['dzc'] = grid.diff(ds_full.depth, 'Z', boundary='extrapolate')
# Calculate vertical distances located on the cellcenter
ds_full.coords['dzt'] = grid.diff(ds_full.depth_left, 'Z', boundary='extrapolate')
[ ]:
# note that the temperature gradient is located on the `depth_left` dimension,
# but no temperature information is available, to infer the gradient in the topmost grid cell.
# The following will pad with nan towards the surface. Alternatively the values could be padded with
# with a particular value or linearly extrapolated.

dtemp_dz = grid.diff(ds.temperature, 'Z', boundary='fill', fill_value=np.nan) / ds_full.dzc

fig, ax1 = plt.subplots()
ax2 = ax1.twiny()

ds.temperature.plot(ax=ax1, y='depth', color='C0')
ax1.set_xlabel('temperature [deg C]', color='C0')

dtemp_dz.plot(ax=ax2, y='depth_left', color='C1')
ax2.set_xlabel('vertical temperature gradient [deg C / m]', color='C1');

Depth weighted average

Another common operation for many climate datasets is a weighted mean along an unevenly spaced dimension. Using the grid spacing for the tracer cells earlier this becomes trivial.

[ ]:
mean_temp = ds_full.temperature.mean('depth')
mean_temp_weighted = (ds_full.temperature * ds_full.dzt).sum('depth') / ds_full.dzt.sum('depth')


Periodic 2D example

Below we will show how to apply these methods similarly to a global surface wind field, which is periodic in the longitudinal (‘x’) direction.

# hack to make file name work with nbsphinx and binder
import os
fname = '../datasets/'
if not os.path.exists(fname):
    fname = '../' + fname
[ ]:
ds = xr.open_dataset(fname)
ds_full = generate_grid_ds(ds, {'X':'lon', 'Y':'lat'})

As in the depth profile the longitude and latitude values are extended to the left when the defaults are used. However, since the latitude is not periodic we can specify the position to extend to as outer (more details here). This will extend the latitudinal positions both to the left and right, avoiding missing values later on.

[ ]:
grid = Grid(ds_full, periodic=['X'])

Now we compute the difference (in degrees) along the longitude and latitude for both the cell center and the cell face. Since we are not taking the difference of a data variable across the periodic boundary, we need to specify the boundary_discontinutity in order to avoid the introduction of artefacts at the boundary.

[ ]:
dlong = grid.diff(ds_full.lon, 'X', boundary_discontinuity=360)
dlonc = grid.diff(ds_full.lon_left, 'X', boundary_discontinuity=360)
dlonc_wo_discontinuity = grid.diff(ds_full.lon_left, 'X')

dlatg = grid.diff(, 'Y', boundary='fill', fill_value=np.nan)
dlatc = grid.diff(ds_full.lat_left, 'Y', boundary='fill', fill_value=np.nan)


Without adding the boundary_discontinuity input, the last cell distance is calculated incorectly.

The values we just calculated are actually not cell distances, but instead differences in longitude and latitude (in degrees). In order to calculate operators like the gradient dlon and dlat have to be converted into approximate cartesian distances on a globe.

[ ]:
def dll_dist(dlon, dlat, lon, lat):
        """Converts lat/lon differentials into distances in meters

        dlon : xarray.DataArray longitude differentials
        dlat : xarray.DataArray latitude differentials
        lon  : xarray.DataArray longitude values
        lat  : xarray.DataArray latitude values

        dx  : xarray.DataArray distance inferred from dlon
        dy  : xarray.DataArray distance inferred from dlat

        distance_1deg_equator = 111000.0
        dx = dlon * xr.ufuncs.cos(xr.ufuncs.deg2rad(lat)) * distance_1deg_equator
        dy = ((lon * 0) + 1) * dlat * distance_1deg_equator
        return dx, dy

ds_full.coords['dxg'], ds_full.coords['dyg'] = dll_dist(dlong, dlatg, ds_full.lon,
ds_full.coords['dxc'], ds_full.coords['dyc'] = dll_dist(dlonc, dlatc, ds_full.lon,

Based on the distances we can estimate the area of each grid cell and compute the area-weighted meridional average temperature

[ ]:
ds_full.coords['area_c'] = ds_full.dxc * ds_full.dyc

Compute zonal gradient of the surface wind field

Now that all needed grid metrics are available, we can compute the zonal temperature gradient similar as above.

[ ]:
du_dx = grid.diff(ds_full.uwnd, 'X') / ds_full.dxg

The values of the gradient are correctly located on the cell boundary on the x-axis and on the cell center in the y-axis

[ ]:
import as ccrs

fig, axarr = plt.subplots(ncols=2, nrows=3, figsize=[16,20],
                               subplot_kw=dict(projection=ccrs.Orthographic(0, -30)))

for ti,tt in enumerate(np.arange(0,30, 10)):
    ax1 = axarr[ti,0]
    ax2 = axarr[ti,1]
    time = ds_full.time.isel(time=tt).data
    ds_full.uwnd.isel(time=tt).plot(ax=ax1, transform=ccrs.PlateCarree(),robust=True)

    du_dx.isel(time=tt).plot(ax=ax2, transform=ccrs.PlateCarree(), robust=True)

    ax1.set_title('Zonal Surface Wind %s' %time)
    ax2.set_title('Zonal Gradient of Zonal Surface Wind %s' %time)

    for ax in [ax1, ax2]:
        ax.set_global(); ax.coastlines();

The resulting gradient is correctly computed across the periodic (x-axis) boundary.

Area weighted average

By using the approximated cell area, we can easily compute the area weighted average of the zonal wind.

[ ]:
u = ds_full.uwnd.mean('time')

u_mean = u.mean('lat')
u_mean_weighted = (u * u.area_c).sum('lat') / (u.area_c).sum('lat')

u_mean.plot(label='unweighted mean')
u_mean_weighted.plot(label='area weighted mean')
plt.ylabel('zonal wind [m/s]')

Thes two lines show the importance of applying a weighted mean, when the grid spacing is irregular (e.g. datasets gridded on a regular lat-lon grid).