Plotting CESM Data on an Unstructured Grid using Geoviews and Datashader#

This week, during Xdev office hours, Steve Yeager raised a question regarding plotting unstructured grid data within Python. He was interested in plotting output from the Community Atmosphere Model, which supports unstructured grids, essentially a combination of triangles allowing for higher resolution in certain parts of the domain. This can be adventageous when wanting to achieve the benefits of high resolution within the primary domain, while maintaining the global scale of the model. This week, NCAR Science tweeted a great explanation of how revolutionary this capability is in the context of resolving processes over Greenland.

Unstructured grids can be difficult to plot directly within Python since they do not follow the typical lat, lon (x, y) convention. There is some preprocessing that needs to be applied before plotting.

The two main points include:

  • Calculating the vertices

  • Triangulating the vertices

Fortunately, SciPy and Datashader provide utilities for this! We can also use GeoViews/Holoviews to interactively look at the data.

Imports#

The main packages we will use here (outside of the “core” scientific packages), include:

Datashader#

From the datashader documention:

“Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly. Datashader breaks the creation of images into a series of explicit steps that allow computations to be done on intermediate representations. This approach allows accurate and effective visualizations to be produced automatically without trial-and-error parameter tuning, and also makes it simple for data scientists to focus on particular data and relationships of interest in a principled way.”

Holoviews#

From the holoviews documentation:

“HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting.”

Geoviews#

From the geoviews documentation

*”GeoViews is a Python library that makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research.

GeoViews is built on the HoloViews library for building flexible visualizations of multidimensional data. GeoViews adds a family of geographic plot types based on the Cartopy library, plotted using either the Matplotlib or Bokeh packages. With GeoViews, you can now work easily and naturally with large, multidimensional geographic datasets, instantly visualizing any subset or combination of them, while always being able to access the raw data underlying any plot”*

You can install these packages by using the following:

conda install -c conda-forge datashader holoviews geoviews
import cartopy.crs as ccrs
import datashader as ds
import datashader.utils as du
import geoviews as gv
import geoviews.feature as gf
import holoviews as hv
import numpy as np
import pandas as pd
import xarray as xr
from datashader import reductions as rd, transfer_functions as tf
from holoviews.operation.datashader import datashade, rasterize, regrid
from scipy.spatial import Delaunay

hv.extension("bokeh")

Read in the data#

In this case, we are using test data provided by Steve Yeager

data = xr.open_dataset('/glade/scratch/yeager/TREFHT_CAMne120.nc')

Notice within this dataset, both lat and lon have the same dimension, ncol

data
<xarray.Dataset>
Dimensions:    (time: 1, ncol: 777602, nbnd: 2)
Coordinates:
  * time       (time) object 0021-02-01 00:00:00
Dimensions without coordinates: ncol, nbnd
Data variables:
    TREFHT     (time, ncol) float32 ...
    area       (ncol) float64 ...
    lat        (ncol) float64 ...
    lon        (ncol) float64 ...
    time_bnds  (time, nbnd) object 0021-01-01 00:00:00 0021-02-01 00:00:00
Attributes: (12/14)
    np:               4
    ne:               120
    Conventions:      CF-1.0
    source:           CAM
    case:             B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02
    title:            UNSET
    ...               ...
    Version:          $Name$
    revision_Id:      $Id$
    initial_file:     B.E.13.B1850C5.ne120_t12.sehires38.003.cam.i.0021-01-01...
    topography_file:  /home/export/online1/xwan/cesm/inputdata/atm/cam/topo/U...
    NCO:              netCDF Operators version 4.9.5 (Homepage = http://nco.s...
    history:          Sat Aug  7 08:23:11 2021: ncks -d time,0 B.E.13.B1850C5...

We can use the first time here

data = data.isel(time=0)

Setup the traiangulate function, described in the datashader.Trimesh documentation#

We can use the Delaunay triangulation function from scipy.spatial to calculate the vertices from our latitudes and longitudes

def triangulate(vertices, x="Longitude", y="Latitude"):
    """
    Generate a triangular mesh for the given x,y,z vertices, using Delaunay triangulation.
    For large n, typically results in about double the number of triangles as vertices.
    """
    triang = Delaunay(vertices[[x, y]].values)
    print('Given', len(vertices), "vertices, created", len(triang.simplices), 'triangles.')
    return pd.DataFrame(triang.simplices, columns=['v0', 'v1', 'v2'])

Apply this function to our dataset#

We extract our lats, lons, and value (TREFHT aka temp ) from the dataset, and stack them together

%%time
x = ((data.lon - 180) % 360) - 180
y = data.lat
temp = data.TREFHT
pts = np.stack((x, y, temp)).T
CPU times: user 71 ms, sys: 14.9 ms, total: 85.8 ms
Wall time: 85.5 ms

Now that we have our values, we can add them to a pandas.Dataframe which will make it easier to deal with

%%time
verts = pd.DataFrame(pts, columns=['Longitude', 'Latitude', 'temp'])
CPU times: user 341 µs, sys: 0 ns, total: 341 µs
Wall time: 346 µs

Finally, we input these vertices into the triangulate function we created earlier

%%time
tris = triangulate(verts)
Given 777602 vertices, created 1553762 triangles.
CPU times: user 6.49 s, sys: 249 ms, total: 6.74 s
Wall time: 6.74 s

Plot the result using Datashader + Geoviews/Holoviews#

Now that we have both our triangulations and vertices, we can input this into the geoviews.TriMesh function. We add a label for this plot, and a coordinate reference system.

Wrapping that within rasterize will ensure that we do not read in the entire meshgrid at once; but rather, render the image as neccessary.

In the second line, we specify the output options (opts) including:

  • Add a colorbar

  • Use the magma colormap

  • Use an 800 x 400 size

  • Transform to a Robinson projection

After we set those options for the Trimesh object, we use geoviews.features to add in coastlines and borders

The result is an interactive\(^1\) map! Feel free to use the Bokeh toolbar on the right to zoom in, move around, and examine the dataset


\(^1\)Within the static webpage rendered on this blog, you can scroll around the map, but it will not read in additional data required to increase the resolution at finer scales

%%time
trimesh = rasterize(
    gv.TriMesh(
        (tris, hv.Points(verts, vdims='temp')),
        label="Reference height temperature (K)",
        crs=ccrs.PlateCarree(),
    ),
    aggregator=ds.mean('temp'),
)

(trimesh).opts(
    colorbar=True, cmap='magma', width=800, height=400, projection=ccrs.Robinson()
) * gf.coastline * gf.borders
CPU times: user 178 ms, sys: 30 ms, total: 208 ms
Wall time: 206 ms

Conlcusion#

The open-source Python plotting stack, specifically packages such as Datashader and Geoviews, can be quite powerful when plotting unstructured grids. I encourage you to check out the Geoviews Gallery as well as the Trimesh section of the Datashader Documentation! Happy plotting!