Examining Diagnostics Using Intake-ESM and hvPlot#
In previous weeks, we have looked at building Intake-ESM catalogs from history files and visualizing CESM output using Holoviews and Datashader, but this week we are putting together a few of those pieces to visualize a comparison between model runs.
One of the first ESDS blog posts looked at building an interactive dashboard to look at plots, using high resolution ocean model output as the dataset. One of the limitations of that approach is that the images are static - we are pulling in PNGs and rendering them on the page, as opposed to using more interactive options. In this example, we will read in data generated from ecgtools, from a directory only accessible via NCAR’s high performance computing center.
The main plotting library used here is hvPlot, which is “a high-level plotting API for the PyData ecosystem built on HoloViews”. Essentially, it is built on top of key components of the Python visualization stack, such as HoloViews, Datashader, and Bokeh, with seamless integration with Xarray (check out the hvPlot documentation for more details).
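As a minimal sketch of that integration (using a synthetic array rather than the CESM output analyzed below), importing hvplot.xarray attaches a .hvplot accessor to every Xarray object:
import numpy as np
import xarray as xr
import hvplot.xarray  # noqa: registers the .hvplot accessor

# A synthetic 2D field, purely for illustration
da = xr.DataArray(np.random.rand(10, 20), dims=['y', 'x'], name='example')
da.hvplot.image()  # returns an interactive, Bokeh-backed plot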
Imports#
import ast
import holoviews as hv
import hvplot
import hvplot.xarray
import intake
import matplotlib.pyplot as plt
import panel as pn
import xarray as xr
from distributed import Client
from ncar_jobqueue import NCARCluster
hv.extension('bokeh')
pn.extension()
Spin up a Dask Cluster#
We spin up a cluster, including 20 total workers
cluster = NCARCluster()
cluster.scale(20)
client = Client(cluster)
cluster
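As an aside, if you would rather let the cluster grow and shrink with demand, the dask-jobqueue clusters that NCARCluster wraps also support adaptive scaling; a sketch, with arbitrary worker bounds:
# Alternative to cluster.scale(20): scale workers with the workload
# (the minimum/maximum values here are arbitrary choices)
cluster.adapt(minimum=2, maximum=20)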
Read in the data#
As mentioned previously, we are reading in data from within the GLADE filesystem at NCAR. The converter
ast.literal_eval
is included since there are multiple variables in the variables column, an artifact of using the default “history” file output (check out our previous post here for more information about this type of model output).
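To see what this converter does: the CSV stores each entry in the variables column as the string representation of a Python list, and ast.literal_eval safely parses it back into a real list. A quick illustration:
import ast

# The string form stored in the catalog CSV...
raw = "['TEMP', 'SALT']"

# ...parses back into an actual Python list we can search against
ast.literal_eval(raw)
With the converter in place, we can open the catalog: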
catalog = intake.open_esm_datastore(
    "/glade/work/mgrover/cesm-validation-catalog.json",
    csv_kwargs={"converters": {"variables": ast.literal_eval}},
    sep="/",
)
catalog
None catalog with 12 dataset(s) from 11103 asset(s):

| | unique |
|---|---|
| component | 1 |
| stream | 4 |
| date | 2501 |
| case | 3 |
| member_id | 2 |
| frequency | 4 |
| variables | 545 |
| path | 11103 |
We see that there are three cases included in this catalog
catalog.df.case.unique()
array(['b1850.f19_g17.validation_nuopc.004',
'b1850.f19_g17.validation_mct.004',
'b1850.f19_g17.validation_mct.002'], dtype=object)
Subset for monthly output from the last 20 years#
We start by subsetting for monthly output
catalog_subset = catalog.search(frequency='month_1')
Next, we subset for the last 20 years of data. Since this is monthly output, the last twenty years corresponds to
12 months * 20 years = 240
dates = sorted(catalog_subset.df.date.unique())[-240:]
print(dates[:12], dates[-12:])
['0081-01', '0081-02', '0081-03', '0081-04', '0081-05', '0081-06', '0081-07', '0081-08', '0081-09', '0081-10', '0081-11', '0081-12'] ['0100-01', '0100-02', '0100-03', '0100-04', '0100-05', '0100-06', '0100-07', '0100-08', '0100-09', '0100-10', '0100-11', '0100-12']
catalog_subset = catalog_subset.search(frequency='month_1', date=dates)
Now, our catalog includes our 3 cases with monthly data from the last 20 years of the simulation
catalog_subset
None catalog with 3 dataset(s) from 720 asset(s):

| | unique |
|---|---|
| component | 1 |
| stream | 1 |
| date | 240 |
| case | 3 |
| member_id | 2 |
| frequency | 1 |
| variables | 434 |
| path | 720 |
Read in a dictionary of datasets#
Using the .to_dataset_dict()
method, we read in the datasets
dsets = catalog_subset.to_dataset_dict(cdf_kwargs={'use_cftime': True, 'chunks': {'time': 60}})
--> The keys in the returned dictionary of datasets are constructed as follows:
'component/stream/case'
Taking a look at the keys, we see that they follow the convention component/stream/casename
dsets.keys()
dict_keys(['ocn/pop.h/b1850.f19_g17.validation_mct.002', 'ocn/pop.h/b1850.f19_g17.validation_mct.004', 'ocn/pop.h/b1850.f19_g17.validation_nuopc.004'])
Compute the Temporal Average#
We are interested in the average over the last 20 years, so we take the mean with respect to time. For now, we are interested in two variables:
SALT (salinity)
TEMP (potential temperature)
variables = ['TEMP', 'SALT']
We also want to make sure that when we operate on the datasets, we hold onto the attributes. We can do this using:
xr.set_options(keep_attrs=True)
<xarray.core.options.set_options at 0x2ba103a57eb0>
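Note that xr.set_options can also be used as a context manager when you only want to keep attributes within a specific block; a small sketch of that scoped form:
# Scoped alternative: keep_attrs applies only inside the with-block
with xr.set_options(keep_attrs=True):
    scoped_means = {key: dset.mean(dim='time') for key, dset in dsets.items()}
Here, we set the option globally and build up a list of time-averaged datasets: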
ds_list = []
for key in dsets.keys():
    # Extract the dataset from the dictionary of datasets
    ds = dsets[key]
    # Compute the mean over time
    mean = ds.mean(dim='time')
    # Subset for the variables we are interested in
    out = mean[variables]
    # Modify which variables we include in the attributes
    out.attrs['intake_esm_varname'] = variables
    # Append to the list of datasets
    ds_list.append(out)
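One thing to keep in mind: a plain .mean(dim='time') weights every month equally. If you would rather weight the average by the length of each month, a sketch along these lines (assuming a cftime time coordinate, which exposes dt.days_in_month) is one option:
# Sketch: weight each month by its number of days before averaging
weights = ds.time.dt.days_in_month
weighted_mean = ds[variables].weighted(weights).mean(dim='time')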
Concatenate the datasets from the various cases
merged_ds = xr.concat(ds_list, dim='case')
Each dataset has an attribute title
which contains the casename, which we can use as one of the main dimensions of our new merged dataset
merged_ds.title
'b1850.f19_g17.validation_mct.002'
cases = []
for dset in ds_list:
    cases.append(dset.title)
merged_ds['case'] = cases
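As an equivalent alternative, xr.concat accepts a pandas Index as the concatenation dimension, which builds the labeled case coordinate in a single step; a sketch:
import pandas as pd

# Label the new dimension at concat time instead of assigning it afterwards
cases = [dset.title for dset in ds_list]
merged_ds = xr.concat(ds_list, dim=pd.Index(cases, name='case'))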
Load in our computation#
Before we start plotting, we can go ahead and call .persist()
on our dataset, which keeps our temporally averaged dataset, with its subset of the original variables, in distributed memory. Since .persist() returns a new Dask-backed object, we assign the result back:
merged_ds = merged_ds.persist()
merged_ds
<xarray.Dataset>
Dimensions:  (case: 3, z_t: 60, nlat: 384, nlon: 320)
Coordinates:
  * z_t      (z_t) float32 500.0 1.5e+03 2.5e+03 ... 5.125e+05 5.375e+05
    ULONG    (nlat, nlon) float64 321.1 322.3 323.4 324.5 ... 319.2 319.6 320.0
    ULAT     (nlat, nlon) float64 -78.95 -78.95 -78.95 ... 72.42 72.41 72.41
    TLONG    (nlat, nlon) float64 320.6 321.7 322.8 323.9 ... 318.9 319.4 319.8
    TLAT     (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
  * case     (case) <U34 'b1850.f19_g17.validation_mct.002' ... 'b1850.f19_g1...
Dimensions without coordinates: nlat, nlon
Data variables:
    TEMP     (case, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 60, 384, 320), meta=np.ndarray>
    SALT     (case, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 60, 384, 320), meta=np.ndarray>
Attributes:
    revision:                $Id$
    title:                   b1850.f19_g17.validation_mct.002
    contents:                Diagnostic and Prognostic Variables
    calendar:                All years have exactly 365 days.
    Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
    history:                 none
    time_period_freq:        month_1
    model_doi_url:           https://doi.org/10.5065/D67H1H0V
    cell_methods:            cell_methods = time: mean ==> the variable value...
    source:                  CCSM POP2, the CCSM Ocean Component
    intake_esm_varname:      ['TEMP', 'SALT']
    intake_esm_dataset_key:  ocn/pop.h/b1850.f19_g17.validation_mct.002
Plot a Comparison using Holoviews#
We can set up a couple of helper functions before visualizing our output. The first helps with an issue that might not be obvious when first trying to plot POP ocean model output with HoloViews.
When first checking to see what TEMP’s dimensions are, we see they are z_t (vertical) and nlat/nlon (horizontal)
ds.TEMP.dims
('time', 'z_t', 'nlat', 'nlon')
When HoloViews sees this, however, it does not pick up the same dimensions…
first_case = merged_ds.isel(case=0)
first_case.TEMP.hvplot.quadmesh()
One way to fix this is by ensuring that nlat
and nlon
are in the dataset as variables
merged_ds['nlat'] = merged_ds.nlat
merged_ds['nlon'] = merged_ds.nlon
Now, the dimensions are assembled properly and we can proceed with plotting!
Note of warning: when this plot renders on a webpage, you will not be able to use the slider to view other vertical levels.
You can still pan and zoom on the plot
first_case = merged_ds.isel(case=0)
first_case.TEMP.hvplot.quadmesh()
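As an aside, quadmesh can also plot on the curvilinear ocean grid directly by pointing it at the two-dimensional TLONG/TLAT coordinates rather than the nlat/nlon index space; a sketch:
# Sketch: plot in geographic coordinates using the 2D curvilinear grid
merged_ds.isel(case=0).TEMP.hvplot.quadmesh(x='TLONG', y='TLAT', rasterize=True)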
Let’s add some more arguments and options to make the plot look a bit nicer
first_plot = (
    merged_ds.isel(case=0)
    .TEMP.hvplot.quadmesh(rasterize=True)
    .opts(colorbar=True, width=500, height=400, cmap='magma', tools=['hover'])
)
first_plot
We can add to the title of the plot using the following:
first_plot.relabel(f'{merged_ds.isel(case=0).title}')
We can also place plots side by side using the + operator
second_case = merged_ds.isel(case=1)
second_plot = second_case.TEMP.hvplot.quadmesh(rasterize=True).opts(
    colorbar=True, width=500, height=400, cmap='magma', tools=['hover']
)
first_plot + second_plot
We can stack plots on top of one another by using hv.Layout
and specifying we want a single column
hv.Layout([first_plot, second_plot]).cols(1)
Creating helper functions for plotting#
We can create a helper function to streamline this plotting workflow, as seen below
def plot_interactive(ds, variable, cmap='magma'):
    """Plots an interactive plot using holoviews"""
    # Make sure that nlat and nlon are in the dataarray
    da = ds[variable]
    da['nlon'] = ds.nlon
    da['nlat'] = ds.nlat
    quadmesh = da.hvplot.quadmesh(rasterize=True).opts(
        colorbar=True, width=500, height=400, cmap=cmap, tools=['hover']
    )
    return quadmesh.relabel(f'{ds.title}')
plot_interactive(first_case, variable='TEMP')
Another function that would be useful is one that plots comparisons between cases. As mentioned previously, we have three cases; we set the baseline to be b1850.f19_g17.validation_mct.004!
merged_ds.case
<xarray.DataArray 'case' (case: 3)>
array(['b1850.f19_g17.validation_mct.002', 'b1850.f19_g17.validation_mct.004',
       'b1850.f19_g17.validation_nuopc.004'], dtype='<U34')
Coordinates:
  * case     (case) <U34 'b1850.f19_g17.validation_mct.002' ... 'b1850.f19_g1...
def plot_comparison(ds, case_to_compare, variable, baseline='b1850.f19_g17.validation_mct.004'):
    """Plot a comparison between cases"""
    # Baseline Case
    baseline_ds = ds.sel(case=baseline)
    baseline_ds.attrs['title'] = baseline
    baseline_plot = plot_interactive(baseline_ds, variable)

    # Comparison Case
    case_compare_ds = ds.sel(case=case_to_compare)
    case_compare_ds.attrs['title'] = case_to_compare
    case_compare_plot = plot_interactive(case_compare_ds, variable)

    # Absolute difference in native units
    diff = case_compare_ds[[variable]] - baseline_ds[[variable]]
    diff.attrs['title'] = f'{case_to_compare} - {baseline}'
    diff[variable].attrs['units'] = ds[variable].units
    difference_plot = plot_interactive(diff, variable, cmap='seismic')

    # Difference in percent
    diff_percent = (diff / case_compare_ds[[variable]].max()) * 100
    diff_percent.attrs['title'] = 'Percent Difference'
    diff_percent[variable].attrs['units'] = 'Percent'
    difference_plot_percent = plot_interactive(diff_percent, variable, cmap='seismic')

    # Use holoviews layout to set up the plots with two columns
    plots = hv.Layout(
        [case_compare_plot + baseline_plot + difference_plot + difference_plot_percent]
    ).cols(2)

    # Add a title with the associated name and units
    panel = pn.Column(
        pn.pane.Markdown(f"## {ds[variable].long_name} {ds[variable].units}", align="center"), plots
    )

    return panel
Plot our final comparisons#
Let’s take a look at our two variables:
TEMP
SALT
plot_comparison(merged_ds, case_to_compare='b1850.f19_g17.validation_nuopc.004', variable='TEMP')
plot_comparison(merged_ds, case_to_compare='b1850.f19_g17.validation_nuopc.004', variable='SALT')
Conclusion#
As you can see, this framework can be quite powerful; for this analysis, we were able to read data directly from model output, without converting to timeseries files, and plot diagnostics for a few ocean fields. While there are still opportunities for improvement in terms of additional tools to make this easier to implement into a robust diagnostics framework, it is exciting to see that many of the key pieces of extensible diagnostics are already available!
Something to keep in mind is the fact that when these are rendered on the web, the individual maps are interactive, but the sliders are not able to render new data onto the plot. There is a detailed discussion regarding this topic on the GeoCAT viz issue related to plotting MPAS data. We should think about ways to enable more interactivity with these analysis notebooks and frameworks!
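One way to restore that interactivity is to serve the dashboard as a live Panel app, so a running Python process handles the slider callbacks; a minimal sketch:
# Sketch: mark the dashboard as servable, then launch it with
#   panel serve <this-notebook>.ipynb
dashboard = plot_comparison(
    merged_ds, case_to_compare='b1850.f19_g17.validation_nuopc.004', variable='TEMP'
)
dashboard.servable()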
If you are interested in other plots from this comparison, check out this JupyterBook which includes additional variables!