Regridding High Resolution Observations to a High Resolution Model Grid#

In this example, we will cover how to leverage a useful package from the Pangeo Ecosystem, xESMF. One important note when using this package, is make sure you are using the most up-to-date documentation/version, a few years ago, development moved to the pangeo-data branch of the package, installable using the following:

conda install -c conda-forge xESMF

Imports#

import ast

import cf_xarray
import cftime
import geocat.comp
import holoviews as hv
import hvplot
import hvplot.xarray
import intake
import numpy as np
import pop_tools
import xarray as xr
import xesmf as xe
from distributed import Client
from ncar_jobqueue import NCARCluster
from pop_tools.grid import _compute_corners

hv.extension('bokeh')

cluster = NCARCluster(memory='20 GB')
cluster.scale(20)
client = Client(cluster)
client

Client

Client-43aadd5b-1240-11ec-9327-3cecef1b11de

Connection method: Cluster object	Cluster type: dask_jobqueue.PBSCluster
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/8787/status

Cluster Info

PBSCluster

a49fcaa9

Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/8787/status	Workers: 0
Total threads: 0	Total memory: 0 B

Scheduler Info

Scheduler

Scheduler-d92edee1-f368-4baf-9801-ff91a4a71464

Comm: tcp://10.12.206.51:34462	Workers: 0
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/8787/status	Total threads: 0
Started: Just now	Total memory: 0 B

Workers

Read in Model Data#

In this example, we are using high resolution ocean model output from CESM. This data is stored on the GLADE filesystem on NCAR HPC resources. Here is a repository describing how we generated a catalog similar to this, with some examples of plotting the data

data_catalog = intake.open_esm_datastore(
    "/glade/work/mgrover/hires-fulljra-catalog.json",
    csv_kwargs={"converters": {"variables": ast.literal_eval}},
    sep="/",
)
data_catalog

None catalog with 4 dataset(s) from 2765 asset(s):

	unique
component	1
stream	2
date	2694
case	2
member_id	2
frequency	3
variables	97
path	2765

Let’s subset for temperature (TEMP) here

data_catalog_subset = data_catalog.search(variables='TEMP')

When reading in the data, we are chunking by horizontal, and vertical dimensions, since we will be taking a temporal average in future steps. Chunking by dimensions not being averaged over helps reduced the amount of data on disk that is moved around during computation.

dsets = data_catalog_subset.to_dataset_dict(
    cdf_kwargs={'chunks': {'nlat': 800, 'nlon': 900, 'z_t': 4}}
)

--> The keys in the returned dictionary of datasets are constructed as follows:
	'component/stream/case'

100.00% [2/2 01:40<00:00]

dsets.keys()

dict_keys(['ocn/pop.h/g.e20.G.TL319_t13.control.001', 'ocn/pop.h/g.e20.G.TL319_t13.control.001_hfreq'])

Subset for time - here, we are interested in year 24 through 53#

The model begins at 1958, so this roughly corresponds to 1981-2010, which is when the climatology from WOA was calculated

We choose February from year 24 through January of year 53 since the time represents the end of the time bounds

start_time = cftime.datetime(24, 2, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True)
end_time = cftime.datetime(53, 1, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True)

dset_list = []

for key in dsets.keys():
    dset_list.append(dsets[key].sel(time=slice(start_time, end_time)))

model_ds = xr.concat(
    dset_list, dim='time', data_vars="minimal", coords="minimal", compat="override"
)

It looks like the model does not include the entire time range (up to ~2005), but it is pretty close!

model_ds

<xarray.Dataset>
Dimensions:  (time: 1764, z_t: 62, nlat: 2400, nlon: 3600)
Coordinates:
  * time     (time) object 0024-02-01 00:00:00 ... 0049-11-02 00:00:00
  * z_t      (z_t) float32 500.0 1.5e+03 2.5e+03 ... 5.625e+05 5.875e+05
    ULONG    (nlat, nlon) float64 dask.array<chunksize=(800, 900), meta=np.ndarray>
    ULAT     (nlat, nlon) float64 dask.array<chunksize=(800, 900), meta=np.ndarray>
    TLONG    (nlat, nlon) float64 dask.array<chunksize=(800, 900), meta=np.ndarray>
    TLAT     (nlat, nlon) float64 dask.array<chunksize=(800, 900), meta=np.ndarray>
Dimensions without coordinates: nlat, nlon
Data variables:
    TEMP     (time, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 4, 800, 900), meta=np.ndarray>
Attributes:
    cell_methods:            cell_methods = time: mean ==> the variable value...
    time_period_freq:        month_1
    contents:                Diagnostic and Prognostic Variables
    calendar:                All years have exactly  365 days.
    source:                  CCSM POP2, the CCSM Ocean Component
    history:                 none
    model_doi_url:           https://doi.org/10.5065/D67H1H0V
    Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
    intake_esm_varname:      ['TEMP']
    revision:                $Id: tavg.F90 89091 2018-04-30 15:58:32Z altunta...
    title:                   g.e20.G.TL319_t13.control.001
    intake_esm_dataset_key:  ocn/pop.h/g.e20.G.TL319_t13.control.001

xarray.Dataset

Dimensions:
- time: 1764
- z_t: 62
- nlat: 2400
- nlon: 3600

Coordinates: (6)

time

(time)

object

0024-02-01 00:00:00 ... 0049-11-...

long_name :: time
bounds :: time_bound

array([cftime.datetime(24, 2, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
       cftime.datetime(24, 3, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
       cftime.datetime(24, 4, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
       ...,
       cftime.datetime(49, 10, 23, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
       cftime.datetime(49, 10, 28, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
       cftime.datetime(49, 11, 2, 0, 0, 0, 0, calendar='noleap', has_year_zero=True)],
      dtype=object)

z_t

(z_t)

float32

500.0 1.5e+03 ... 5.875e+05

long_name :: depth from surface to midpoint of layer
units :: centimeters
positive :: down
valid_min :: 500.0
valid_max :: 587499.06

array([5.000000e+02, 1.500000e+03, 2.500000e+03, 3.500000e+03, 4.500000e+03,
       5.500000e+03, 6.500000e+03, 7.500000e+03, 8.500000e+03, 9.500000e+03,
       1.050000e+04, 1.150000e+04, 1.250000e+04, 1.350000e+04, 1.450000e+04,
       1.550000e+04, 1.650984e+04, 1.754790e+04, 1.862913e+04, 1.976603e+04,
       2.097114e+04, 2.225783e+04, 2.364088e+04, 2.513702e+04, 2.676542e+04,
       2.854837e+04, 3.051192e+04, 3.268680e+04, 3.510935e+04, 3.782276e+04,
       4.087846e+04, 4.433777e+04, 4.827367e+04, 5.277280e+04, 5.793729e+04,
       6.388626e+04, 7.075633e+04, 7.870025e+04, 8.788252e+04, 9.847059e+04,
       1.106204e+05, 1.244567e+05, 1.400497e+05, 1.573946e+05, 1.764003e+05,
       1.968944e+05, 2.186457e+05, 2.413972e+05, 2.649001e+05, 2.889385e+05,
       3.133405e+05, 3.379793e+05, 3.627670e+05, 3.876452e+05, 4.125768e+05,
       4.375392e+05, 4.625190e+05, 4.875083e+05, 5.125028e+05, 5.375000e+05,
       5.624991e+05, 5.874991e+05], dtype=float32)

ULONG

(nlat, nlon)

float64

dask.array<chunksize=(800, 900), meta=np.ndarray>

long_name :: array of u-grid longitudes
units :: degrees_east

	Array	Chunk
Bytes	65.92 MiB	5.49 MiB
Shape	(2400, 3600)	(800, 900)
Count	13 Tasks	12 Chunks
Type	float64	numpy.ndarray

ULAT

(nlat, nlon)

float64

dask.array<chunksize=(800, 900), meta=np.ndarray>

long_name :: array of u-grid latitudes
units :: degrees_north

	Array	Chunk
Bytes	65.92 MiB	5.49 MiB
Shape	(2400, 3600)	(800, 900)
Count	13 Tasks	12 Chunks
Type	float64	numpy.ndarray

TLONG

(nlat, nlon)

float64

dask.array<chunksize=(800, 900), meta=np.ndarray>

long_name :: array of t-grid longitudes
units :: degrees_east

	Array	Chunk
Bytes	65.92 MiB	5.49 MiB
Shape	(2400, 3600)	(800, 900)
Count	13 Tasks	12 Chunks
Type	float64	numpy.ndarray

TLAT

(nlat, nlon)

float64

dask.array<chunksize=(800, 900), meta=np.ndarray>

long_name :: array of t-grid latitudes
units :: degrees_north

	Array	Chunk
Bytes	65.92 MiB	5.49 MiB
Shape	(2400, 3600)	(800, 900)
Count	13 Tasks	12 Chunks
Type	float64	numpy.ndarray

Data variables: (1)

TEMP

(time, z_t, nlat, nlon)

float32

dask.array<chunksize=(1, 4, 800, 900), meta=np.ndarray>

long_name :: Potential Temperature
units :: degC
grid_loc :: 3111
cell_methods :: time: mean

	Array	Chunk
Bytes	3.44 TiB	10.99 MiB
Shape	(1764, 62, 2400, 3600)	(1, 4, 800, 900)
Count	1128696 Tasks	338688 Chunks
Type	float32	numpy.ndarray

Attributes: (12)
cell_methods :
cell_methods = time: mean ==> the variable values are averaged over the time interval between the previous time coordinate and the current one. cell_methods absent ==> the variable values are at the time given by the current time coordinate.
time_period_freq :
month_1
contents :
Diagnostic and Prognostic Variables
calendar :
All years have exactly 365 days.
source :
CCSM POP2, the CCSM Ocean Component
history :
none
model_doi_url :
https://doi.org/10.5065/D67H1H0V
Conventions :
CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf/CF-current.htm
intake_esm_varname :
['TEMP']
revision :
$Id: tavg.F90 89091 2018-04-30 15:58:32Z altuntas@ucar.edu $
title :
g.e20.G.TL319_t13.control.001
intake_esm_dataset_key :
ocn/pop.h/g.e20.G.TL319_t13.control.001

Compute the Monthly Mean using Geocat-Comp#

We are only interested in TEMP here, so we can subset for that when computing the climatology

model_monthly_mean = geocat.comp.climatology(model_ds[['TEMP']], 'month')

Regrid and Interpolate Vertical Levels#

As mentioned before, our observational dataset already has the bounds information, which is important for determing grid corners during the regridding process. For the POP dataset, we need to add a helper function to add this information

def gen_corner_calc(ds, cell_corner_lat='ULAT', cell_corner_lon='ULONG'):
    """
    Generates corner information and creates single dataset with output
    """

    cell_corner_lat = ds[cell_corner_lat]
    cell_corner_lon = ds[cell_corner_lon]
    # Use the function in pop-tools to get the grid corner information
    corn_lat, corn_lon = _compute_corners(cell_corner_lat, cell_corner_lon)

    # Make sure this returns four corner points
    assert corn_lon.shape[-1] == 4

    lon_shape, lat_shape = corn_lon[:, :, 0].shape
    out_shape = (lon_shape + 1, lat_shape + 1)

    # Generate numpy arrays to store destination lats/lons
    out_lons = np.zeros(out_shape)
    out_lats = np.zeros(out_shape)

    # Assign the northeast corner information
    out_lons[1:, 1:] = corn_lon[:, :, 0]
    out_lats[1:, 1:] = corn_lat[:, :, 0]

    # Assign the northwest corner information
    out_lons[1:, :-1] = corn_lon[:, :, 1]
    out_lats[1:, :-1] = corn_lat[:, :, 1]

    # Assign the southwest corner information
    out_lons[:-1, :-1] = corn_lon[:, :, 2]
    out_lats[:-1, :-1] = corn_lat[:, :, 2]

    # Assign the southeast corner information
    out_lons[:-1, 1:] = corn_lon[:, :, 3]
    out_lats[:-1, 1:] = corn_lat[:, :, 3]

    return out_lats, out_lons

lat_corners, lon_corners = gen_corner_calc(model_monthly_mean)

We can also rename our latitude and longitude names

model_monthly_mean = model_monthly_mean.rename({'TLAT': 'lat', 'TLONG': 'lon'}).drop(
    ['ULAT', 'ULONG']
)

model_monthly_mean['lon_b'] = (('nlat_b', 'nlon_b'), lon_corners)
model_monthly_mean['lat_b'] = (('nlat_b', 'nlon_b'), lat_corners)
model_monthly_mean

<xarray.Dataset>
Dimensions:  (month: 12, z_t: 62, nlat: 2400, nlon: 3600, nlat_b: 2401, nlon_b: 3601)
Coordinates:
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
  * z_t      (z_t) float64 500.0 1.5e+03 2.5e+03 ... 5.625e+05 5.875e+05
    lon      (nlat, nlon) float64 dask.array<chunksize=(800, 900), meta=np.ndarray>
    lat      (nlat, nlon) float64 dask.array<chunksize=(800, 900), meta=np.ndarray>
Dimensions without coordinates: nlat, nlon, nlat_b, nlon_b
Data variables:
    TEMP     (month, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 4, 800, 900), meta=np.ndarray>
    lon_b    (nlat_b, nlon_b) float64 nan nan nan nan nan ... nan nan nan nan
    lat_b    (nlat_b, nlon_b) float64 nan nan nan nan nan ... nan nan nan nan
Attributes:
    cell_methods:            cell_methods = time: mean ==> the variable value...
    time_period_freq:        month_1
    contents:                Diagnostic and Prognostic Variables
    calendar:                All years have exactly  365 days.
    source:                  CCSM POP2, the CCSM Ocean Component
    history:                 none
    model_doi_url:           https://doi.org/10.5065/D67H1H0V
    Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
    intake_esm_varname:      ['TEMP']
    revision:                $Id: tavg.F90 89091 2018-04-30 15:58:32Z altunta...
    title:                   g.e20.G.TL319_t13.control.001
    intake_esm_dataset_key:  ocn/pop.h/g.e20.G.TL319_t13.control.001

xarray.Dataset

Dimensions:
- month: 12
- z_t: 62
- nlat: 2400
- nlon: 3600
- nlat_b: 2401
- nlon_b: 3601

Coordinates: (4)

month

(month)

int64

1 2 3 4 5 6 7 8 9 10 11 12

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

z_t

(z_t)

float64

500.0 1.5e+03 ... 5.875e+05

long_name :: depth from surface to midpoint of layer
units :: centimeters
positive :: down
valid_min :: 500.0
valid_max :: 587499.06

array([5.000000e+02, 1.500000e+03, 2.500000e+03, 3.500000e+03, 4.500000e+03,
       5.500000e+03, 6.500000e+03, 7.500000e+03, 8.500000e+03, 9.500000e+03,
       1.050000e+04, 1.150000e+04, 1.250000e+04, 1.350000e+04, 1.450000e+04,
       1.550000e+04, 1.650984e+04, 1.754790e+04, 1.862913e+04, 1.976603e+04,
       2.097114e+04, 2.225783e+04, 2.364088e+04, 2.513702e+04, 2.676542e+04,
       2.854837e+04, 3.051192e+04, 3.268680e+04, 3.510935e+04, 3.782276e+04,
       4.087846e+04, 4.433777e+04, 4.827367e+04, 5.277280e+04, 5.793729e+04,
       6.388626e+04, 7.075633e+04, 7.870025e+04, 8.788252e+04, 9.847059e+04,
       1.106204e+05, 1.244567e+05, 1.400497e+05, 1.573946e+05, 1.764003e+05,
       1.968944e+05, 2.186457e+05, 2.413972e+05, 2.649001e+05, 2.889385e+05,
       3.133405e+05, 3.379793e+05, 3.627670e+05, 3.876452e+05, 4.125768e+05,
       4.375392e+05, 4.625190e+05, 4.875083e+05, 5.125028e+05, 5.375000e+05,
       5.624991e+05, 5.874991e+05])

lon

(nlat, nlon)

float64

dask.array<chunksize=(800, 900), meta=np.ndarray>

long_name :: array of t-grid longitudes
units :: degrees_east

	Array	Chunk
Bytes	65.92 MiB	5.49 MiB
Shape	(2400, 3600)	(800, 900)
Count	13 Tasks	12 Chunks
Type	float64	numpy.ndarray

lat

(nlat, nlon)

float64

dask.array<chunksize=(800, 900), meta=np.ndarray>

long_name :: array of t-grid latitudes
units :: degrees_north

	Array	Chunk
Bytes	65.92 MiB	5.49 MiB
Shape	(2400, 3600)	(800, 900)
Count	13 Tasks	12 Chunks
Type	float64	numpy.ndarray

Data variables: (3)

TEMP

(month, z_t, nlat, nlon)

float32

dask.array<chunksize=(1, 4, 800, 900), meta=np.ndarray>

long_name :: Potential Temperature
units :: degC
grid_loc :: 3111
cell_methods :: time: mean

	Array	Chunk
Bytes	23.95 GiB	10.99 MiB
Shape	(12, 62, 2400, 3600)	(1, 4, 800, 900)
Count	1928184 Tasks	2304 Chunks
Type	float32	numpy.ndarray

lon_b

(nlat_b, nlon_b)

float64

nan nan nan nan ... nan nan nan nan

array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])

lat_b

(nlat_b, nlon_b)

float64

nan nan nan nan ... nan nan nan nan

array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])

Attributes: (12)
cell_methods :
cell_methods = time: mean ==> the variable values are averaged over the time interval between the previous time coordinate and the current one. cell_methods absent ==> the variable values are at the time given by the current time coordinate.
time_period_freq :
month_1
contents :
Diagnostic and Prognostic Variables
calendar :
All years have exactly 365 days.
source :
CCSM POP2, the CCSM Ocean Component
history :
none
model_doi_url :
https://doi.org/10.5065/D67H1H0V
Conventions :
CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf/CF-current.htm
intake_esm_varname :
['TEMP']
revision :
$Id: tavg.F90 89091 2018-04-30 15:58:32Z altuntas@ucar.edu $
title :
g.e20.G.TL319_t13.control.001
intake_esm_dataset_key :
ocn/pop.h/g.e20.G.TL319_t13.control.001

Compare the Model to Observations#

Now that we have our regridded dataset, we can compare model and observations! We can take the difference between model and observations (model - obs), subsetting for our first month since we only have the January average from observations

difference = model_monthly_mean.sel(month=1) - regridded_observations

difference['nlat'] = difference.nlat.astype(float)
difference['nlon'] = difference.nlon.astype(float)

Since this can take a while to compute, we can save our output to a Zarr store, which we can read in later if we want to refer back to the output

difference.to_zarr('hires_pop_obs_comparison.zarr')

<xarray.backends.zarr.ZarrStore at 0x2b1c86b5f2e0>

difference = xr.open_zarr('hires_pop_obs_comparison.zarr')

Plot an Interactive Map using hvPlot#

Before we plot our data, since we chunked in nlat/nlon, we will make sure we unify our chunks

difference = difference.unify_chunks()

difference

<xarray.Dataset>
Dimensions:  (z_t: 62, nlat: 2400, nlon: 3600, month: 1)
Coordinates:
    lat      (nlat, nlon) float64 dask.array<chunksize=(300, 450), meta=np.ndarray>
    lon      (nlat, nlon) float64 dask.array<chunksize=(300, 450), meta=np.ndarray>
  * month    (month) int64 1
  * nlat     (nlat) float64 0.0 1.0 2.0 3.0 ... 2.397e+03 2.398e+03 2.399e+03
  * nlon     (nlon) float64 0.0 1.0 2.0 3.0 ... 3.597e+03 3.598e+03 3.599e+03
  * z_t      (z_t) float32 500.0 1.5e+03 2.5e+03 ... 5.625e+05 5.875e+05
Data variables:
    TEMP     (z_t, nlat, nlon, month) float64 dask.array<chunksize=(4, 300, 450, 1), meta=np.ndarray>

xarray.Dataset

Dimensions:
- z_t: 62
- nlat: 2400
- nlon: 3600
- month: 1

Coordinates: (6)

lat

(nlat, nlon)

float64

dask.array<chunksize=(300, 450), meta=np.ndarray>

long_name :: array of t-grid latitudes
units :: degrees_north

	Array	Chunk
Bytes	65.92 MiB	1.03 MiB
Shape	(2400, 3600)	(300, 450)
Count	177 Tasks	80 Chunks
Type	float64	numpy.ndarray

lon

(nlat, nlon)

float64

dask.array<chunksize=(300, 450), meta=np.ndarray>

long_name :: array of t-grid longitudes
units :: degrees_east

	Array	Chunk
Bytes	65.92 MiB	1.03 MiB
Shape	(2400, 3600)	(300, 450)
Count	177 Tasks	80 Chunks
Type	float64	numpy.ndarray

month
(month)
int64
1
```
array([1])
```

nlat

(nlat)

float64

0.0 1.0 2.0 ... 2.398e+03 2.399e+03

array([0.000e+00, 1.000e+00, 2.000e+00, ..., 2.397e+03, 2.398e+03, 2.399e+03])

nlon

(nlon)

float64

0.0 1.0 2.0 ... 3.598e+03 3.599e+03

array([0.000e+00, 1.000e+00, 2.000e+00, ..., 3.597e+03, 3.598e+03, 3.599e+03])

z_t

(z_t)

float32

500.0 1.5e+03 ... 5.875e+05

long_name :: depth from surface to midpoint of layer
positive :: down
units :: centimeters
valid_max :: 587499.0625
valid_min :: 500.0

array([5.000000e+02, 1.500000e+03, 2.500000e+03, 3.500000e+03, 4.500000e+03,
       5.500000e+03, 6.500000e+03, 7.500000e+03, 8.500000e+03, 9.500000e+03,
       1.050000e+04, 1.150000e+04, 1.250000e+04, 1.350000e+04, 1.450000e+04,
       1.550000e+04, 1.650984e+04, 1.754790e+04, 1.862913e+04, 1.976603e+04,
       2.097114e+04, 2.225783e+04, 2.364088e+04, 2.513702e+04, 2.676542e+04,
       2.854837e+04, 3.051192e+04, 3.268680e+04, 3.510935e+04, 3.782276e+04,
       4.087846e+04, 4.433777e+04, 4.827367e+04, 5.277280e+04, 5.793729e+04,
       6.388626e+04, 7.075633e+04, 7.870025e+04, 8.788252e+04, 9.847059e+04,
       1.106204e+05, 1.244567e+05, 1.400497e+05, 1.573946e+05, 1.764003e+05,
       1.968944e+05, 2.186457e+05, 2.413972e+05, 2.649001e+05, 2.889385e+05,
       3.133405e+05, 3.379793e+05, 3.627670e+05, 3.876452e+05, 4.125768e+05,
       4.375392e+05, 4.625190e+05, 4.875083e+05, 5.125028e+05, 5.375000e+05,
       5.624991e+05, 5.874991e+05], dtype=float32)

Data variables: (1)

TEMP

(z_t, nlat, nlon, month)

float64

dask.array<chunksize=(4, 300, 450, 1), meta=np.ndarray>

cell_methods :: time: mean
grid_loc :: 3111
long_name :: Potential Temperature
units :: degC

	Array	Chunk
Bytes	3.99 GiB	4.12 MiB
Shape	(62, 2400, 3600, 1)	(4, 300, 450, 1)
Count	2753 Tasks	1280 Chunks
Type	float64	numpy.ndarray

Attributes: (0)

difference.isel(z_t=0).TEMP.plot()

<matplotlib.collections.QuadMesh at 0x2b1c89523a00>

../../../_images/5d1735a9cd17c21c74dd23dc6b8463e12d0a07af0cbb389b5a01c91e30c7141c.png

Add a Helper Function to Make Sure the Colorbar is around 0#

def create_cmap(levs):
    """
    Creates a colormap using matplotlib.colors
    """
    assert len(levs) % 2 == 0, 'N levels must be even.'
    return colors.LinearSegmentedColormap.from_list(
        name='red_white_blue',
        colors=[(0, 0, 1), (1, 1.0, 1), (1, 0, 0)],
        N=len(levs) - 1,
    )


def generate_cbar(ds, var, lev=0):
    """
    Read in min/max values from a dataset and generate contour levels and a colorbar
    """
    ds = ds.isel(z_t=lev)
    max_diff = abs(np.nanmax(ds[var].values))
    min_diff = abs(np.nanmin(ds[var].values))
    max_diff = np.max(np.array(max_diff, min_diff))
    levels = list(np.linspace(-max_diff, max_diff, 20))
    cmap = create_cmap(levels)
    return levels, cmap

levels, cmap = generate_cbar(difference, 'TEMP')

difference.load()
difference.TEMP.hvplot.quadmesh(x='nlon', y='nlat', rasterize=True).opts(
    width=600, height=400, color_levels=levels, cmap=cmap, bgcolor='lightgray'
)

Conclusion#

In this example, we covered how useful xESMF can be when regridding between two different datasets. It is important to keep in mind the requirements for grid corner information when working with these different datasets.

In the future, this regridding step should be explored further in the context of applyin this at the time of data read in. For example, if one could add this step into the pangeo-forge recipe, the data access and preprocessing could be combined into a single step, enabling easier reproducibility and sharability.

This simplified, extensible workflow could be used to build a catalog of pre-gridded observations in a catalog similar to the example outlined in last week’s post, “Comparing Atmospheric Model Output with Observations Using Intake-ESM”

GeoCAT-Comp Tutorial Benchmarking Performance of History vs. Timeseries Files

10 September 2021

Recent Posts

Archives

Regridding High Resolution Observations to a High Resolution Model Grid#

Imports#

Client

Cluster Info

PBSCluster

Scheduler Info

Scheduler

Workers

Read in Observational Data from the World Ocean Atlas#

The Pangeo-Forge Method#

Read in the Data using Xarray#

Read in Model Data#

Subset for time - here, we are interested in year 24 through 53#

Compute the Monthly Mean using Geocat-Comp#

Regrid and Interpolate Vertical Levels#

Interpolate Vertical Levels#

Regrid using xESMF#

Fix the time on the observations#

Compare the Model to Observations#

Plot an Interactive Map using hvPlot#

Add a Helper Function to Make Sure the Colorbar is around 0#

Conclusion#