Indexing unstructured grids with the Power of Xoak#
This week, there a post within the Zulip regarding how to deal with indexing CAM-SE data. The tricky part here is that is an unstructured grid, where there is only one column to the data, ncol
.
Here is an example of the dataset we are working with. Notice that both lat and lon have the same dimension, ncol
.
<xarray.Dataset>
Dimensions: (nbnd: 2, ncol: 777602, time: 5772)
Coordinates:
* time (time) object 0021-02-01 00:00:00 ... 0502-01-01 00:00:00
Dimensions without coordinates: nbnd, ncol
Data variables:
lat (ncol) float64 ...
area (ncol) float64 ...
lon (ncol) float64 ...
time_bnds (time, nbnd) object 0021-01-01 00:00:00 ... 0502-01-01 00:00:00
PSL (time, ncol) float32 ...
Attributes: (12/13)
np: 4
ne: 120
Conventions: CF-1.0
source: CAM
case: B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02
title: UNSET
... ...
host: psn012
Version: $Name$
revision_Id: $Id$
initial_file: B.E.13.B1850C5.ne120_t12.sehires38.003.cam.i.0021-01-01...
topography_file: /home/export/online1/xwan/cesm/inputdata/atm/cam/topo/U...
NCO: netCDF Operators version 4.7.9 (Homepage = http://nco.s...
Using the Multiple-level indexing documentation within Xarray didn’t seem to help… so what other options are there?
This is a situation where using Xoak comes in handy. Xoak is an xarray extension, or add-on, which makes dealing with these irregular dimensions easier. It utilizes SciPy’s cKDTree adaptor.
There a few great examples on their documentation including:
For this specific issue though, we can use Xoak to deal with these latitudes and longitudes which have the same dimension, ncol
.
Before using xoak for selecting data, there is one modification we need to make to the dataset. In the output from xarray, notice how lat
and lon
are not included in the coordinates. Xoak expects latitude
and longitude
to be coordinates, so we make a slight modification, setting these two variables to be coordinates.
ds = ds.set_coords(['lat', 'lon'])
Now that we have our dataset ready for xoak, we can setup our function and apply it for the lats/lons we are interested in looking at.
import xoak
# Setup a function to deal with flexible indexing
def kdtree_sel(ds, lats, lons):
"""
Uses xoak to select data with irregular dimensions
Parameters
----------
ds : `xarray.Dataset`
Dataset with your data, with lat/lon dims in one-dimension
lats : list
List of latitudes to select
lons : list
List of longitudes to select
Returns
-------
ds : `xr.Dataset`
Dataset subset based on input lats/lons
"""
# Checks to see if this has an xoak index - if not, create one using the extension
if not ds.xoak.index:
ds.xoak.set_index(("lat", "lon"), "scipy_kdtree")
# Return the selected datasets
return ds.xoak.sel(lat=xr.Variable("ncol", lats), lon=xr.Variable("ncol", lons))
kdtree_sel(ds, [34, 34.25, 34.3, 42], [350, 350, 352, 290])
Where the resultant time to compute and resultant dataset are given below. Notice the performance of this indexing method (283 ms
).
CPU times: user 244 ms, sys: 13.8 ms, total: 258 ms
Wall time: 283 ms
<xarray.Dataset>
Dimensions: (nbnd: 2, ncol: 4, time: 5772)
Coordinates:
* time (time) object 0021-02-01 00:00:00 ... 0502-01-01 00:00:00
lat (ncol) float64 34.1 34.3 34.24 41.92
lon (ncol) float64 350.0 350.0 352.0 290.0
Dimensions without coordinates: nbnd, ncol
Data variables:
area (ncol) float64 ...
time_bnds (time, nbnd) object ...
PSL (time, ncol) float32 ...
Attributes: (12/13)
np: 4
ne: 120
Conventions: CF-1.0
source: CAM
case: B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02
title: UNSET
... ...
host: psn012
Version: $Name$
revision_Id: $Id$
initial_file: B.E.13.B1850C5.ne120_t12.sehires38.003.cam.i.0021-01-01...
topography_file: /home/export/online1/xwan/cesm/inputdata/atm/cam/topo/U...
NCO: netCDF Operators version 4.7.9 (Homepage = http://nco.s...