Stream: python-questions
Topic: indexing an unstructured grid
Stephen Yeager (Apr 22 2021 at 16:44):
I'm working with CAM-SE data that has a 1D spatial dimension ('ncol'). I'd like to be able to use xarray's sel
method to extract data at particular lon(ncol), lat(ncol) values. I thought there might be a way to set up a multi-index coordinate to do this, following http://xarray.pydata.org/en/stable/indexing.html#multi-level-indexing. Here is the original dask array: HRDP_psl.png
I've created a MultiIndex object as follows:
import pandas as pd latlon = pd.MultiIndex.from_arrays([hrdp_psl.lat,hrdp_psl.lon], names=('lat', 'lon'))
I don't know how to assign this MultiIndex object as a coordinate for the above dask array. The following line generates an error: unhashable type: 'MultiIndex'
hrdp_psl.assign_coords({latlon:('ncol',latlon)})
Stephen Yeager (Apr 22 2021 at 17:52):
Update: I'm able to assign a multi-index coordinate with the following line
tmp = hrdp_psl.assign_coords(ncol=('ncol',latlon))
Screen-Shot-2021-04-22-at-11.50.04-AM.png
However, this slicing attempt fails with error "method='nearest' not implemented yet for MultiIndex; see GitHub issue 9365"
tmp.sel(ncol=[(34.,350.)],method='nearest')
Deepak Cherian (Apr 22 2021 at 17:53):
You are very close. This should work.
hrdp_psl.assign_coords({"ncol": latlon}).rename({"ncol": "latlon"})
This syntax {latlon:('ncol',latlon)}
means "create a DataArray with dimension name "ncol" and the MultiIndex in latlon
, and assign it with name "latlon". That syntax only works with arrays, not MultiIndex.
My suggested syntax says "assign the multiindex with name 'ncol'"
(PS: the MultiIndex support confused the data model a bit, and prompted an ongoing refactoring. so it's a bit messy to work with.)
EDIT: haha i guess my explanation was wrong if your syntax worked!
Anderson Banihirwe (Apr 22 2021 at 17:55):
@Deepak Cherian, I'm curious.... is this something that is well handled by xoak?
Deepak Cherian (Apr 22 2021 at 17:55):
@Stephen Yeager is this a regular grid that was flattened?
Stephen Yeager (Apr 22 2021 at 17:58):
No, it's unstructured CAM-SE data (spectral element) on the 0.25deg grid (ne120). Here's an example data file sitting on glade:
/glade/work/yeager/iHesp/fredc_data/B1850/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.cam.h.PSL.002101.050112.nc
Deepak Cherian (Apr 22 2021 at 18:10):
ya this is going to be clunky
@Anderson Banihirwe this s2index thing should work. Do you have time to try it out?
Deepak Cherian (Apr 22 2021 at 18:44):
I couldn't get xoak to not crash but the following works after mamba install pys2index
import numpy as np import xarray as xr from pys2index import S2PointIndex xr.set_options( display_style="text" ) # something about this dataset really kills the HTML repr! ds = xr.open_dataset( "/glade/work/yeager/iHesp/fredc_data/B1850/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.cam.h.PSL.002101.050112.nc" ).set_coords(["lat", "lon", "area"]) def ugrid_sel(ds, dim, index, lats, lons): distances, positions = index.query( np.array(np.vstack([lats, lons])).T.astype(np.float64) ) return ds.isel({dim: positions})
The next cell creates the fancy index structure once. It seems to crash unpredictably :confused: but as long as you get it to work everything else seems to work smoothly
latlon = np.vstack([ds["lat"].data, ds["lon"].data]).T # transpose is really important! # for some reason can't combine with upper cell :/ s2index = S2PointIndex(latlon)
ugrid_sel(ds, "ncol", s2index, [34, 34.25, 34.3], [350, 350, 352])
gives
<xarray.Dataset> Dimensions: (nbnd: 2, ncol: 3, time: 5772) Coordinates: * time (time) object 0021-02-01 00:00:00 ... 0502-01-01 00:00:00 lat (ncol) float64 34.1 34.3 34.24 area (ncol) float64 ... lon (ncol) float64 350.0 350.0 352.0 Dimensions without coordinates: nbnd, ncol
Deepak Cherian (Apr 22 2021 at 18:45):
@Max Grover this would be a great blogpost!
Deepak Cherian (Apr 22 2021 at 18:58):
This is way more dependable:
import xoak def kdtree_sel(ds, lats, lons): if not ds.xoak.index: ds.xoak.set_index(("lat", "lon"), "scipy_kdtree") return ds.xoak.sel(lat=xr.Variable("ncol", lats), lon=xr.Variable("ncol", lons)) kdtree_sel(ds, [34, 34.25, 34.3, 42], [350, 350, 352, 290])
<xarray.Dataset> Dimensions: (nbnd: 2, ncol: 4, time: 5772) Coordinates: * time (time) object 0021-02-01 00:00:00 ... 0502-01-01 00:00:00 lat (ncol) float64 34.1 34.3 34.24 41.92 area (ncol) float64 ... lon (ncol) float64 350.0 350.0 352.0 290.0 Dimensions without coordinates: nbnd, ncol
2ms lookups
Last updated: Jan 30 2022 at 12:01 UTC