Stream: python-questions

Topic: indexing an unstructured grid


view this post on Zulip Stephen Yeager (Apr 22 2021 at 16:44):

I'm working with CAM-SE data that has a 1D spatial dimension ('ncol'). I'd like to be able to use xarray's sel method to extract data at particular lon(ncol), lat(ncol) values. I thought there might be a way to set up a multi-index coordinate to do this, following http://xarray.pydata.org/en/stable/indexing.html#multi-level-indexing. Here is the original dask array: HRDP_psl.png

I've created a MultiIndex object as follows:

import pandas as pd
latlon = pd.MultiIndex.from_arrays([hrdp_psl.lat,hrdp_psl.lon], names=('lat', 'lon'))

I don't know how to assign this MultiIndex object as a coordinate for the above dask array. The following line generates an error: unhashable type: 'MultiIndex'

hrdp_psl.assign_coords({latlon:('ncol',latlon)})

view this post on Zulip Stephen Yeager (Apr 22 2021 at 17:52):

Update: I'm able to assign a multi-index coordinate with the following line

tmp = hrdp_psl.assign_coords(ncol=('ncol',latlon))

Screen-Shot-2021-04-22-at-11.50.04-AM.png

However, this slicing attempt fails with error "method='nearest' not implemented yet for MultiIndex; see GitHub issue 9365"

tmp.sel(ncol=[(34.,350.)],method='nearest')

view this post on Zulip Deepak Cherian (Apr 22 2021 at 17:53):

You are very close. This should work.

hrdp_psl.assign_coords({"ncol": latlon}).rename({"ncol": "latlon"})

This syntax {latlon:('ncol',latlon)} means "create a DataArray with dimension name "ncol" and the MultiIndex in latlon, and assign it with name "latlon". That syntax only works with arrays, not MultiIndex.

My suggested syntax says "assign the multiindex with name 'ncol'"

(PS: the MultiIndex support confused the data model a bit, and prompted an ongoing refactoring. so it's a bit messy to work with.)

EDIT: haha i guess my explanation was wrong if your syntax worked!

view this post on Zulip Anderson Banihirwe (Apr 22 2021 at 17:55):

@Deepak Cherian, I'm curious.... is this something that is well handled by xoak?

view this post on Zulip Deepak Cherian (Apr 22 2021 at 17:55):

@Stephen Yeager is this a regular grid that was flattened?

view this post on Zulip Stephen Yeager (Apr 22 2021 at 17:58):

No, it's unstructured CAM-SE data (spectral element) on the 0.25deg grid (ne120). Here's an example data file sitting on glade:
/glade/work/yeager/iHesp/fredc_data/B1850/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.cam.h.PSL.002101.050112.nc

view this post on Zulip Deepak Cherian (Apr 22 2021 at 18:10):

ya this is going to be clunky

@Anderson Banihirwe this s2index thing should work. Do you have time to try it out?

view this post on Zulip Deepak Cherian (Apr 22 2021 at 18:44):

I couldn't get xoak to not crash but the following works after mamba install pys2index

import numpy as np
import xarray as xr
from pys2index import S2PointIndex

xr.set_options(
    display_style="text"
)  # something about this dataset really kills the HTML repr!

ds = xr.open_dataset(
    "/glade/work/yeager/iHesp/fredc_data/B1850/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.cam.h.PSL.002101.050112.nc"
).set_coords(["lat", "lon", "area"])


def ugrid_sel(ds, dim, index, lats, lons):
    distances, positions = index.query(
        np.array(np.vstack([lats, lons])).T.astype(np.float64)
    )
    return ds.isel({dim: positions})

The next cell creates the fancy index structure once. It seems to crash unpredictably :confused: but as long as you get it to work everything else seems to work smoothly

latlon = np.vstack([ds["lat"].data, ds["lon"].data]).T  # transpose is really important!

# for some reason can't combine with upper cell :/
s2index = S2PointIndex(latlon)
ugrid_sel(ds, "ncol", s2index, [34, 34.25, 34.3], [350, 350, 352])

gives

<xarray.Dataset>
Dimensions:    (nbnd: 2, ncol: 3, time: 5772)
Coordinates:
  * time       (time) object 0021-02-01 00:00:00 ... 0502-01-01 00:00:00
    lat        (ncol) float64 34.1 34.3 34.24
    area       (ncol) float64 ...
    lon        (ncol) float64 350.0 350.0 352.0
Dimensions without coordinates: nbnd, ncol

view this post on Zulip Deepak Cherian (Apr 22 2021 at 18:45):

@Max Grover this would be a great blogpost!

view this post on Zulip Deepak Cherian (Apr 22 2021 at 18:58):

This is way more dependable:

import xoak


def kdtree_sel(ds, lats, lons):
    if not ds.xoak.index:
        ds.xoak.set_index(("lat", "lon"), "scipy_kdtree")
    return ds.xoak.sel(lat=xr.Variable("ncol", lats), lon=xr.Variable("ncol", lons))


kdtree_sel(ds, [34, 34.25, 34.3, 42], [350, 350, 352, 290])
<xarray.Dataset>
Dimensions:    (nbnd: 2, ncol: 4, time: 5772)
Coordinates:
  * time       (time) object 0021-02-01 00:00:00 ... 0502-01-01 00:00:00
    lat        (ncol) float64 34.1 34.3 34.24 41.92
    area       (ncol) float64 ...
    lon        (ncol) float64 350.0 350.0 352.0 290.0
Dimensions without coordinates: nbnd, ncol

2ms lookups


Last updated: Jan 30 2022 at 12:01 UTC