Stream: python-questions
Topic: Python version of vinth2p_nodes
Sheri Mickelson (Jun 04 2020 at 15:37):
Does anyone know if there's a Python function similar to the vinth2p_nodes function in NCL? This function interpolates CESM hybrid coordinates to pressure coordinates on an unstructured grid.
Matt Long (Jun 04 2020 at 16:24):
@Sheri Mickelson, I wrote some functions that do this sort of thing...but are not packaged up nicely...nor are they fully tested or as flexible as I would like.
@geocat, I think this is a high-priority capability.
def remap_vertical_coord(ds, da_new_levels, da_coord_field, levdim='lev', method='log', include_coord_field=False): """Interpolate to new vertical coordinate. Parameters ---------- ds : xarray Dataset xarray dataset with data to be remapped da_new_levels : xarray DataArray The levels upon which to remap da_coord_field : xarray DataArray 4d da_coord_field field levdim : str, optional The name of the "level" dimension method : str log or linear include_coord_field : boolean Remap the coordinate field itself Returns ------- dso : xr.Dataset Interpolated dataset """ if method == 'linear': from metpy.interpolate import interpolate_1d interpolate = interpolate_1d elif method == 'log': from metpy.interpolate import log_interpolate_1d interpolate = log_interpolate_1d else: raise ValueError(f'unknown option for interpolation: {method}') if isinstance(da_coord_field, str): da_coord_field = ds[da_coord_field] # determine output dims dims_in = da_coord_field.dims interp_axis = dims_in.index(levdim) dims_out = dims_in[:interp_axis] + da_new_levels.dims + dims_in[interp_axis+1:] len_interp_dim = da_coord_field.shape[interp_axis] assert len_interp_dim == len(da_new_levels), ( f'new_levels must be the same length as the {levdim} ' f'in input data (limitation of application of apply_ufunc)' ) # loop over vars and interpolate dso = xr.Dataset() coords = {c: da_coord_field[c] for c in da_coord_field.coords if c != levdim} coords[da_new_levels.dims[0]] = da_new_levels def interp_func(da_coord_field, data_field): """Define interpolation function.""" return interpolate(da_new_levels.values, da_coord_field, data_field, axis=interp_axis) for name in ds.variables: da = ds[name] if name == da_coord_field.name and not include_coord_field: continue if da.dims != dims_in: dso[name] = da.copy() else: data = xr.apply_ufunc(interp_func, da_coord_field, da, output_dtypes=[da.dtype], dask='parallelized') dso[name] = xr.DataArray(data.data, dims=dims_out, attrs=da.attrs, coords=coords) return dso def pressure_from_hybrid_levels(dsi, layer_center=True): """Calculate pressure at the hybrid levels. Parameters ---------- dsi : xarray Dataset Dataset must contain P0,PS,and hybrid coefficients hya[m,i] hyb[m,i] layer_center : logical, optional compute pressure on cell centers, otherwise compute interface pressure """ ds = dsi.copy() if layer_center: hya = 'hyam' hyb = 'hybm' name = 'Pm' long_name = 'Pressure (layer center)' else: hya = 'hyai' hyb = 'hybi' name = 'Pi' long_name = 'Pressure (interface)' require_variables(ds,['P0', 'PS', hya, hyb]) # compute pressure P = (ds.P0 * ds[hya] + ds.PS * ds[hyb]) * 0.01 # kg m/m^2/s^2 = Pa,*0.01 convert to hPa P.name = name # put the time dimension first if 'time' in P.dims: newdims = ('time', ) + tuple(d for d in P.dims if d != 'time') P = P.transpose(*newdims) # set the attributes P.attrs['long_name'] = long_name P.attrs['units'] = 'hPa' return P def require_variables(ds, req_var): """ensure that variables are present""" missing_var_error = False for v in req_var: if v not in ds: print('ERROR: Missing required variable: %s'%v) missing_var_error = True if missing_var_error: raise ValueError('missing variables')
Sheri Mickelson (Jun 04 2020 at 16:37):
Thanks @Matt Long . I'll take a look at it.
Matt Long (Jun 04 2020 at 16:38):
@Brian Medeiros may have something better in hand.
Brian Bonnlander (Jun 04 2020 at 16:38):
I have charging time to help to some degree. I'm still learning the python libraries, however.
Sheri Mickelson (Jun 08 2020 at 15:01):
@Matt Long I was able to take a closer look at your code and I was hoping you could help with a question I have in regards to these lines
assert len_interp_dim == len(da_new_levels), ( f'new_levels must be the same length as the {levdim} ' f'in input data (limitation of application of apply_ufunc)' )
Is there a requirement that that new levels and the old levels must be the same length?
Matt Long (Jun 08 2020 at 15:06):
yes, that is a limitation of apply_ufunc
(or at least my application of it)
Sheri Mickelson (Jun 08 2020 at 15:09):
Got it, thanks.
Deepak Cherian (Jun 08 2020 at 18:40):
This example I wrote should help extend it to avoid that restriction: https://xarray.pydata.org/en/stable/examples/apply_ufunc_vectorize_1d.html
Brian Medeiros (Jan 13 2021 at 23:52):
Totally missed this thread over the summer. This topic came up at a meeting today, which led me here. I'm currently favoring using xarray's interp
method. Here is what I use:
def pres_from_hybrid(psfc, hya, hyb, p0=100000.): # p = a(k)*p0 + b(k)*ps. return hya*p0 + hyb*psfc def lev_to_plev(data, ps, hyam, hybm, P0=100000., new_levels=None): """Interpolate data from hybrid-sigma levels to isobaric levels.""" pressure = pres_from_hybrid(ps, hyam, hybm, P0) if new_levels is None: pnew = [1000, 925, 850, 700, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10, 7, 5, 3, 2, 1] # mandatory levels else: pnew = new_levels data_interp = data.interp(lev=pnew) return data_interp
It's not too slow, and is about linear in number of levels in my experience. Using my Mac Mini, going from a field of size (time: 361, lev: 32, lat: 192, lon: 288) to 21 standard pressure levels took 46 s, and interpolating to 101 levels took 5min 18s.
Matt Long (Jan 14 2021 at 12:28):
@Brian Medeiros, you compute pressure
from the hybrid coordinate, but I don't see where your function uses it. It looks to me like you are simply interpolating the lev
dimension at the new values in pnew
, but not actually generating a field on constant pressure levels. I don't see how to do this with xarray's interpolation method. pressure
is a fully 3D field, so every i, j
point has a different index vector, i.e.
data_interp[:, j, i] = np.interp(pnew, pressure[:, j, i], data[:, j, i])
What am I missing?
cc @geocat (this remains a high-priority function!)
Brian Medeiros (Jan 14 2021 at 15:32):
@Matt Long , I see what you mean. Maybe this was a typo. I'll take a look this morning. I have (at least) 3 other methods that do the same thing, so I can do a quick comparison.
Brian Medeiros (Jan 14 2021 at 18:01):
Okay, sorry about that. @Matt Long , your comment was completely right. I must have copied an unfinished function. I obviously wasn't paying much attention. I just spent some time exploring options. Here is a version that I think works (very limited testing). See comment about speed below:
from pathlib import Path import os import numpy as np import xarray as xr from numba import njit, prange import matplotlib.pyplot as plt def pres_from_hybrid(psfc, hya, hyb, p0=100000.): # p = a(k)*p0 + b(k)*ps. # this will be in Pa return hya*p0 + hyb*psfc def vert_remap(x_mdl, p_mdl, plev): """Apply simple 1-d interpolation to a field, x given the pressure p and the new pressures plev. x_mdl, p_mdl are numpy arrays of shape (nlevel, spacetime). """ out_shape = (plev.shape[0], x_mdl.shape[1]) print(f'output shape is {out_shape}') output = np.full(out_shape, np.nan) for i in range(out_shape[1]): output[:,i] = np.interp(plev, p_mdl[:,i], x_mdl[:,i]) return output @njit(parallel=True) def vert_remap2(x_mdl, p_mdl, plev): """Apply simple 1-d interpolation to a field, x given the pressure p and the new pressures plev. x_mdl, p_mdl are numpy arrays of shape (nlevel, spacetime). """ out_shape = (plev.shape[0], x_mdl.shape[1]) output = np.full(out_shape, np.nan) for i in prange(out_shape[1]): output[:,i] = np.interp(plev, p_mdl[:,i], x_mdl[:,i]) return output def lev_to_plev(data, ps, hyam, hybm, P0=100000., new_levels=None, parallel=False): """Interpolate data from hybrid-sigma levels to isobaric levels. data : DataArray with a 'lev' coordinate ps : DataArray of surface pressure (Pa), same time/space shape as data hyam, hybm : hybrid coefficients, size of len(lev) P0 : reference pressure new_levels : the output pressure levels (Pa) parallel : if True, use the Numba version to parallelize interpolation step. """ pressure = pres_from_hybrid(ps, hyam, hybm, P0) # Pa if new_levels is None: pnew = 100.0 * np.array([1000, 925, 850, 700, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10, 7, 5, 3, 2, 1]) # mandatory levels, converted to Pa else: pnew = new_levels # reshape data and pressure assuming "lev" is the name of the coordinate zdims = [i for i in data.dims if i != 'lev'] dstack = data.stack(z=zdims) pstack = pressure.stack(z=zdims) if parallel: output = vert_remap2(dstack.values, pstack.values, pnew) else: output = vert_remap(dstack.values, pstack.values, pnew) output = xr.DataArray(output, dims=("plev", "z"), coords={"plev":pnew, "z":pstack['z']}) output = output.unstack() return output
In terms of speed, I tested this on some CAM6 output. The dataset is (time: 361, lev: 32, lat: 192, lon: 288)
. On my Mac Mini, just using a loop with the numpy function, interpolating to the 21 mandatory levels takes 1min 32s. Using the parallelized Numba version dramatically reduces that to 24.1 s (including the compiling time, this is from a single run of the function). Just for fun, I also re-ran the Numba version with 100 levels, and it finished in 56s (I think that does not count compiling time).
I see that @Matt Long used xarray's apply_ufunc
, which I initially tried to use, but was having trouble. I think I might misunderstand how that works. I suspect that Matt's version and my version produce about the same answers. I also experimented with MetPy's interpolation, and it seemed to work well. Matt's version is in much better shape in terms of structure.
Matt Long (Jan 14 2021 at 18:08):
Great! @geocat, would be great to have a pure-python implementation of this function. Do you see a home for this functionality?
John Clyne (Jan 14 2021 at 19:05):
Agreed. @Orhan Eroglu are you seeing this?
Orhan Eroglu (Jan 14 2021 at 19:15):
Great! @geocat, would be great to have a pure-python implementation of this function. Do you see a home for this functionality?
Hi Matt,
Yes this function is in our first priority list. We are going to release the newer version of GeoCAT-comp soon (in 1-2 weeks) including all of our structural changes, and then will be releasing in about monthly periods. vinth2p is planned for the first monthly release, which likely be around early March.
Orhan Eroglu (Jan 15 2021 at 19:32):
@Brian Medeiros your code looks great and I'd like to have a deeper look at it soon. vinth2p
is a high priority function we 'd like to add to the GeoCAT stack asap, so please let me know if you'd be insterested to further contribute to it or colalborate with us anyway.
Brian Medeiros (Jan 15 2021 at 19:48):
@Orhan Eroglu , help yourself to whatever is useable in that code, I hope it helps. In terms of additional value that GeoCAT could offer, I think including more sophisticated options that are efficient would be terrific. In this case, i'm thinking of NCL's vinth2p_ecmwf
function, which also provides a sensible way to extrapolate below the surface level.
I am interested in helping/contributing/collaborating however I can, so don't hesitate to get in touch as you all proceed.
Orhan Eroglu (Feb 23 2021 at 17:07):
All,
We are working on vinth2p
, and the initial tests were great with Brian's codes. We have refactored them to clean up the code to some degree and to provide improved documentation. I think the current code can be a good start point to replicate at least the NCL's original 'vinth2p', but we will need to work further to achieve cases such as vinth2p_nodes
(i.e. unstructured grids) and vinth2p_ecmwf
(i.e. extrapolate values below ground). This coming version of GeoCAT-comp at the end of February will likely only include the vinth2p
case though.
I have a quick question: What about the naming of Python version of this vinth2p
? Existing NCL users might be liking it with its original name, though, it does not sound much Pythonic. Brian already named it lev_to_plev
in his existing code. Any thoughts for the name to be included in geocat.comp namespace? cc: @Brian Medeiros @Matt Long @Sheri Mickelson
Matt Long (Feb 23 2021 at 19:58):
@Orhan Eroglu,
I think it's worth noting that vinth2p
is a special case of vertical remapping, essentially folding conversion from hybrid to pressure levels in. I think it's important to support access to the general remap_coord
operation. As for a name, you might consider something more explicit: interp_hybrid_to_pressure
.
John Clyne (Feb 23 2021 at 20:37):
The NCL names are cryptic, and in most cases only meaningful to experienced NCL users. I'd suggest looking forward and coming up with names that better convey the operation that a function performs (i.e. more "pythonic"). One way to help NCL users transition would be to provide links on the NCL web site from the old function to the closest GetCAT equivalent.
Orhan Eroglu (Feb 23 2021 at 22:51):
Orhan Eroglu,
I think it's worth noting thatvinth2p
is a special case of vertical remapping, essentially folding conversion from hybrid to pressure levels in. I think it's important to support access to the generalremap_coord
operation. As for a name, you might consider something more explicit:interp_hybrid_to_pressure
.
Yes, I agree with this! In our continuous delivery model, starting with vinth2p
will help us (1) address user requests at least for hybrid-to-pressure case and then start getting feedbacks, (2) warm up the team for the general vertical remapping problem, and (3) use vinth2p
cas as a template for other cases and achieve incremental improvements onto it.
The name interp_hybrid_to_pressure
makes sense.
UPDATE: Please see GeoCAT-comp vinth2p branch to see its current shape and let me know if you have any thoughts. (FYI: I was able to use xarray.appy_ufunc and Dask parallelization but I could not make a performance test yet ). A preliminary test case was written to see its outputs visually but a number of unit tests would be added soon.
Deepak Cherian (Feb 23 2021 at 22:52):
xgcm calls it transform: https://xgcm.readthedocs.io/en/latest/transform.html
Orhan Eroglu (Feb 23 2021 at 22:54):
The NCL names are cryptic, and in most cases only meaningful to experienced NCL users. I'd suggest looking forward and coming up with names that better convey the operation that a function performs (i.e. more "pythonic"). One way to help NCL users transition would be to provide links on the NCL web site from the old function to the closest GetCAT equivalent.
Thanks John! I agree with this.
We could even provide the original function name with a DeprecationWarning
in the user API. With this, we can both help NCL users to find the name in their first uses and direct them to get rid of that function name and use the newer one instead.
Thoughts on this?
Orhan Eroglu (Feb 23 2021 at 22:56):
xgcm calls it transform: https://xgcm.readthedocs.io/en/latest/transform.html
Thanks Deepak! I wasn't aware of this transform method from xgcm and will have a look at its details as well.
Deepak Cherian (Feb 23 2021 at 22:59):
Re: the implementation, you need not stack
and provide 2 core dimensions. This interpolation is fundamentally 1D, so there's only one core dimension. See https://xarray.pydata.org/en/stable/examples/apply_ufunc_vectorize_1d.html
it's good to avoid stack since xarray's stack is a reshape which can sometimes not work so well (https://github.com/dask/dask/issues/5544)
Orhan Eroglu (Feb 24 2021 at 01:29):
Re: the implementation, you need not
stack
and provide 2 core dimensions. This interpolation is fundamentally 1D, so there's only one core dimension. See https://xarray.pydata.org/en/stable/examples/apply_ufunc_vectorize_1d.htmlit's good to avoid stack since xarray's stack is a reshape which can sometimes not work so well (https://github.com/dask/dask/issues/5544)
Deepak, thanks a lot for looking into it! It helped me not only correct a flaw but also further cleanup the code and probably speed up the code (haven't tested though) as well as added logarithmic interpolation. Feel free to give it another look and let me know if you have any concerns.
Anderson Banihirwe (Feb 24 2021 at 17:42):
@Orhan Eroglu, do you mind creating a draft PR for the work in vinth2p
branch when you get a moment? It'd make it easy to provide some feedback within GitHub
Orhan Eroglu (Feb 24 2021 at 17:43):
Orhan Eroglu, do you mind creating a draft PR for the work in
vinth2p
branch when you get a moment? It'd make it easy to provide some feedback within GitHub
Sure Anderson, I will do so right after I can make a simple initial unit test running for it (in a few hours today). Will let you know
Orhan Eroglu (Feb 24 2021 at 18:15):
Orhan Eroglu, do you mind creating a draft PR for the work in
vinth2p
branch when you get a moment? It'd make it easy to provide some feedback within GitHub
Done. See it here: https://github.com/NCAR/geocat-comp/pull/111
Orhan Eroglu (Feb 24 2021 at 18:21):
Orhan Eroglu, do you mind creating a draft PR for the work in
vinth2p
branch when you get a moment? It'd make it easy to provide some feedback within GitHubDone. See it here: https://github.com/NCAR/geocat-comp/pull/111
All,
Feel free to give it a review!
Last updated: Jan 30 2022 at 12:01 UTC