Python version of vinth2p_nodes · python-questions

Does anyone know if there's a Python function similar to the vinth2p_nodes function in NCL? This function interpolates CESM hybrid coordinates to pressure coordinates on an unstructured grid.

Matt Long (Jun 04 2020 at 16:24):

@Sheri Mickelson, I wrote some functions that do this sort of thing...but are not packaged up nicely...nor are they fully tested or as flexible as I would like.

def remap_vertical_coord(ds, da_new_levels, da_coord_field, levdim='lev', method='log',
                         include_coord_field=False):
    """Interpolate to new vertical coordinate.

    Parameters
    ----------
    ds :  xarray Dataset
        xarray dataset with data to be remapped
    da_new_levels : xarray DataArray
        The levels upon which to remap
    da_coord_field : xarray DataArray
        4d da_coord_field field
    levdim : str, optional
        The name of the "level" dimension
    method : str
        log or linear
    include_coord_field : boolean
        Remap the coordinate field itself

    Returns
    -------
    dso : xr.Dataset
        Interpolated dataset
    """

    if method == 'linear':
        from metpy.interpolate import interpolate_1d
        interpolate = interpolate_1d
    elif method == 'log':
        from metpy.interpolate import log_interpolate_1d
        interpolate = log_interpolate_1d
    else:
        raise ValueError(f'unknown option for interpolation: {method}')

    if isinstance(da_coord_field, str):
        da_coord_field = ds[da_coord_field]

    # determine output dims
    dims_in = da_coord_field.dims
    interp_axis = dims_in.index(levdim)
    dims_out = dims_in[:interp_axis] + da_new_levels.dims + dims_in[interp_axis+1:]

    len_interp_dim = da_coord_field.shape[interp_axis]
    assert len_interp_dim == len(da_new_levels), (
        f'new_levels must be the same length as the {levdim} '
        f'in input data (limitation of application of apply_ufunc)'
    )

    # loop over vars and interpolate
    dso = xr.Dataset()
    coords = {c: da_coord_field[c] for c in da_coord_field.coords if c != levdim}
    coords[da_new_levels.dims[0]] = da_new_levels

    def interp_func(da_coord_field, data_field):
        """Define interpolation function."""
        return interpolate(da_new_levels.values, da_coord_field, data_field,
                           axis=interp_axis)

    for name in ds.variables:
        da = ds[name]
        if name == da_coord_field.name and not include_coord_field:
            continue

        if da.dims != dims_in:
            dso[name] = da.copy()
        else:
            data = xr.apply_ufunc(interp_func, da_coord_field, da,
                                  output_dtypes=[da.dtype],
                                  dask='parallelized')
            dso[name] = xr.DataArray(data.data, dims=dims_out,
                                     attrs=da.attrs, coords=coords)

    return dso


def pressure_from_hybrid_levels(dsi, layer_center=True):
    """Calculate pressure at the hybrid levels.

    Parameters
    ----------

    dsi : xarray Dataset
       Dataset must contain P0,PS,and hybrid coefficients hya[m,i] hyb[m,i]
    layer_center : logical, optional
       compute pressure on cell centers, otherwise compute interface pressure
    """

    ds = dsi.copy()

    if layer_center:
        hya = 'hyam'
        hyb = 'hybm'
        name = 'Pm'
        long_name = 'Pressure (layer center)'
    else:
        hya = 'hyai'
        hyb = 'hybi'
        name = 'Pi'
        long_name = 'Pressure (interface)'

    require_variables(ds,['P0', 'PS', hya, hyb])

    # compute pressure
    P = (ds.P0 * ds[hya] + ds.PS * ds[hyb]) * 0.01 # kg m/m^2/s^2 = Pa,*0.01 convert to hPa
    P.name = name

    # put the time dimension first
    if 'time' in P.dims:
        newdims = ('time', ) + tuple(d for d in P.dims if d != 'time')
        P = P.transpose(*newdims)

    # set the attributes
    P.attrs['long_name'] = long_name
    P.attrs['units'] = 'hPa'

    return P


def require_variables(ds, req_var):
   """ensure that variables are present"""
    missing_var_error = False
    for v in req_var:
        if v not in ds:
            print('ERROR: Missing required variable: %s'%v)
            missing_var_error = True

    if missing_var_error:
        raise ValueError('missing variables')

Sheri Mickelson (Jun 04 2020 at 16:37):

Matt Long (Jun 04 2020 at 16:38):

Brian Bonnlander (Jun 04 2020 at 16:38):

I have charging time to help to some degree. I'm still learning the python libraries, however.

Sheri Mickelson (Jun 08 2020 at 15:01):

@Matt Long I was able to take a closer look at your code and I was hoping you could help with a question I have in regards to these lines

    assert len_interp_dim == len(da_new_levels), (
        f'new_levels must be the same length as the {levdim} '
        f'in input data (limitation of application of apply_ufunc)'
    )

Is there a requirement that that new levels and the old levels must be the same length?

Matt Long (Jun 08 2020 at 15:06):

Sheri Mickelson (Jun 08 2020 at 15:09):

Deepak Cherian (Jun 08 2020 at 18:40):

Brian Medeiros (Jan 13 2021 at 23:52):

Totally missed this thread over the summer. This topic came up at a meeting today, which led me here. I'm currently favoring using xarray's interp method. Here is what I use:

def pres_from_hybrid(psfc, hya, hyb, p0=100000.):
    # p = a(k)*p0 + b(k)*ps.
    return hya*p0 + hyb*psfc

def lev_to_plev(data, ps, hyam, hybm, P0=100000., new_levels=None):
    """Interpolate data from hybrid-sigma levels to isobaric levels."""
    pressure = pres_from_hybrid(ps, hyam, hybm, P0)
    if new_levels is None:
        pnew = [1000, 925, 850, 700, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10, 7, 5, 3, 2, 1]  # mandatory levels
    else:
        pnew = new_levels
    data_interp = data.interp(lev=pnew)
    return data_interp

It's not too slow, and is about linear in number of levels in my experience. Using my Mac Mini, going from a field of size (time: 361, lev: 32, lat: 192, lon: 288) to 21 standard pressure levels took 46 s, and interpolating to 101 levels took 5min 18s.

Matt Long (Jan 14 2021 at 12:28):

@Brian Medeiros, you compute pressure from the hybrid coordinate, but I don't see where your function uses it. It looks to me like you are simply interpolating the lev dimension at the new values in pnew, but not actually generating a field on constant pressure levels. I don't see how to do this with xarray's interpolation method. pressure is a fully 3D field, so every i, j point has a different index vector, i.e.

data_interp[:, j, i] = np.interp(pnew, pressure[:, j, i], data[:, j, i])

Brian Medeiros (Jan 14 2021 at 15:32):

@Matt Long , I see what you mean. Maybe this was a typo. I'll take a look this morning. I have (at least) 3 other methods that do the same thing, so I can do a quick comparison.

Brian Medeiros (Jan 14 2021 at 18:01):

Okay, sorry about that. @Matt Long , your comment was completely right. I must have copied an unfinished function. I obviously wasn't paying much attention. I just spent some time exploring options. Here is a version that I think works (very limited testing). See comment about speed below:

from pathlib import Path
import os
import numpy as np
import xarray as xr
from numba import njit, prange
import matplotlib.pyplot as plt


def pres_from_hybrid(psfc, hya, hyb, p0=100000.):
    # p = a(k)*p0 + b(k)*ps.
    # this will be in Pa
    return hya*p0 + hyb*psfc


def vert_remap(x_mdl, p_mdl, plev):
    """Apply simple 1-d interpolation to a field, x
       given the pressure p and the new pressures plev.
       x_mdl, p_mdl are numpy arrays of shape (nlevel, spacetime).
    """
    out_shape = (plev.shape[0], x_mdl.shape[1])
    print(f'output shape is {out_shape}')
    output = np.full(out_shape, np.nan)
    for i in range(out_shape[1]):
        output[:,i] = np.interp(plev, p_mdl[:,i], x_mdl[:,i])
    return output


@njit(parallel=True)
def vert_remap2(x_mdl, p_mdl, plev):
    """Apply simple 1-d interpolation to a field, x
       given the pressure p and the new pressures plev.
       x_mdl, p_mdl are numpy arrays of shape (nlevel, spacetime).
    """
    out_shape = (plev.shape[0], x_mdl.shape[1])
    output = np.full(out_shape, np.nan)
    for i in prange(out_shape[1]):
        output[:,i] = np.interp(plev, p_mdl[:,i], x_mdl[:,i])
    return output


def lev_to_plev(data, ps, hyam, hybm, P0=100000., new_levels=None, parallel=False):
    """Interpolate data from hybrid-sigma levels to isobaric levels.

    data : DataArray with a 'lev' coordinate
    ps   : DataArray of surface pressure (Pa), same time/space shape as data
    hyam, hybm : hybrid coefficients, size of len(lev)
    P0 : reference pressure
    new_levels : the output pressure levels (Pa)
    parallel : if True, use the Numba version to parallelize interpolation step.
    """
    pressure = pres_from_hybrid(ps, hyam, hybm, P0)  # Pa
    if new_levels is None:
        pnew = 100.0 * np.array([1000, 925, 850, 700, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10, 7, 5, 3, 2, 1])  # mandatory levels, converted to Pa
    else:
        pnew = new_levels
    # reshape data and pressure assuming "lev" is the name of the coordinate
    zdims = [i for i in data.dims if i != 'lev']
    dstack = data.stack(z=zdims)
    pstack = pressure.stack(z=zdims)
    if parallel:
        output = vert_remap2(dstack.values, pstack.values, pnew)
    else:
        output = vert_remap(dstack.values, pstack.values, pnew)
    output = xr.DataArray(output, dims=("plev", "z"), coords={"plev":pnew, "z":pstack['z']})
    output = output.unstack()
    return output

In terms of speed, I tested this on some CAM6 output. The dataset is (time: 361, lev: 32, lat: 192, lon: 288). On my Mac Mini, just using a loop with the numpy function, interpolating to the 21 mandatory levels takes 1min 32s. Using the parallelized Numba version dramatically reduces that to 24.1 s (including the compiling time, this is from a single run of the function). Just for fun, I also re-ran the Numba version with 100 levels, and it finished in 56s (I think that does not count compiling time).

I see that @Matt Long used xarray's apply_ufunc, which I initially tried to use, but was having trouble. I think I might misunderstand how that works. I suspect that Matt's version and my version produce about the same answers. I also experimented with MetPy's interpolation, and it seemed to work well. Matt's version is in much better shape in terms of structure.

Matt Long (Jan 14 2021 at 18:08):

Great! @geocat, would be great to have a pure-python implementation of this function. Do you see a home for this functionality?

John Clyne (Jan 14 2021 at 19:05):

Orhan Eroglu (Jan 14 2021 at 19:15):

Yes this function is in our first priority list. We are going to release the newer version of GeoCAT-comp soon (in 1-2 weeks) including all of our structural changes, and then will be releasing in about monthly periods. vinth2p is planned for the first monthly release, which likely be around early March.

Orhan Eroglu (Jan 15 2021 at 19:32):

@Brian Medeiros your code looks great and I'd like to have a deeper look at it soon. vinth2p is a high priority function we 'd like to add to the GeoCAT stack asap, so please let me know if you'd be insterested to further contribute to it or colalborate with us anyway.

Brian Medeiros (Jan 15 2021 at 19:48):

@Orhan Eroglu , help yourself to whatever is useable in that code, I hope it helps. In terms of additional value that GeoCAT could offer, I think including more sophisticated options that are efficient would be terrific. In this case, i'm thinking of NCL's vinth2p_ecmwf function, which also provides a sensible way to extrapolate below the surface level.
I am interested in helping/contributing/collaborating however I can, so don't hesitate to get in touch as you all proceed.

Orhan Eroglu (Feb 23 2021 at 17:07):

We are working on vinth2p, and the initial tests were great with Brian's codes. We have refactored them to clean up the code to some degree and to provide improved documentation. I think the current code can be a good start point to replicate at least the NCL's original 'vinth2p', but we will need to work further to achieve cases such as vinth2p_nodes (i.e. unstructured grids) and vinth2p_ecmwf (i.e. extrapolate values below ground). This coming version of GeoCAT-comp at the end of February will likely only include the vinth2p case though.

I have a quick question: What about the naming of Python version of this vinth2p? Existing NCL users might be liking it with its original name, though, it does not sound much Pythonic. Brian already named it lev_to_plev in his existing code. Any thoughts for the name to be included in geocat.comp namespace? cc: @Brian Medeiros @Matt Long @Sheri Mickelson

Matt Long (Feb 23 2021 at 19:58):

@Orhan Eroglu,
I think it's worth noting that vinth2p is a special case of vertical remapping, essentially folding conversion from hybrid to pressure levels in. I think it's important to support access to the general remap_coord operation. As for a name, you might consider something more explicit: interp_hybrid_to_pressure.

John Clyne (Feb 23 2021 at 20:37):

The NCL names are cryptic, and in most cases only meaningful to experienced NCL users. I'd suggest looking forward and coming up with names that better convey the operation that a function performs (i.e. more "pythonic"). One way to help NCL users transition would be to provide links on the NCL web site from the old function to the closest GetCAT equivalent.

Orhan Eroglu (Feb 23 2021 at 22:51):

Yes, I agree with this! In our continuous delivery model, starting with vinth2p will help us (1) address user requests at least for hybrid-to-pressure case and then start getting feedbacks, (2) warm up the team for the general vertical remapping problem, and (3) use vinth2p cas as a template for other cases and achieve incremental improvements onto it.

UPDATE: Please see GeoCAT-comp vinth2p branch to see its current shape and let me know if you have any thoughts. (FYI: I was able to use xarray.appy_ufunc and Dask parallelization but I could not make a performance test yet ). A preliminary test case was written to see its outputs visually but a number of unit tests would be added soon.

Deepak Cherian (Feb 23 2021 at 22:52):

Orhan Eroglu (Feb 23 2021 at 22:54):

We could even provide the original function name with a DeprecationWarning in the user API. With this, we can both help NCL users to find the name in their first uses and direct them to get rid of that function name and use the newer one instead.

Orhan Eroglu (Feb 23 2021 at 22:56):

Thanks Deepak! I wasn't aware of this transform method from xgcm and will have a look at its details as well.

Deepak Cherian (Feb 23 2021 at 22:59):

Orhan Eroglu (Feb 24 2021 at 01:29):

Deepak, thanks a lot for looking into it! It helped me not only correct a flaw but also further cleanup the code and probably speed up the code (haven't tested though) as well as added logarithmic interpolation. Feel free to give it another look and let me know if you have any concerns.

Anderson Banihirwe (Feb 24 2021 at 17:42):

@Orhan Eroglu, do you mind creating a draft PR for the work in vinth2p branch when you get a moment? It'd make it easy to provide some feedback within GitHub

Orhan Eroglu (Feb 24 2021 at 17:43):

Sure Anderson, I will do so right after I can make a simple initial unit test running for it (in a few hours today). Will let you know

Stream: python-questions

Topic: Python version of vinth2p_nodes

Sheri Mickelson (Jun 04 2020 at 15:37):