Stream: xarray

Topic: Combining NetCDF history files with date var


Brian Bonnlander (May 25 2021 at 18:19):

Hi, I could really use some help understanding whether there is an efficient way to combine history files along the time dimension, where the time information is in another variable. I have a whole lot of history files, so efficiency is a concern.

Here is what one of the NetCDF history files looks like when opened with Xarray:

<xarray.Dataset>
Dimensions:  (ilev: 33, lat: 192, lev: 32, lon: 288, slat: 191, slon: 288, time: 1)
Coordinates:
  * lon      (lon) float32 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * slon     (slon) float32 -0.625 0.625 1.875 3.125 ... 354.4 355.6 356.9 358.1
  * lat      (lat) float32 -90.0 -89.06 -88.12 -87.17 ... 87.17 88.12 89.06 90.0
  * slat     (slat) float32 -89.53 -88.59 -87.64 -86.7 ... 87.64 88.59 89.53
  * lev      (lev) float32 3.643 7.595 14.36 24.61 ... 936.2 957.5 976.3 992.6
  * ilev     (ilev) float32 2.255 5.032 10.16 18.56 ... 947.4 967.5 985.1 1e+03
Dimensions without coordinates: time
Data variables: (12/15)
    hyam     (lev) float32 0.003643 0.007595 0.01436 ... 0.006255 0.001989 0.0
    hybm     (lev) float32 0.0 0.0 0.0 0.0 0.0 ... 0.9251 0.9512 0.9743 0.9926
    hyai     (ilev) float32 0.002255 0.005032 0.01016 ... 0.003979 0.0 0.0
    hybi     (ilev) float32 0.0 0.0 0.0 0.0 0.0 ... 0.9389 0.9636 0.9851 1.0
    gw       (lat) float32 3.382e-05 0.0002705 0.000541 ... 0.0002705 3.382e-05
    P0       float32 1e+05
    ...       ...
    Q        (time, lev, lat, lon) float64 ...
    CLDLIQ   (time, lev, lat, lon) float64 ...
    CLDICE   (time, lev, lat, lon) float64 ...
    PS       (time, lat, lon) float64 6.853e+04 6.853e+04 ... 9.963e+04
    date     (time) int32 20110103
    datesec  (time) int32 0
Attributes:
    creation_date:          YYYY MM DD HH MM SS = 2019 07 10 01 31 17
    model:                  CAM

The actual time value for this file is found in a variable called date, a length-1 integer array.

If I try to open a bunch of these files with open_mfdataset(), is there a way to create a time coordinate axis from these files in that step? Or is it better to use the intake-esm catalog mechanism of preprocessing each file in order to combine these files into a Zarr store?

Deepak Cherian (May 25 2021 at 18:26):

specify concat_dim="time"? or concat_dim="date"? What have you tried?

Brian Bonnlander (May 25 2021 at 18:28):

I'll try them both; I didn't know what options I had. Thanks!

Deepak Cherian (May 25 2021 at 18:30):

OK. If you want to convert date to a proper datetime thing, you'll need to pass a preprocess function that does so.
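For reference, a preprocess function along these lines might look like the sketch below. It is only an illustration: the name add_time_coord is hypothetical, datesec is assumed to hold seconds since midnight, and the dates are assumed to follow the standard calendar within the range pandas can represent (model output on a no-leap calendar or with very early years would likely need cftime instead).

```python
import numpy as np
import pandas as pd
import xarray as xr


def add_time_coord(ds: xr.Dataset) -> xr.Dataset:
    """Attach a datetime `time` coordinate built from `date` and `datesec`.

    Hypothetical helper; assumes `date` is an integer YYYYMMDD and
    `datesec` is seconds since midnight, both along the `time` dimension.
    """
    # Parse the integer date, e.g. 20110103 -> 2011-01-03.
    days = pd.to_datetime(ds["date"].values.astype(str), format="%Y%m%d")
    # Add the seconds-of-day offset (assumption about what `datesec` means).
    seconds = pd.to_timedelta(ds["datesec"].values, unit="s")
    # Turn the bare `time` dimension into a proper dimension coordinate.
    return ds.assign_coords(time=("time", days + seconds))
```

It could then be passed as preprocess=add_time_coord in the open_mfdataset() call, so that each file gets its time value before the files are combined.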

Brian Bonnlander (May 25 2021 at 18:34):

Maybe the preprocess step will fix this, but choosing either concat_dim option results in:

# Open all history files as one dataset, concatenating along time.
with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ds = xr.open_mfdataset(file_list, data_vars='minimal', coords='minimal', compat='override', concat_dim='time')

...
ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation

I would presume that the preprocess function could set the time coordinate value and fix this.

Deepak Cherian (May 25 2021 at 18:37):

combine="nested"? if the list of files is in the right order...
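Since combine="nested" concatenates the files in the order given, the list has to be chronological before it is passed in. The sketch below shows one possible way to sort it from the file names alone. The naming pattern (a trailing YYYY-MM-DD-SSSSS stamp before ".nc") and the helper names are assumptions for illustration; the thread does not show the actual file names.

```python
import re
from pathlib import Path

# Hypothetical naming pattern: "...YYYY-MM-DD-SSSSS.nc" at the end of the name.
_STAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(\d{5})\.nc$")


def time_key(path):
    """Return (year, month, day, seconds) parsed from the file name."""
    match = _STAMP.search(Path(path).name)
    if match is None:
        raise ValueError(f"no timestamp found in {path!r}")
    return tuple(int(part) for part in match.groups())


def sort_by_timestamp(paths):
    """Return the paths sorted chronologically by their file-name stamp."""
    return sorted(paths, key=time_key)
```

With names like these, file_list = sort_by_timestamp(file_list) before the open_mfdataset() call would put the files in time order. If the names carry no usable stamp, the date variable inside each file is the only reliable source of ordering.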

Brian Bonnlander (May 25 2021 at 18:46):

That worked! Thanks very much for the help!

import dask
import xarray as xr

# Open all history files as one dataset, concatenating along time in list order.
with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ds = xr.open_mfdataset(file_list, data_vars='minimal', coords='minimal', compat='override', concat_dim='time', combine='nested')

print(ds)

<xarray.Dataset>
Dimensions:  (ilev: 33, lat: 192, lev: 32, lon: 288, slat: 191, slon: 288, time: 6)
Coordinates:
  * lon      (lon) float32 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * slon     (slon) float32 -0.625 0.625 1.875 3.125 ... 354.4 355.6 356.9 358.1
  * lat      (lat) float32 -90.0 -89.06 -88.12 -87.17 ... 87.17 88.12 89.06 90.0
  * slat     (slat) float32 -89.53 -88.59 -87.64 -86.7 ... 87.64 88.59 89.53
  * lev      (lev) float32 3.643 7.595 14.36 24.61 ... 936.2 957.5 976.3 992.6
  * ilev     (ilev) float32 2.255 5.032 10.16 18.56 ... 947.4 967.5 985.1 1e+03
Dimensions without coordinates: time
Data variables: (12/15)
    hyam     (lev) float32 dask.array<chunksize=(32,), meta=np.ndarray>
    hybm     (lev) float32 dask.array<chunksize=(32,), meta=np.ndarray>
    hyai     (ilev) float32 dask.array<chunksize=(33,), meta=np.ndarray>
    hybi     (ilev) float32 dask.array<chunksize=(33,), meta=np.ndarray>
    gw       (lat) float32 dask.array<chunksize=(192,), meta=np.ndarray>
    P0       float32 ...
    ...       ...
    Q        (time, lev, lat, lon) float64 dask.array<chunksize=(1, 32, 192, 288), meta=np.ndarray>
    CLDLIQ   (time, lev, lat, lon) float64 dask.array<chunksize=(1, 32, 192, 288), meta=np.ndarray>
    CLDICE   (time, lev, lat, lon) float64 dask.array<chunksize=(1, 32, 192, 288), meta=np.ndarray>
    PS       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
    date     (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>
    datesec  (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>

Last updated: May 16 2025 at 17:14 UTC