Hi, I could really use some help understanding whether there is an efficient way to combine history files along the time dimension, where the time information is in another variable. I have a whole lot of history files, so efficiency is a concern.
Here is what one of the NetCDF history files looks like when opened with Xarray:
<xarray.Dataset>
Dimensions:  (ilev: 33, lat: 192, lev: 32, lon: 288, slat: 191, slon: 288, time: 1)
Coordinates:
  * lon      (lon) float32 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * slon     (slon) float32 -0.625 0.625 1.875 3.125 ... 354.4 355.6 356.9 358.1
  * lat      (lat) float32 -90.0 -89.06 -88.12 -87.17 ... 87.17 88.12 89.06 90.0
  * slat     (slat) float32 -89.53 -88.59 -87.64 -86.7 ... 87.64 88.59 89.53
  * lev      (lev) float32 3.643 7.595 14.36 24.61 ... 936.2 957.5 976.3 992.6
  * ilev     (ilev) float32 2.255 5.032 10.16 18.56 ... 947.4 967.5 985.1 1e+03
Dimensions without coordinates: time
Data variables: (12/15)
    hyam     (lev) float32 0.003643 0.007595 0.01436 ... 0.006255 0.001989 0.0
    hybm     (lev) float32 0.0 0.0 0.0 0.0 0.0 ... 0.9251 0.9512 0.9743 0.9926
    hyai     (ilev) float32 0.002255 0.005032 0.01016 ... 0.003979 0.0 0.0
    hybi     (ilev) float32 0.0 0.0 0.0 0.0 0.0 ... 0.9389 0.9636 0.9851 1.0
    gw       (lat) float32 3.382e-05 0.0002705 0.000541 ... 0.0002705 3.382e-05
    P0       float32 1e+05
    ...       ...
    Q        (time, lev, lat, lon) float64 ...
    CLDLIQ   (time, lev, lat, lon) float64 ...
    CLDICE   (time, lev, lat, lon) float64 ...
    PS       (time, lat, lon) float64 6.853e+04 6.853e+04 ... 9.963e+04
    date     (time) int32 20110103
    datesec  (time) int32 0
Attributes:
    creation_date:  YYYY MM DD HH MM SS = 2019 07 10 01 31 17
    model:          CAM
The actual time value for this file is found in a variable called date, a length-1 integer array.
If I try to open a bunch of these files with open_mfdataset(), is there a way to create a time coordinate axis from these files in that step? Or is it better to use the intake-esm catalog mechanism of preprocessing each file in order to combine these files into a Zarr store?
Specify concat_dim="time"? Or concat_dim="date"? What have you tried?
I'll try them both; I didn't know what options I had. Thanks!
OK. If you want to convert date to a proper datetime thing, you'll need to pass a preprocess function that does so.
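A minimal sketch of what such a preprocess function might look like, assuming date holds YYYYMMDD integers and datesec holds seconds since midnight (as the dataset shown above suggests), and that every date falls within the range pandas can represent. The name add_time_coord and the small demo dataset are made up for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr


def add_time_coord(ds):
    """Attach a datetime `time` coordinate built from `date` and `datesec`.

    Assumes `date` is YYYYMMDD stored as an integer and `datesec` is
    seconds since midnight; both are one-dimensional along `time`.
    """
    # Parse integers such as 20110103 into calendar days.
    days = pd.to_datetime(ds["date"].values.astype(str), format="%Y%m%d")
    # Add the within-day offset in seconds.
    offsets = pd.to_timedelta(ds["datesec"].values, unit="s")
    return ds.assign_coords(time=("time", (days + offsets).values))


# Tiny stand-in for one history file (hypothetical data, not a real file).
demo = xr.Dataset(
    {
        "date": ("time", np.array([20110103], dtype="int32")),
        "datesec": ("time", np.array([0], dtype="int32")),
    }
)
fixed = add_time_coord(demo)
print(fixed["time"].values)
```

The function could then be passed as preprocess=add_time_coord to xr.open_mfdataset, so that each file carries a real time coordinate before the files are combined.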
Maybe the preprocess step will fix this, but choosing either concat_dim option results in:
# Create a dataset and drop all but one variable.
with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ds = xr.open_mfdataset(file_list, data_vars='minimal', coords='minimal', compat='override', concat_dim='time')
...
ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation
I would presume that the preprocess function could set the time coordinate value and fix this.
combine="nested"? If the list of files is in the right order...
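Since combine="nested" concatenates the files in the order they appear in the list, one way to get the order right is to sort the paths before opening them. This is only a sketch: it assumes the file names embed a zero-padded date so that alphabetical order matches chronological order, and the file names and the helper sort_history_files below are invented for illustration:

```python
from pathlib import Path


def sort_history_files(paths):
    """Sort paths by file name only, ignoring the directory part.

    Assumes names contain a zero-padded date (e.g. 2011-01-03), so that
    lexicographic order is also chronological order.
    """
    return sorted(paths, key=lambda p: Path(p).name)


# Hypothetical file names, deliberately out of order.
unsorted_files = [
    "/b/case.cam.h1.2011-01-05.nc",
    "/a/case.cam.h1.2011-01-04.nc",
    "/c/case.cam.h1.2011-01-03.nc",
]
file_list = sort_history_files(unsorted_files)
print(file_list)
```

The resulting file_list can be handed to xr.open_mfdataset together with concat_dim='time' and combine='nested'.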
That worked! Thanks very much for the help!
# Create a dataset and drop all but one variable.
with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ds = xr.open_mfdataset(file_list, data_vars='minimal', coords='minimal', compat='override', concat_dim='time', combine='nested')
print(ds)

<xarray.Dataset>
Dimensions:  (ilev: 33, lat: 192, lev: 32, lon: 288, slat: 191, slon: 288, time: 6)
Coordinates:
  * lon      (lon) float32 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * slon     (slon) float32 -0.625 0.625 1.875 3.125 ... 354.4 355.6 356.9 358.1
  * lat      (lat) float32 -90.0 -89.06 -88.12 -87.17 ... 87.17 88.12 89.06 90.0
  * slat     (slat) float32 -89.53 -88.59 -87.64 -86.7 ... 87.64 88.59 89.53
  * lev      (lev) float32 3.643 7.595 14.36 24.61 ... 936.2 957.5 976.3 992.6
  * ilev     (ilev) float32 2.255 5.032 10.16 18.56 ... 947.4 967.5 985.1 1e+03
Dimensions without coordinates: time
Data variables: (12/15)
    hyam     (lev) float32 dask.array<chunksize=(32,), meta=np.ndarray>
    hybm     (lev) float32 dask.array<chunksize=(32,), meta=np.ndarray>
    hyai     (ilev) float32 dask.array<chunksize=(33,), meta=np.ndarray>
    hybi     (ilev) float32 dask.array<chunksize=(33,), meta=np.ndarray>
    gw       (lat) float32 dask.array<chunksize=(192,), meta=np.ndarray>
    P0       float32 ...
    ...       ...
    Q        (time, lev, lat, lon) float64 dask.array<chunksize=(1, 32, 192, 288), meta=np.ndarray>
    CLDLIQ   (time, lev, lat, lon) float64 dask.array<chunksize=(1, 32, 192, 288), meta=np.ndarray>
    CLDICE   (time, lev, lat, lon) float64 dask.array<chunksize=(1, 32, 192, 288), meta=np.ndarray>
    PS       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
    date     (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>
    datesec  (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>
Last updated: May 16 2025 at 17:14 UTC