Hi
We have a 250-member ensemble of time series. The time series are created from CESM output and compressed with a time chunk size of 1. This is not efficient for reading the files, and I am wondering if anyone has scripts for compressing files so that they can also be read efficiently.
FYI, I am compressing the files with: nccopy -d1 -c time/1,lat/$lath,lon/$lonh $fname1 $fname2
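For reference, a minimal xarray sketch (file name, variable name, and chunk sizes are hypothetical) of rewriting one file so the whole time dimension sits in a single chunk, so reading a long time series touches one chunk instead of one chunk per time step:

import xarray as xr

ds = xr.open_dataset("member_001.h0.nc")  # one ensemble member's time series (hypothetical name)

# Compress with zlib level 1 (as with nccopy -d1) but keep the full time
# dimension in one chunk; lat/lon chunk sizes here are illustrative only.
encoding = {
    "T": {
        "zlib": True,
        "complevel": 1,
        "chunksizes": (ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]),
    }
}
ds[["T"]].to_netcdf("member_001_rechunked.nc", encoding=encoding)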
Trude, in the past I saved the variables I was interested in in numpy's native compressed format (.npz). I didn't use dask at all, but maybe this suits your needs because it loads quickly and reduces the data size.
To save the array compressed:
import numpy as np
np.savez_compressed(fileout, ifile=filename, var=varname, data=mydata)  # writes a compressed .npz archive
To load it:
mydata = np.load(filein)['data']  # only the requested 'data' array is decompressed
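For completeness, a rough end-to-end sketch of how those two lines fit together (the file and variable names are hypothetical, not the original scripts):

import numpy as np
import xarray as xr

filename = "member_001.h0.nc"  # hypothetical CESM time-series file
varname = "T"                  # hypothetical variable name

# Pull the variable into memory as a plain numpy array
mydata = xr.open_dataset(filename)[varname].values

# Save it compressed (.npz), keeping provenance strings alongside the data
np.savez_compressed("member_001_T.npz", ifile=filename, var=varname, data=mydata)

# Load it back later; only the requested array is decompressed
mydata = np.load("member_001_T.npz")["data"]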
Let me know if that works for you and I can give you my scripts.
Hi Trude, I have some recipes for "rechunking" using the rechunker package and appending to zarr files. The input can be netCDF but the output of my recipes is zarr. There is a postprocessing step that can convert back to netCDF. I'm happy to discuss offline, but I wanted to post here in case such recipes are of interest to others.
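As a rough illustration of that approach (not the actual recipe; file names, variable name, chunk sizes, and memory limit below are hypothetical), a minimal rechunker call can look like this:

import xarray as xr
from rechunker import rechunk

# One ensemble member, opened lazily with the existing time-chunk-of-1 layout
ds = xr.open_dataset("member_001.h0.nc", chunks={"time": 1})

# Target layout: whole time series in one chunk per variable,
# with coordinates stored unchunked (None)
target_chunks = {
    "T": {"time": ds.sizes["time"], "lat": ds.sizes["lat"], "lon": ds.sizes["lon"]},
    "time": None,
    "lat": None,
    "lon": None,
}

plan = rechunk(
    ds[["T"]],
    target_chunks=target_chunks,
    max_mem="2GB",
    target_store="member_001_T.zarr",
    temp_store="rechunk_tmp.zarr",
)
plan.execute()

# Read it back efficiently; further members or time slices can be appended
# to a zarr store with xarray's to_zarr(..., append_dim=...)
out = xr.open_zarr("member_001_T.zarr")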
James