Stream: xarray

Topic: netCDF file size change


Danielle Touma (Nov 14 2023 at 21:16):

Hi, I'm not sure what's going on. I'm running the same script I ran about a month ago to analyze and output some large-ensemble data using dask and xarray. However, the files I'm writing now (with xarray's to_netcdf) are about 10 times larger than the ones I wrote last month (18 GB vs 1.7 GB). I'm 99% sure I didn't change the script (other than passing interface = 'ext' to PBSCluster()). The chunking is also the same between last month's output and the new files: the headers are in fact identical when I inspect them with ncdump -h -s. Were there any changes to the xarray or netCDF libraries on Casper that could be contributing to this? Or does anyone have suggestions for how to dig deeper into the files to see what's going on? Thank you! This is the ncdump output:

netcdf CESM2-LE_TREFHTMX_30-day_10-year_100_ens_members_doy001_1980_NWHemi {
dimensions:
        lat = 96 ;
        lon = 144 ;
        sample = 33000 ;
variables:
        double lat(lat) ;
                lat:_FillValue = -900. ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
                lat:_Storage = "chunked" ;
                lat:_ChunkSizes = 96 ;
                lat:_Shuffle = "true" ;
                lat:_DeflateLevel = 1 ;
                lat:_Endianness = "little" ;
        double lon(lon) ;
                lon:_FillValue = -900. ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
                lon:_Storage = "chunked" ;
                lon:_ChunkSizes = 144 ;
                lon:_Shuffle = "true" ;
                lon:_DeflateLevel = 1 ;
                lon:_Endianness = "little" ;
        float TREFHTMX(sample, lat, lon) ;
                TREFHTMX:_FillValue = -900.f ;
                TREFHTMX:units = "K" ;
                TREFHTMX:long_name = "Maximum reference height temperature over output period" ;
                TREFHTMX:cell_methods = "time: maximum" ;
                TREFHTMX:_Storage = "chunked" ;
                TREFHTMX:_ChunkSizes = 6600, 16, 29 ;
                TREFHTMX:_Shuffle = "true" ;
                TREFHTMX:_DeflateLevel = 1 ;
                TREFHTMX:_Endianness = "little" ;

// global attributes:
                :_NCProperties = "version=2,netcdf=4.8.1,hdf5=1.12.2" ;
                :_SuperblockVersion = 2 ;
                :_IsNetcdf4 = 1 ;
                :_Format = "netCDF-4" ;
}
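A quick sanity check on the header above: the dimension lengths and the TREFHTMX chunk shape are enough to estimate how much space the data needs before compression. The sketch below is plain Python arithmetic on numbers copied from the ncdump output. It ignores the small lat/lon coordinate variables and HDF5 metadata, and it counts every chunk as full-size (including the partial edge chunks along lon, since 144 is not a multiple of 29), so treat the result as an approximation.

```python
import math

# Dimension lengths and TREFHTMX chunk shape, copied from the ncdump -h -s header.
dims = {"sample": 33000, "lat": 96, "lon": 144}
chunks = {"sample": 6600, "lat": 16, "lon": 29}
itemsize = 4  # TREFHTMX is declared "float", i.e. 4-byte float32

# Raw (uncompressed) size of the data variable itself.
raw_bytes = math.prod(dims.values()) * itemsize

# Chunks needed along each dimension; edge chunks are counted as full chunks.
n_chunks = math.prod(math.ceil(dims[d] / chunks[d]) for d in dims)
chunk_bytes = math.prod(chunks.values()) * itemsize
allocated_bytes = n_chunks * chunk_bytes  # upper bound with no compression at all

print(f"raw data:         {raw_bytes / 2**30:.2f} GiB")
print(f"chunk grid:       {n_chunks} chunks x {chunk_bytes / 2**20:.1f} MiB")
print(f"uncompressed max: {allocated_bytes / 2**30:.2f} GiB")
```

By this arithmetic the uncompressed float32 data comes to about 1.7 GiB (roughly 1.8 GB), in line with last month's files and about a tenth of the new 18 GB. If that holds, the extra space is probably not the data values themselves, since even switching compression off entirely would only bring the file to roughly this size. One plausible explanation is space inside the HDF5 file that was allocated and then left unused, for example if compressed chunks were rewritten several times during the write. Copying a new file with nccopy or h5repack and checking whether the copy shrinks is a cheap way to test that idea. This is an inference from the header, not a diagnosis.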
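On whether the libraries changed: the _NCProperties attribute in the header records the netCDF-C and HDF5 versions that wrote the file (here netcdf=4.8.1, hdf5=1.12.2). Comparing it with the same attribute in one of last month's files shows whether those versions differ. For the Python side, a minimal standard-library sketch like the following lists what is installed in the current environment. The package list is an assumption about what the workflow touches. Comparing against last month only works if that environment's versions were recorded somewhere, such as a saved conda environment export.

```python
from importlib import metadata

# Packages that plausibly affect how the files are written. This list is an
# assumption about the script's environment; adjust it as needed.
PACKAGES = ["xarray", "dask", "distributed", "dask-jobqueue",
            "netCDF4", "h5netcdf", "h5py", "numpy"]


def installed_versions(names):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def parse_ncproperties(value):
    """Split an _NCProperties string like 'version=2,netcdf=4.8.1,hdf5=1.12.2' into a dict."""
    return dict(item.split("=", 1) for item in value.split(",") if "=" in item)


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:15s} {version or 'not installed'}")
    # Value copied from the new file's header; compare with an old file's attribute.
    print(parse_ncproperties("version=2,netcdf=4.8.1,hdf5=1.12.2"))
```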

Last updated: May 16 2025 at 17:14 UTC