Dataset.to_netcdf · python-questions

I just figured out that xarray can write a single variable to an existing netCDF file without having to rewrite the whole dataset. This is much faster! Maybe you gurus knew this already, but I only just found DataArray.to_netcdf. It works as follows:

import xarray as xr

ds = xr.open_dataset('file.nc')

# ... some operation on a variable like time ...

ds['time'].to_netcdf('file.nc', mode='a')

You need mode='a' so that only the existing variable in the file is replaced and the rest of the data is left intact (the default, mode='w', overwrites the whole file).
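For anyone who wants to try this without touching a real file, here is a minimal sketch of the same append-mode write. The file and the variable names ("temp", "salt") are made up for illustration. It also loads the variable into memory and closes the file before writing, which is my own precaution and not part of the original recipe.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Build a small throwaway file with two variables (hypothetical names).
path = os.path.join(tempfile.mkdtemp(), "example.nc")
xr.Dataset(
    {
        "temp": ("x", np.arange(4.0)),
        "salt": ("x", np.full(4, 35.0)),
    }
).to_netcdf(path)

# Read the one variable fully into memory and close the file again,
# so the file is not held open while we append to it.
with xr.open_dataset(path) as on_disk:
    temp = on_disk["temp"].load()

# Modify the variable, keeping its name so it maps onto the same
# variable in the file.
temp = (temp + 1.0).rename("temp")

# mode="a" overwrites only "temp"; everything else in the file stays put.
temp.to_netcdf(path, mode="a")

# Check the result: "temp" changed, "salt" untouched.
with xr.open_dataset(path) as check:
    new_temp = check["temp"].values.copy()
    new_salt = check["salt"].values.copy()

print(new_temp, new_salt)
```

The dimension length has to match what is already in the file, since append mode reuses the existing dimension.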

David Bailey (Jun 30 2021 at 17:08):

timeaxis.py

David Bailey (Jun 30 2021 at 17:09):

Basically I am resetting the time axis to days since 2035-01-01 00:00:00.
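This is not the attached script, just a rough sketch of what re-referencing a time axis could look like. Everything in it is an assumption: the sample file, the "aice" variable, the noleap calendar, the original reference date of 2000-01-01, and the idea that "resetting" means expressing the same dates relative to the new reference date. It writes a coordinates-only Dataset instead of a DataArray, which sidesteps any special handling xarray applies to a DataArray named after its own coordinate.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Throwaway file standing in for the real one (names and values are made up).
offset_days = 35 * 365  # 2000-01-01 to 2035-01-01 in a 365-day calendar
path = os.path.join(tempfile.mkdtemp(), "example.nc")
xr.Dataset(
    {"aice": ("time", np.array([0.1, 0.2, 0.3]))},
    coords={
        "time": (
            "time",
            offset_days + np.array([15.5, 45.0, 74.5]),
            {"units": "days since 2000-01-01 00:00:00", "calendar": "noleap"},
        )
    },
).to_netcdf(path)

# Open without decoding so "time" stays as plain numbers plus attributes.
with xr.open_dataset(path, decode_times=False) as ds:
    old_time = ds["time"].load()

# Same dates, new reference: shift the numbers and update the units string.
new_attrs = dict(old_time.attrs, units="days since 2035-01-01 00:00:00")
new_time = xr.Dataset(
    coords={"time": ("time", old_time.values - offset_days, new_attrs)}
)

# Append mode replaces only the "time" variable in the file.
new_time.to_netcdf(path, mode="a")

# Read back the raw values to confirm what landed in the file.
with xr.open_dataset(path, decode_times=False) as check:
    time_values = check["time"].values.copy()
    time_units = check["time"].attrs["units"]
    aice_values = check["aice"].values.copy()

print(time_units, time_values)
```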

Deepak Cherian (Jun 30 2021 at 17:10):

Nice! The append mode isn't described in the docs (https://xarray.pydata.org/en/stable/user-guide/io.html#writing-encoded-data), so documenting it would be a nice contribution. You could highlight this trick there.

David Bailey (Jun 30 2021 at 17:11):

It is described under Dataset.to_netcdf.

David Bailey (Jul 19 2021 at 21:07):

More on this. I am finding that this operation is sometimes very slow. When the time axis and time bounds axis are "small", it goes quickly. However, when there are 30 (or 30x2) slices in time (time_bounds), it is much slower. @Sheri Mickelson mentioned that dask is sometimes at work behind the scenes during an xarray netCDF write. I read some material online about changing the chunking and so forth. There was some discussion here:

https://github.com/pydata/xarray/issues/2912

Has anyone had experience with to_netcdf performance? I am just writing a modified time axis to the file.
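One thing that may be worth trying, offered as a guess, not a diagnosis: pull just the variables being rewritten into memory with .load(), close the file, and then append them in a single call, timing the write to see whether it helps. The sketch below uses a made-up file; the variable names ("time_bounds", "aice") and dimension names ("d2", "nj", "ni") are only placeholders, and the +10 shift stands in for whatever the real modification is.

```python
import os
import tempfile
import time as timer

import numpy as np
import xarray as xr

# Throwaway file with 30 time slices, time bounds, and one larger field.
ntime = 30
path = os.path.join(tempfile.mkdtemp(), "example.nc")
lower = np.arange(ntime, dtype="float64")
xr.Dataset(
    {
        "time_bounds": (("time", "d2"), np.stack([lower, lower + 1.0], axis=1)),
        "aice": (("time", "nj", "ni"), np.zeros((ntime, 8, 8))),
    },
    coords={"time": ("time", lower + 0.5)},
).to_netcdf(path)

# Load only the small variables we intend to rewrite, then close the file.
# Selecting "time_bounds" also brings along its "time" coordinate.
with xr.open_dataset(path) as ds:
    subset = ds[["time_bounds"]].load()

# Placeholder modification: shift the bounds first, then the coordinate.
subset["time_bounds"] = subset["time_bounds"] + 10.0
subset = subset.assign_coords(time=subset["time"] + 10.0)

# Time the append of the in-memory subset.
start = timer.perf_counter()
subset.to_netcdf(path, mode="a")
elapsed = timer.perf_counter() - start
print(f"append took {elapsed:.3f} s")

# Confirm the small variables changed and the large field did not.
with xr.open_dataset(path) as check:
    new_time = check["time"].values.copy()
    new_bounds = check["time_bounds"].values.copy()
    aice_shape = check["aice"].shape
    aice_max = float(check["aice"].max())
```

If the write is still slow with everything already in memory, the bottleneck is probably somewhere other than lazy loading.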