I just figured out that one can write a single variable with Xarray to an existing netCDF file and not have to write the whole dataset. This is much much faster! Maybe you gurus knew this already, but I found DataArray.to_netcdf. This works as follows:
import xarray as xr
ds = xr.open_dataset('file.nc')
# ... some operation on a variable like time ...
ds['time'].to_netcdf('file.nc', mode='a')
You need mode='a' to make sure you only replace the existing variable in the file and leave the rest of the data intact.
Basically I am resetting the time axis to days since 2035-01-01 00:00:00.
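For anyone who wants to try the pattern on a throwaway file first, here is a minimal sketch. The file contents and the variable names `temp` and `pressure` are made up for illustration, and it assumes a netCDF backend (netCDF4, h5netcdf or scipy) is installed. It loads the variable and closes the file before appending, which avoids writing to a file that is still open for reading.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical stand-in for 'file.nc': two variables on one dimension.
path = os.path.join(tempfile.mkdtemp(), "file.nc")
xr.Dataset(
    {
        "temp": ("x", np.array([1.0, 2.0, 3.0])),
        "pressure": ("x", np.array([10.0, 20.0, 30.0])),
    }
).to_netcdf(path)

# Read the one variable into memory, then close the file before appending.
with xr.open_dataset(path) as ds:
    temp = ds["temp"].load()

temp = temp * 2.0  # some operation on the variable
temp.to_netcdf(path, mode="a")  # rewrite 'temp' only

# Reopen to confirm 'temp' changed and 'pressure' was left alone.
with xr.open_dataset(path) as check:
    temp_after = check["temp"].values.tolist()
    pressure_after = check["pressure"].values.tolist()
```

Note that overwriting in append mode only works when the variable keeps the same dimensions and on-disk dtype as the one already in the file.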
Nice! The append mode isn't described in the docs: https://xarray.pydata.org/en/stable/user-guide/io.html#writing-encoded-data so that would be a nice contribution. Could highlight this trick in there.
It is described under Dataset.to_netcdf.
More on this. I am finding that sometimes this operation is very slow. When the time axis and time bounds axis are "small" it goes quickly. However, when there are 30 (or 30x2) slices in time (time_bounds), it is much slower. @Sheri Mickelson mentioned that dask is sometimes at work behind the scenes during an xarray netCDF write. I read some stuff online about changing the chunking and so forth. There was some online discussion here:
https://github.com/pydata/xarray/issues/2912
Has anyone had experience with to_netcdf performance? I am just writing a modified time axis to the file.
This discussion suggests trying .load().to_netcdf(...).
https://github.com/pydata/xarray/issues/2912
Hmm.. if you're only writing time, is ds[["time"]].to_netcdf(..., mode="a") faster?
If time points are added or deleted, that will take some computation.
Interesting. What does the extra set of square brackets do? This is much faster!
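On the brackets question: indexing a Dataset with a single name returns a DataArray, while indexing with a list of names returns a new Dataset containing only those variables, so `ds[["time"]].to_netcdf(...)` goes through `Dataset.to_netcdf` and writes `time` under its own name. As far as I can tell, `DataArray.to_netcdf` first converts the array to a temporary Dataset, and when the array's name matches one of its own coordinates (as with `time`) it stores the data under a placeholder variable name, so it is worth checking the file with ncdump after using the single-bracket form. The sketch below uses a made-up file, and it assumes that "resetting" the axis means re-encoding the same dates with a new reference date.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for 'file.nc'.
path = os.path.join(tempfile.mkdtemp(), "file.nc")
xr.Dataset(
    {"temp": ("time", np.arange(4.0))},
    coords={"time": pd.date_range("2035-01-01", periods=4, freq="D")},
).to_netcdf(path, encoding={"time": {"units": "days since 2000-01-01 00:00:00"}})

with xr.open_dataset(path) as ds:
    as_array = ds["time"]      # single brackets: one DataArray
    as_dataset = ds[["time"]]  # list of names: a Dataset with only those variables
    kinds = (type(as_array).__name__, type(as_dataset).__name__)
    subset = as_dataset.load()  # pull values into memory before the file closes

# Assumption: re-encode the same dates relative to a new reference date.
subset["time"].encoding["units"] = "days since 2035-01-01 00:00:00"
subset.to_netcdf(path, mode="a")  # rewrites 'time', leaves 'temp' alone

# Reopen without decoding to inspect what is actually stored on disk.
with xr.open_dataset(path, decode_times=False) as raw:
    units_after = raw["time"].attrs["units"]
    time_after = raw["time"].values.tolist()
    temp_after = raw["temp"].values.tolist()
```

xarray may normalize the units string when it writes, so compare the reference date rather than the exact text of the attribute.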
Last updated: May 16 2025 at 17:14 UTC