Stream: python-questions

Topic: Dataset.to_netcdf


David Bailey (Jun 30 2021 at 17:04):

I just figured out that one can write a single variable with Xarray to an existing netCDF file and not have to write the whole dataset. This is much much faster! Maybe you gurus knew this already, but I found DataArray.to_netcdf. This works as follows:

import xarray as xr

ds = xr.open_dataset('file.nc')

# ... some operation on a variable like time ...

ds['time'].to_netcdf('file.nc', mode='a')

You need mode='a' to make sure you only replace the existing variable in the file and leave the rest of the data intact.
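A minimal, self-contained sketch of the append-mode write described above. The file contents, the variable name temp, and the +10 shift are made up for illustration; the data is loaded into memory and the file closed before appending, which is an assumption about safe usage rather than something stated in the thread.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Build a small stand-in for 'file.nc' (synthetic data, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "file.nc")
original = xr.Dataset(
    {"temp": (("time",), np.arange(3, dtype="float64"))},
    coords={"time": ("time", np.array([0.0, 1.0, 2.0]))},
)
original.to_netcdf(path)

# Read everything into memory and close the file before appending to it.
with xr.open_dataset(path) as opened:
    ds = opened.load()

# Some operation on the time variable (here: a hypothetical shift by 10).
ds = ds.assign_coords(time=ds["time"].values + 10.0)

# Write only the modified variable; mode="a" leaves the other variables in place.
ds[["time"]].to_netcdf(path, mode="a")

# Re-open to inspect the result.
with xr.open_dataset(path) as check:
    result = check.load()
```

After this runs, result should hold the shifted time values while temp is unchanged, since only time was rewritten.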

David Bailey (Jun 30 2021 at 17:08):

timeaxis.py

David Bailey (Jun 30 2021 at 17:09):

Basically I am resetting the time axis to days since 2035-01-01 00:00:00.

Deepak Cherian (Jun 30 2021 at 17:10):

Nice! The append mode isn't described in the docs: https://xarray.pydata.org/en/stable/user-guide/io.html#writing-encoded-data so that would be a nice contribution. Could highlight this trick in there

David Bailey (Jun 30 2021 at 17:11):

It is described under Dataset.to_netcdf.

David Bailey (Jul 19 2021 at 21:07):

More on this. I am finding that sometimes this operation is very slow. When the time axis and time bounds axis are "small" it goes quickly. However, when there are 30 (or 30x2) slices in time (time_bounds), it is much slower. @Sheri Mickelson mentioned that for an xarray netCDF write operation, dask is sometimes operating behind the scenes. I read some material online about changing the chunking and so forth. There was some discussion here:

https://github.com/pydata/xarray/issues/2912

Has anyone had experience with to_netcdf performance? I am just writing a modified time axis to the file.

Brian Bonnlander (Jul 19 2021 at 21:17):

This discussion suggests trying .load().to_netcdf(...).

https://github.com/pydata/xarray/issues/2912

Deepak Cherian (Jul 19 2021 at 21:18):

Hmm... if you're only writing time, is ds[["time"]].to_netcdf(..., mode="a") faster?

Brian Bonnlander (Jul 19 2021 at 21:20):

If time points are added or deleted, that will take some computation.

David Bailey (Aug 02 2021 at 14:16):

Interesting. What does the extra set of square brackets do? This is much faster!
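On the extra square brackets: indexing a Dataset with a single string returns a DataArray, while indexing with a list of names returns a Dataset restricted to those variables. A small in-memory sketch follows; the variable name temp and the values are made up for illustration. It shows only the type difference and does not by itself establish why the write was faster in this case.

```python
import numpy as np
import xarray as xr

# Synthetic dataset with one data variable and a time coordinate.
ds = xr.Dataset(
    {"temp": (("time",), np.zeros(3))},
    coords={"time": ("time", np.arange(3.0))},
)

single = ds["time"]    # string key: a DataArray
double = ds[["time"]]  # list of keys: a Dataset holding only the listed variables

print(type(single).__name__)
print(type(double).__name__)
print(list(double.variables))
```

So ds[["time"]].to_netcdf(...) goes through Dataset.to_netcdf on a one-variable Dataset, whereas ds["time"].to_netcdf(...) goes through DataArray.to_netcdf.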


Last updated: Jan 30 2022 at 12:01 UTC