Stream: xarray

Topic: controlling Zarr metadata


Brian Bonnlander (Feb 23 2021 at 20:00):

I'm using xarray 0.16.2 and zarr 2.6.1.

I am trying to save Zarr stores built by merging xarray datasets that were created from separate NetCDF files using intake-esm's to_dataset_dict() function. By default, the merge operation seems to preserve only metadata that is identical across all source datasets. However, I'm trying to control the value of a metadata attribute that has conflicting values across the source datasets, like this:

        del ds['time'].attrs['calendar']
        ds['time'].attrs['calendar'] = 'gregorian'
        ds.to_zarr("/path/to/store", consolidated=True)

I get the following error message:

Failed to write /glade/scratch/bonnland/na-cordex/zarr/tas.eval.day.NAM-22i.raw.zarr: failed to prevent overwriting existing key calendar in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

Brian Bonnlander (Feb 23 2021 at 20:10):

Is there a way to control such metadata parameters that I might be missing? Thanks for any help.

Brian Bonnlander (Feb 23 2021 at 20:53):

After some digging into the xarray code, it appears that for xarray datasets with datetime64 time values, the "units" and "calendar" attributes cannot be set or changed. These apparently help interpret the time axis values in a critical way. But the values for "units" and "calendar" do not show up in the list of attributes for ds['time'], so I'm not sure exactly what's going on.

Michael Levy (Feb 23 2021 at 20:57):

I think both units and calendar are in ds["time"].encoding
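A minimal sketch of what that suggestion could look like in practice. The toy dataset, the `set_time_encoding` helper, and the example units string are all hypothetical and only stand in for the merged dataset discussed above; the only xarray features used are the `.attrs` and `.encoding` dictionaries.

```python
import numpy as np
import pandas as pd
import xarray as xr


def set_time_encoding(ds, units, calendar, name="time"):
    """Hypothetical helper: keep CF time metadata in .encoding, not .attrs."""
    # Remove any copies left in attrs so they cannot clash when writing.
    for key in ("units", "calendar"):
        ds[name].attrs.pop(key, None)
    # Record the desired serialization settings in the encoding dict.
    ds[name].encoding["units"] = units
    ds[name].encoding["calendar"] = calendar
    return ds


# Toy dataset standing in for the merged output (assumption, not real data).
times = pd.date_range("2000-01-01", periods=3, freq="D")
ds = xr.Dataset(
    {"tas": ("time", np.array([280.0, 281.0, 282.0]))},
    coords={"time": times},
)

ds = set_time_encoding(ds, "days since 1950-01-01", "gregorian")
print(ds["time"].encoding)
```

With the values stored this way, the `ds.to_zarr("/path/to/store", consolidated=True)` call from the first message should pick up the calendar from the encoding instead of colliding with an entry in `attrs`.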

Brian Bonnlander (Feb 23 2021 at 21:02):

OK, but the error message seems misleading then. Also, I suspect users won't typically look in ds["time"].encoding to understand what kind of calendar they are using. Or maybe I'm not aware of the conventions...

Michael Levy (Feb 23 2021 at 21:08):

I think the error message is actually pretty informative: "This is probably an encoding field". Though I didn't realize it was referring to da.encoding until I started playing around on my own :) Also, I'm a little confused about how you mention you want to "control the value of a metadata attribute that has conflicting values across the source datasets". Didn't this come up at an Xdev meeting a while back? What do you expect to happen when you create a single dataset from many different datasets using different time axes?

Brian Bonnlander (Feb 23 2021 at 21:10):

When properly combined, the merged dataset has time axis values from the union of all datasets. In most of my cases, these use the Gregorian calendar.

Brian Bonnlander (Feb 23 2021 at 21:11):

Thanks for your note about the message. I take it that ds['time'].attrs is not the conventional home for "units" and "calendar", which is news to me.

Michael Levy (Feb 23 2021 at 21:14):

It must be time-specific, maybe when it recognizes that it is reading a datetime or cftime object? I know units is typically in da.attrs for other fields
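A small sketch that illustrates this time-specific behavior, assuming an in-memory dataset whose variable names and values are made up for the example. It uses `xr.decode_cf` to apply the CF decoding step to a raw integer time axis: the time variable's "units" and "calendar" move into `.encoding`, while "units" on an ordinary data variable stays in `.attrs`.

```python
import numpy as np
import xarray as xr

# Raw, undecoded dataset: time is stored as integers plus CF attributes
# (hypothetical values chosen only for illustration).
raw = xr.Dataset(
    {"tas": ("time", np.array([280.0, 281.0, 282.0]), {"units": "K"})},
    coords={
        "time": (
            "time",
            np.array([0, 1, 2]),
            {"units": "days since 2000-01-01", "calendar": "standard"},
        )
    },
)

# Apply CF decoding; the time axis becomes datetime64.
decoded = xr.decode_cf(raw)

time_attrs = dict(decoded["time"].attrs)
time_encoding = dict(decoded["time"].encoding)
tas_attrs = dict(decoded["tas"].attrs)

print("time attrs:   ", time_attrs)
print("time encoding:", time_encoding)
print("tas attrs:    ", tas_attrs)
```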

Brian Bonnlander (Feb 23 2021 at 21:40):

Yes, I can set attributes for any coordinate or variable other than the time dimension, which I now know I should just leave alone. Thanks for your insights...


Last updated: May 16 2025 at 17:14 UTC