Stream: python-questions

Topic: preserving xarray metadata


Brian Bonnlander (Feb 05 2020 at 23:08):

Hi, I'm throwing out this question in case it's helpful to someone else. I have an xarray dataset with a variable that has coordinates (time, lat, lon). The variable also has attributes associated with it, i.e. ds.<var>.attrs returns a non-empty dictionary. When I multiply the variable by some weights and assign the result to a new variable (ds2 = ds.<var> * ds_wts), the attrs seem to be dropped, but I need to keep them. Is this expected behavior? Is there some way to preserve metadata across computations?

Brian Bonnlander (Feb 05 2020 at 23:11):

Maybe the problem is that there are two xarray objects involved, so it's not obvious which one's attributes should be kept. Perhaps I need to assign to the dataset variable directly, instead of assigning to a new dataset.

Brian Bonnlander (Feb 06 2020 at 00:13):

So I tried the following, which did not work:

ds2 = ds
ds2['var'] = ds['var'] * ds_wts

I am relying on named coordinates here: the weights have the coordinate 'lat' and are applied to ds2['var'], which has coordinates (time, lat, lon).
I think it will work to copy over the attrs explicitly. I just thought there was a more elegant or automatic way for metadata to be preserved:

ds2['var'].attrs = ds['var'].attrs
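
A minimal sketch of the explicit-copy approach, using made-up stand-in data (the array shapes, attribute values, and cosine weights are assumptions, not taken from the thread). Note that a plain `ds2 = ds` only creates a second name for the same dataset, so the sketch uses `ds.copy()` to leave the original untouched:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in data: a (time, lat, lon) variable with attrs.
lat = np.array([-30.0, 0.0, 30.0])
ds = xr.Dataset(
    {
        "var": (
            ("time", "lat", "lon"),
            np.ones((2, 3, 4)),
            {"units": "K", "long_name": "temperature"},
        )
    },
    coords={"lat": lat},
)

# Hypothetical weights that vary along 'lat' only.
ds_wts = xr.DataArray(np.cos(np.deg2rad(lat)), dims="lat", coords={"lat": lat})

# Shallow copy, so that assigning into ds2 does not modify ds.
ds2 = ds.copy()
ds2["var"] = ds["var"] * ds_wts

# Copy the attrs over explicitly; dict() avoids sharing one dict between datasets.
ds2["var"].attrs = dict(ds["var"].attrs)

print(ds2["var"].attrs)
```

Wrapping the attrs in `dict(...)` is a design choice: it keeps later edits to one dataset's metadata from showing up in the other.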

Brian Bonnlander (Feb 06 2020 at 01:21):

...And I believe this page answers my question: xarray does not preserve metadata for many of its operations: http://xarray.pydata.org/en/stable/faq.html#what-is-your-approach-to-metadata

Matt Long (Feb 06 2020 at 02:51):

In some instances, you can do something like

ds.var.values = ds.var * ds_wts

and the metadata will be preserved. You would not want to do this with dask arrays.

Deepak Cherian (Feb 06 2020 at 16:54):

Unfortunately, this pattern of assigning to .values is the cause of many recent bugs in esmlab. It should absolutely not be done for "dimension coordinates", but that distinction is hard to remember.

Instead consider using ds["var"] = ds.var.copy(data=ds.var * ds_wts).

Another possibility is:

with xr.set_options(keep_attrs=True):
    ds["var"] = ds.var * ds_wts

but I don't think that flag has been implemented on binary operations yet.
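
Both suggestions can be sketched as follows, on made-up data (names, shapes, and weight values are assumptions). Two caveats are worth stating. First, a variable literally named "var" has to be accessed as `ds["var"]`, because `ds.var` resolves to the Dataset's variance method. Second, whether `keep_attrs=True` affects binary operations depends on the installed xarray version, so the sketch falls back to an explicit copy when the attrs come back empty:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in data.
lat = [-30.0, 0.0, 30.0]
da = xr.DataArray(
    np.ones((2, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={"lat": lat},
    attrs={"units": "K"},
    name="var",
)
wts = xr.DataArray([0.5, 1.0, 0.5], dims="lat", coords={"lat": lat})
ds = da.to_dataset()

# Option 1: copy() keeps name, coords and attrs; only the data is replaced.
# The new data must match the original shape, so transpose to the original
# dimension order and pass the underlying array rather than a DataArray.
product = ds["var"] * wts
ds["var"] = ds["var"].copy(data=product.transpose(*ds["var"].dims).data)

# Option 2: ask xarray to keep attrs, then fall back to an explicit copy
# in case this xarray version does not honour the flag for binary operations.
with xr.set_options(keep_attrs=True):
    weighted = da * wts
if not weighted.attrs:
    weighted.attrs = dict(da.attrs)

print(ds["var"].attrs, weighted.attrs)
```

Unlike assigning to `.values`, the `copy(data=...)` form never writes into the existing array in place.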

Brian Bonnlander (Feb 06 2020 at 17:14):

Thanks for your ideas. I will give them a try. I thought for a while about suggesting that xarray should give precedence to the metadata from the "left" operand. For example:

ds["var"] = ds["var"] * ds_wts

...would keep the metadata from ds["var"] unchanged. But I suppose it is not clear whether the metadata from ds_wts should be merged in when it does not conflict, or left out by default. It seems there is no automatic, intuitive way to handle metadata in math operations, and the best approach is probably to implement the "keep_attrs" flag and let the user decide.
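
One existing way to get roughly this "left operand wins" behaviour is `xr.apply_ufunc` with `keep_attrs=True`, which is documented as copying attributes from the first argument. The sketch below is an illustration on made-up data (the arrays and attribute values are assumptions), not something proposed in the thread:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in data; both operands carry their own attrs.
lat = [-30.0, 0.0, 30.0]
da = xr.DataArray(
    np.ones((2, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={"lat": lat},
    attrs={"units": "K"},
)
wts = xr.DataArray(
    [0.5, 1.0, 0.5],
    dims="lat",
    coords={"lat": lat},
    attrs={"description": "latitude weights"},
)

# apply_ufunc broadcasts the inputs by dimension name before calling np.multiply;
# keep_attrs=True requests that attrs be taken from the first argument (da).
weighted = xr.apply_ufunc(np.multiply, da, wts, keep_attrs=True)

print(weighted.attrs)
```

Swapping the argument order would change which operand's metadata is kept, which is the ambiguity described above.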


Last updated: Jan 30 2022 at 12:01 UTC