Stream: xarray

Topic: unstack failing mysteriously


Brian Medeiros (Jun 28 2024 at 17:26):

I'm trying to take a dataset with (time, lat, lon) dimensions and convert it to (year, month, lat, lon). I can usually do that by assigning year and month coordinates to the dataset, setting them as a (year, month) index on time, and then unstacking time. But I ran into an issue doing that with a new dataset. Trying to diagnose what was happening led me to this small example, which I absolutely do not understand. Here I make a 2-year time series and then get the days per month with xarray's .dt accessor (dt.daysinmonth). I make two versions of the "days per month" array, which are identical according to the .identical method. However, I can unstack time with one version but not with the other.

import sys
print(f"python {sys.version}")
import xarray as xr
import numpy as np
import cftime
print(f"numpy: {np.__version__}, xarray: {xr.__version__}, cftime: {cftime.__version__}")
# Monthly time axis (first day of each month) for 1979-1980 on the Gregorian calendar
t = np.array([cftime.DatetimeGregorian(year, month, 1, 0, 0, 0, 0, has_year_zero=False)
              for year in (1979, 1980) for month in range(1, 13)])
dss = xr.DataArray(t, dims=['time'], coords={"time":t})

# TWO VERSIONS OF "days":
days0 = dss['time'].dt.daysinmonth

days = xr.DataArray(dss['time'].dt.daysinmonth.data, dims=['time'], coords={'time':dss['time']}, attrs=days0.attrs, name='days_in_month')

print(f"IDENTICAL: {days.identical(days0)}")

year = dss['time'].dt.year.data
month = dss['time'].dt.month.data

# REPEAT SAME STEPS FOR days and days0:
days = days.assign_coords(year=("time", year), month=("time", month))
days = days.set_index(time=['year', 'month'])

days0 = days0.assign_coords(year=("time", year), month=("time", month))
days0 = days0.set_index(time=['year', 'month'])

print(f"IDENTICAL: {days.identical(days0)}")

days = days.unstack('time') # THIS WORKS
print(f"{days.dims = }")
days0 = days0.unstack('time') # THIS FAILS
print(f"{days0.dims = }")

My output:

python 3.12.0 | packaged by conda-forge | (main, Oct  3 2023, 08:36:57) [Clang 15.0.7 ]
numpy: 1.26.4, xarray: 2024.5.0, cftime: 1.6.3
IDENTICAL: True
IDENTICAL: True
days.dims = ('year', 'month')
...
ValueError: IndexVariable objects must be 1-dimensional

Does anyone have any idea what could be going on here?
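For reference, here is a minimal sketch of the usual (time) -> (year, month) reshape described at the top of the thread, on a toy array. Assumptions: a standard datetime64 axis from pandas.date_range stands in for the cftime axis, the data values are made up, and year/month are attached as plain NumPy arrays (the example above also takes .data from the accessor results rather than the DataArrays themselves).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumption: toy monthly series on a datetime64 axis, standing in for the
# (time, lat, lon) dataset from the question
time = pd.date_range("1979-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), dims="time", coords={"time": time}, name="values")

# Attach year and month as plain NumPy arrays along "time"
da = da.assign_coords(year=("time", time.year.to_numpy()),
                      month=("time", time.month.to_numpy()))

# Replace the time index with a (year, month) MultiIndex, then split it into two dims
by_month = da.set_index(time=["year", "month"]).unstack("time")

print(by_month.dims)   # ('year', 'month')
print(by_month.shape)  # (2, 12)
```

This is the same pattern that works for the days version above; only the data source differs.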

Katelyn FitzGerald (Jun 28 2024 at 21:28):

Interesting. I'm having trouble tracking this down. I noticed it worked in an older environment I had, and it seems to have stopped working with v2023.5.0.

Katelyn FitzGerald (Jun 28 2024 at 21:49):

There was some work on the dt accessor released in that version.

I'd consider reporting it as a bug if you haven't already.

Brian Medeiros (Jun 28 2024 at 21:51):

Thanks @Katelyn FitzGerald. I haven't reported it yet because I wasn't sure how to even describe it. I guess I'll just provide this example and see where it goes.

Katelyn FitzGerald (Jun 28 2024 at 21:56):

I think with the additional info about the version where it worked, your nice example here, and the trace/log, someone will be able to piece together what's going on :fingers_crossed:

Brian Medeiros (Jul 01 2024 at 18:34):

@Katelyn FitzGerald -- I did post this to Xarray's issues: https://github.com/pydata/xarray/issues/9190

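It looks like the issue comes down to the difference between IndexVariable and Variable: days0 (taken straight from the dt accessor on the time coordinate) is backed by an IndexVariable, while days (built with the DataArray constructor) is backed by a plain Variable.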
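A minimal sketch of how to see which class backs a DataArray, using the public .variable attribute. The toy array and its "x" coordinate are illustrative assumptions, not data from the thread; the point is only that pulling out a dimension coordinate yields an IndexVariable, while the DataArray constructor produces a base Variable.

```python
import numpy as np
import xarray as xr

# Toy array with an indexed dimension coordinate "x" (illustrative data only)
da = xr.DataArray(np.arange(3), dims="x", coords={"x": [10, 20, 30]})

# Selecting a dimension coordinate returns a DataArray backed by an IndexVariable
coord = da["x"]
print(type(coord.variable).__name__)  # IndexVariable

# Passing raw values to the DataArray constructor wraps them in a base Variable
rebuilt = xr.DataArray(coord.values, dims=coord.dims, coords=coord.coords)
print(type(rebuilt.variable).__name__)  # Variable
```

Printing type(days0.variable) and type(days.variable) in the original example is the same check applied to the arrays in question.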

Katelyn FitzGerald (Jul 02 2024 at 16:27):

So it looks like using dss.reset_index("time").dt.daysinmonth addresses the issue with days0: resetting the index changes time from an IndexVariable to a Variable, the dt accessor respects that and returns a Variable as well, and so it doesn't fail later on.
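Another way to end up with a plain Variable, sketched under assumptions: rebuild the array through the DataArray constructor, which is essentially how the working days version was built in the original example. The helper name to_plain_variable is hypothetical, and a datetime64 axis plus a renamed time coordinate stand in for the cftime-based days0.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_plain_variable(da: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: rebuild `da` on a base xr.Variable.

    The DataArray constructor always wraps raw values in a plain Variable,
    so this drops any IndexVariable while keeping dims, coords, attrs, name.
    """
    return xr.DataArray(da.values, dims=da.dims, coords=da.coords,
                        attrs=da.attrs, name=da.name)


time = pd.date_range("1979-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), dims="time", coords={"time": time})

# Stand-in for days0: an array backed by the time IndexVariable, renamed so
# its name does not clash with the "time" dimension
idx_backed = da["time"].rename("time_values")
plain = to_plain_variable(idx_backed)

# Same year/month steps as in the question, applied to the rebuilt array
plain = plain.assign_coords(year=("time", time.year.to_numpy()),
                            month=("time", time.month.to_numpy()))
unstacked = plain.set_index(time=["year", "month"]).unstack("time")
print(unstacked.dims)  # ('year', 'month')
```

Rebuilding copies the values, so for large arrays the reset_index route may be the lighter option.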

It is interesting that this worked until relatively recently (I checked that it's still an IndexVariable in earlier versions).

Katelyn FitzGerald (Jul 02 2024 at 16:30):

Ah, it works in older versions because dt.daysinmonth previously converted the IndexVariable back to a Variable.

Sorry I didn't catch this sooner.

Brian Medeiros (Jul 02 2024 at 16:48):

Yeah. To me that was mysterious because I didn't even realize there was a .variable attribute, or a distinction between Variable and IndexVariable. I was hoping that the .identical method would have caught that, but apparently it doesn't compare the variable type.
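A minimal sketch of a stricter comparison that also checks the backing class, on a toy array (the "x" coordinate and the helper name identical_including_class are hypothetical, not part of xarray's API). It mirrors the thread's observation: .identical() reports True even though one side is an IndexVariable and the other a Variable.

```python
import numpy as np
import xarray as xr


def identical_including_class(a: xr.DataArray, b: xr.DataArray) -> bool:
    """Hypothetical stricter check: .identical() plus the backing Variable class."""
    return a.identical(b) and type(a.variable) is type(b.variable)


# Toy data: a dimension coordinate (IndexVariable) and a rebuilt copy (Variable)
da = xr.DataArray(np.arange(3), dims="x", coords={"x": [10, 20, 30]})
coord = da["x"]
rebuilt = xr.DataArray(coord.values, dims=coord.dims, coords=coord.coords,
                       attrs=coord.attrs, name=coord.name)

print(rebuilt.identical(coord))                   # True
print(identical_including_class(rebuilt, coord))  # False
```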


Last updated: May 16 2025 at 17:14 UTC