I'm trying to take a data set with (time, lat, lon) dimensions and convert it to (year, month, lat, lon). I can usually do that by assigning year and month coordinates to the dataset and then unstacking time. But I ran into an issue doing that with a new data set. Trying to diagnose what was happening led me to this small example which I absolutely do not understand. Here I make a 2-year time series and then get the days per month with xarray's accessor method. I make two versions of the "days per month" array, which are identical according to the .identical method. However, with one version I can unstack time but with the other one I can't.
import sys
print(f"python {sys.version}")
import xarray as xr
import numpy as np
import cftime
print(f"numpy: {np.__version__}, xarray: {xr.__version__}, cftime: {cftime.__version__}")
t = np.array([cftime.DatetimeGregorian(1979, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 2, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 3, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 4, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 5, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 6, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 7, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 8, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 9, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 10, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 11, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 12, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 2, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 3, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 4, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 5, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 6, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 7, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 8, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 9, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 10, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 11, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 12, 1, 0, 0, 0, 0, has_year_zero=False)])
dss = xr.DataArray(t, dims=['time'], coords={"time":t})
# TWO VERSIONS OF "days":
days0 = dss['time'].dt.daysinmonth
days = xr.DataArray(dss['time'].dt.daysinmonth.data, dims=['time'], coords={'time':dss['time']}, attrs=days0.attrs, name='days_in_month')
print(f"IDENTICAL: {days.identical(days0)}")
year = dss['time'].dt.year.data
month = dss['time'].dt.month.data
# REPEAT SAME STEPS FOR days and days0:
days = days.assign_coords(year=("time", year), month=("time", month))
days = days.set_index(time=['year', 'month'])
days0 = days0.assign_coords(year=("time", year), month=("time", month))
days0 = days0.set_index(time=['year', 'month'])
print(f"IDENTICAL: {days.identical(days0)}")
days = days.unstack('time') # THIS WORKS
print(f"{days.dims = }")
#
days0 = days0.unstack('time') # THIS FAILS
print(f"{days0.dims = }")
My output:
python 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:36:57) [Clang 15.0.7 ]
numpy: 1.26.4, xarray: 2024.5.0, cftime: 1.6.3
IDENTICAL: True
IDENTICAL: True
days.dims = ('year', 'month')
...
ValueError: IndexVariable objects must be 1-dimensional
Does anyone have any idea what could be going on here?
Interesting. I'm having trouble tracking this down. I noticed it worked in an older env I had and it seems to stop working with v2023.5.0.
There was some work on the dt accessor released in that version.
I'd consider reporting it as a bug if you haven't already.
Thanks @Katelyn FitzGerald . I haven't reported it yet because I wasn't sure how to even describe it. I guess I'll just provide this example and see where it goes.
I think with the additional info about the version where it worked, your nice example here, and the trace/log, someone will be able to piece together what's going on :fingers_crossed: .
@Katelyn FitzGerald -- I did post this to Xarray's issues: https://github.com/pydata/xarray/issues/9190
It looks like the issue comes down to the difference between IndexVariable and Variable.
So it looks like dss.reset_index("time").dt.daysinmonth addresses the issue with days0 by changing it from an IndexVariable to a Variable; the dt accessor respects that and returns a Variable as well, so it doesn't fail later on.
It is interesting that this worked until relatively recently (I checked that it's still an IndexVariable in earlier versions).
Ah, it works in older versions because dt.daysinmonth was previously changing it back to a Variable from the IndexVariable.
Sorry I didn't catch this sooner.
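For anyone hitting the same error, here is a minimal sketch of the whole reshape that sidesteps the problem by rebuilding the array from its raw data first, the same way the working days version in the example was built. The helper name to_year_month is made up for illustration, and a pandas date_range stands in for the cftime axis only to keep it short; I have not checked whether the failure itself shows up with a non-cftime axis.

```python
import pandas as pd
import xarray as xr


def to_year_month(da, dim="time"):
    """Reshape `dim` into ("year", "month") dimensions (hypothetical helper)."""
    year = da[dim].dt.year.data
    month = da[dim].dt.month.data
    # Rebuild from the raw data so the result wraps a plain Variable,
    # mirroring how the working `days` array was constructed in the example.
    plain = xr.DataArray(da.data, dims=da.dims, coords=da.coords,
                         attrs=da.attrs, name=da.name)
    plain = plain.assign_coords(year=(dim, year), month=(dim, month))
    plain = plain.set_index({dim: ["year", "month"]})
    # unstack appends the new dimensions at the end; move them to the front.
    return plain.unstack(dim).transpose("year", "month", ...)


# Assumption: a pandas-backed monthly axis instead of cftime objects.
time = pd.date_range("1979-01-01", periods=24, freq="MS")
dss = xr.DataArray(time, dims=["time"], coords={"time": time})
days0 = dss["time"].dt.daysinmonth
reshaped = to_year_month(days0)
print(reshaped.dims, reshaped.shape)
```

The final transpose only matters for arrays with extra dimensions such as lat and lon, where unstack would otherwise leave year and month last.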
Yeah. To me that was mysterious because I didn't even realize there was a variable attribute or a distinction between Variable and IndexVariable. I was hoping that the identical method would have told me about that, but I guess it doesn't check that.
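Since identical won't flag it, a quick way to see the difference is to look at the type of the .variable attribute directly. This is only a small sketch with integer data rather than the cftime example from above; variable_kind is a made-up helper name, not part of xarray.

```python
import numpy as np
import xarray as xr


def variable_kind(da):
    """Return the class name of the object a DataArray wraps (hypothetical helper)."""
    return type(da.variable).__name__


base = xr.DataArray(np.arange(3), dims=["time"], coords={"time": [10, 20, 30]})
coord = base["time"]  # a dimension coordinate pulled out of the array
# Same values and coordinates, but constructed from the raw data.
rebuilt = xr.DataArray(coord.data, dims=["time"], coords={"time": coord},
                       name=coord.name)

print(variable_kind(coord))
print(variable_kind(rebuilt))
print(rebuilt.identical(coord))
```

Printing the kind next to the identical result makes it easy to spot two arrays that compare the same but wrap different variable classes.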
Last updated: May 16 2025 at 17:14 UTC