Stream: python-questions
Topic: How to combine data from different calendars in xarray
Brian Bonnlander (Jun 16 2020 at 17:41):
The NA-CORDEX climate dataset has simulation runs with daily values from several different calendars (360-day, 365-day noleap, and 365-day with leap years), which I would like to combine into a single xarray dataset. Is anyone familiar with an example of how to do this in xarray? If it's not too difficult to do, I hope that the combined dataset does not throw away values, but instead pads out the days missing from one calendar or the other with NaN values. I have looked online for examples of how to do this, but have not found anything yet. Thanks in advance for any pointers.
Deepak Cherian (Jun 16 2020 at 18:10):
maybe something like
ds360 = ds360.assign(time=ds360.indexes["time"].asi8)
...
combined = xr.merge([ds360, ...])
combined.time.attrs["units"] = "microseconds since 1970-01-01"
combined = xr.decode_cf(combined)
basically you get everything on to a common reference axis. then xarray's automatic alignment will insert NaNs in the right place.
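To make the padding behaviour concrete, here is a minimal sketch in plain Python (no xarray) of the outer alignment described above. It assumes each dataset can be keyed by a (year, month, day) label; the helper names `dates_in_year` and `outer_align` and the sample values are hypothetical and only illustrate the idea. It does not reproduce the `asi8` conversion itself.

```python
import math

# Month lengths for two idealized calendars (neither has leap years).
MONTH_LENGTHS = {
    "360_day": [30] * 12,
    "noleap": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}


def dates_in_year(calendar, year):
    """Return every (year, month, day) label that exists in `calendar`."""
    return [
        (year, month, day)
        for month, n_days in enumerate(MONTH_LENGTHS[calendar], start=1)
        for day in range(1, n_days + 1)
    ]


def outer_align(series_by_name):
    """Outer-join several {date_label: value} mappings on their labels.

    Returns the sorted union of labels and, for each input, a list of
    values with NaN wherever that input has no entry for a label.
    """
    labels = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    aligned = {
        name: [s.get(label, math.nan) for label in labels]
        for name, s in series_by_name.items()
    }
    return labels, aligned


# Hypothetical daily values: the zero-based day index within each calendar.
model_360 = {d: float(i) for i, d in enumerate(dates_in_year("360_day", 2000))}
model_noleap = {d: float(i) for i, d in enumerate(dates_in_year("noleap", 2000))}

labels, aligned = outer_align(
    {"model_360": model_360, "model_noleap": model_noleap}
)
```

In this sketch the union holds 367 labels for the year: Feb 29 and Feb 30 come only from the 360-day calendar, and the seven 31sts come only from noleap. No value is dropped; each series simply carries NaN on the labels it lacks.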
Brian Bonnlander (Jun 16 2020 at 18:15):
Thanks for the suggestion! I may not be able to try it right away, but my goal is to create an example notebook that demonstrates this.
Deepak Cherian (Jun 16 2020 at 18:27):
the best way might be to open an issue at cftime
asking for a function to convert between calendars.
Kristen Thyng (Jun 17 2020 at 14:10):
I have found pandas time handling to almost always be able to do what I need. Have you used that much? It has the best timezone handling I have seen, too.
Something like,
import pandas as pd
pd.to_datetime([your time array])
to get into pandas and then combine from there. I can be more specific as needed but this is a starting point. I know pandas isn't xarray but clearly they play really well together.
Anderson Banihirwe (Jun 17 2020 at 14:18):
Kristen Thyng said: "I have found pandas time handling to almost always be able to do what I need. Have you used that much? It has the best timezone handling I have seen, too."
Pandas datetime functionality works great, but unfortunately it supports only the proleptic Gregorian calendar. As @Brian Bonnlander pointed out, he's dealing with some non-standard calendars, i.e., 360-day and 365-day noleap. cftime is likely going to be the only option.
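A small illustration of why the calendar restriction matters, using Python's standard `datetime` module as a stand-in (an assumption made to keep the sketch dependency-free; it also uses the proleptic Gregorian calendar). The helper `is_valid_gregorian` is hypothetical.

```python
from datetime import date


def is_valid_gregorian(year, month, day):
    """Return True if the date exists in the proleptic Gregorian calendar."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


# Exists in a 360-day calendar (every month has 30 days), but not in Gregorian.
feb_30_ok = is_valid_gregorian(2000, 2, 30)

# Exists in Gregorian (2000 is a leap year), but not in a noleap calendar.
feb_29_ok = is_valid_gregorian(2000, 2, 29)
```

Dates such as Feb 30 therefore have no Gregorian timestamp at all, so a 360-day time axis cannot be carried over label for label.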
Kevin Paul (Jun 17 2020 at 14:20):
However, I think @Kristen Thyng makes a good point that many times (not always) you only need to know timestamps, and durations (i.e., distances between timestamps) are not needed. When you do not need to use durations, then pandas should be fine. If you need to use durations, then some smart handling of calendars is needed.
Kevin Paul (Jun 17 2020 at 14:22):
...Although some care probably needs to be taken to make sure that the timestamps generated for non-standard calendars actually make sense.
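The point about durations can be checked with a little arithmetic. The sketch below, in plain Python with hypothetical helpers `ordinal_360` and `ordinal_noleap`, counts the days between the same two date labels under three calendars. It is an illustration only, not how cftime or pandas compute differences.

```python
from datetime import date

NOLEAP_MONTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def ordinal_360(year, month, day):
    """Day count since 0000-01-01 in a 360-day calendar (30-day months)."""
    return year * 360 + (month - 1) * 30 + (day - 1)


def ordinal_noleap(year, month, day):
    """Day count since 0000-01-01 in a 365-day calendar without leap years."""
    return year * 365 + sum(NOLEAP_MONTHS[: month - 1]) + (day - 1)


start, end = (2000, 1, 1), (2001, 1, 1)

days_360 = ordinal_360(*end) - ordinal_360(*start)
days_noleap = ordinal_noleap(*end) - ordinal_noleap(*start)
# The standard library uses the proleptic Gregorian calendar; 2000 is a leap year.
days_gregorian = (date(*end) - date(*start)).days
```

The same pair of labels spans 360, 365, and 366 days respectively, which is why elapsed-time calculations need calendar-aware handling even when the timestamps look identical.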
Matt Long (Jun 17 2020 at 14:22):
I am not sure it's a good idea to combine datasets with different calendars into a single xarray Dataset.
Kristen Thyng (Jun 17 2020 at 15:12):
Kevin Paul said: "However, I think Kristen Thyng makes a good point that many times (not always) you only need to know timestamps, and durations (i.e., distances between timestamps) are not needed. When you do not need to use durations, then pandas should be fine. If you need to use durations, then some smart handling of calendars is needed."
Yes I was thinking more like this --- if you know (or can recreate) the dates, you can still combine between calendars. I wasn't aware, though, that pandas only works with one calendar, I guess I only ever use one!
Last updated: Jan 30 2022 at 12:01 UTC