Stream: python-questions

Topic: How to combine data from different calendars in xarray


Brian Bonnlander (Jun 16 2020 at 17:41):

The NA-CORDEX climate dataset has simulation runs with daily values on several different calendars (360-day, 365-day noleap, and 365-day with leap years) that I would like to combine into a single xarray dataset. Is anyone familiar with an example of how to do this in xarray? If it's not too difficult, I would like the combined dataset to keep all values and pad the days missing from one calendar or another with NaN. I have looked online for examples of how to do this, but have not found anything yet. Thanks in advance for any pointers.

Deepak Cherian (Jun 16 2020 at 18:10):

maybe something like

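import xarray as xr

# replace the cftime time coordinate with integers (microseconds since 1970-01-01)
ds360 = ds360.assign_coords(time=ds360.indexes["time"].asi8)
...
combined = xr.merge([ds360, ...])
# tell xarray how to decode the integer axis back into dates
combined.time.attrs["units"] = "microseconds since 1970-01-01"
combined = xr.decode_cf(combined)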

basically you get everything onto a common reference axis. Then xarray's automatic alignment will insert NaNs in the right places.
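
A minimal sketch of the alignment step described above, assuming the time coordinates have already been converted to a shared numeric axis. The dataset and variable names (`run_a`, `run_b`, `tas_a`, `tas_b`) and all values are made up for illustration; the integer coordinates stand in for elapsed time since a common reference.

```python
import numpy as np
import xarray as xr

# Two hypothetical runs whose time axes only partly overlap.
run_a = xr.Dataset({"tas_a": ("time", [1.0, 2.0, 3.0])}, coords={"time": [0, 1, 2]})
run_b = xr.Dataset({"tas_b": ("time", [10.0, 20.0, 30.0])}, coords={"time": [1, 2, 3]})

# An outer join keeps the union of both time axes; positions that a run
# does not cover are filled with NaN during alignment.
combined = xr.merge([run_a, run_b], join="outer")

missing_in_a = bool(np.isnan(combined["tas_a"].sel(time=3)))
missing_in_b = bool(np.isnan(combined["tas_b"].sel(time=0)))
```

With `join="outer"`, the merged time axis is the union of both inputs, and each variable is NaN wherever its source had no value.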

Brian Bonnlander (Jun 16 2020 at 18:15):

Thanks for the suggestion! I may not be able to try it right away, but my goal is to create an example notebook that demonstrates this.

Deepak Cherian (Jun 16 2020 at 18:27):

the best way might be to open an issue at cftime asking for a function to convert between calendars.

Kristen Thyng (Jun 17 2020 at 14:10):

I have found pandas time handling to almost always be able to do what I need. Have you used that much? It has the best timezone handling I have seen, too.

Something like,

import pandas as pd
pd.to_datetime(your_time_array)  # your_time_array: placeholder for your time values

to get into pandas and then combine from there. I can be more specific as needed but this is a starting point. I know pandas isn't xarray but clearly they play really well together.

Anderson Banihirwe (Jun 17 2020 at 14:18):

Kristen Thyng said: "I have found pandas time handling to almost always be able to do what I need. Have you used that much? It has the best timezone handling I have seen, too."

Pandas datetime functionality works great, but unfortunately it supports only the proleptic Gregorian calendar. As @Brian Bonnlander pointed out, he's dealing with some non-standard calendars, i.e., 360-day and 365-day noleap. cftime is likely going to be the only option.

Kevin Paul (Jun 17 2020 at 14:20):

However, I think @Kristen Thyng makes a good point that many times (not always) you only need to know timestamps, and durations (i.e., distances between timestamps) are not needed. When you do not need to use durations, then pandas should be fine. If you need to use durations, then some smart handling of calendars is needed.

Kevin Paul (Jun 17 2020 at 14:22):

...Although some care probably needs to be taken to make sure that the timestamps generated for non-standard calendars actually make sense.

Matt Long (Jun 17 2020 at 14:22):

I am not sure it's a good idea to combine datasets with different calendars into a single xarray Dataset.

Kristen Thyng (Jun 17 2020 at 15:12):

Kevin Paul said: "However, I think Kristen Thyng makes a good point that many times (not always) you only need to know timestamps, and durations (i.e., distances between timestamps) are not needed. When you do not need to use durations, then pandas should be fine. If you need to use durations, then some smart handling of calendars is needed."

Yes, I was thinking more along these lines: if you know (or can recreate) the dates, you can still combine across calendars. I wasn't aware, though, that pandas only works with one calendar. I guess I only ever use one!


Last updated: Jan 30 2022 at 12:01 UTC