Stream: python-questions

Topic: Check time axis


David Bailey (Sep 10 2021 at 19:36):

Hey all,

Does anyone have a quick way of checking a time axis for bad/missing values? For example, say I expect a cftime object that is:

<xarray.DataArray 'time' (time: 420)>
array([cftime.DatetimeNoLeap(2035, 2, 1, 0, 0, 0, 0),
cftime.DatetimeNoLeap(2035, 3, 1, 0, 0, 0, 0),
cftime.DatetimeNoLeap(2035, 4, 1, 0, 0, 0, 0), ...,
cftime.DatetimeNoLeap(2069, 11, 1, 0, 0, 0, 0),
cftime.DatetimeNoLeap(2069, 12, 1, 0, 0, 0, 0),
cftime.DatetimeNoLeap(2070, 1, 1, 0, 0, 0, 0)], dtype=object)

I want to make sure that no years/days are skipped here.

David Bailey (Sep 10 2021 at 19:58):

Kind of answered my own question. If I read in the time axis, but with decode_times=False, then I can use

print(np.all(np.diff(time) > 0))

Then I can also check the length of the time axis: there should be nyears*12 entries for monthly data or nyears*365 entries for daily data. I think this should spot any issues.
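The length check can be written out as a small sketch. The helper name `expected_length` is hypothetical, and the 365 days per year assumes a 'noleap' calendar like the one in the example above.

```python
def expected_length(nyears, frequency):
    """Expected number of time steps, assuming a 365-day ('noleap') calendar."""
    per_year = {"monthly": 12, "daily": 365}
    return nyears * per_year[frequency]

# The monthly axis above runs from 2035-02-01 to 2070-01-01, i.e. 35 years.
nyears = 2070 - 2035

print(expected_length(nyears, "monthly"))  # 420, matching (time: 420) above
print(expected_length(nyears, "daily"))    # 12775
```

A matching length alone does not rule out a duplicated step that cancels a missing one, so it is best combined with a check on the spacing.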

Deepak Cherian (Sep 10 2021 at 21:06):

print(np.all(np.diff(time) > 0))

Hmm, isn't this checking monotonicity, not that no years or days are skipped? :thinking:
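A minimal sketch of that distinction, using a made-up daily axis in "days since" units (as you would get with decode_times=False). The variable names are illustrative only.

```python
import numpy as np

# Ten consecutive daily values, then the same axis with one day removed.
time = np.arange(10.0)
gappy = np.delete(time, 4)

# The monotonicity check still passes even though a day is missing.
print(np.all(np.diff(gappy) > 0))   # True

# Looking at the distinct step sizes exposes the gap.
print(np.unique(np.diff(gappy)))    # [1. 2.]
```

The constant-step test only suits daily (or other fixed-interval) data; monthly steps in "days since" units vary between 28 and 31 days even on a complete axis.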

Deepak Cherian (Sep 10 2021 at 21:07):

I think you could try xr.infer_freq(time); if it returns None, you have a problem.

David Bailey (Sep 13 2021 at 16:25):

I don't think this works. Here is the output:

DatetimeIndex(['2035-01-01', '2035-01-02', '2035-01-03', '2035-01-04',
'2035-01-05', '2035-01-06', '2035-01-07', '2035-01-08',
'2035-01-09', '2035-01-10',
...
'2069-12-22', '2069-12-23', '2069-12-24', '2069-12-25',
'2069-12-26', '2069-12-27', '2069-12-28', '2069-12-29',
'2069-12-30', '2069-12-31'],
dtype='datetime64[ns]', length=12775, freq=None)

Deepak Cherian (Sep 13 2021 at 16:47):

freq=None

I think this means you might have a problem.

import pandas as pd

pd.DatetimeIndex(data=ds.time.data, freq="infer")

Does this also set freq=None? If so, there might be an issue. You could try removing roundoff error in the seconds with

ds["time"] = ds.time.dt.round("H"), which will discard sub-hourly information.

David Bailey (Sep 13 2021 at 18:11):

Ok. So, if I do this with the monthly-mean data it works fine and I get 'MS'. So, I believe the issue is the 'noleap' calendar with the daily mean. No February 29th. Is there a way to specify the 'noleap' calendar?
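To illustrate why the daily 'noleap' axis has no single frequency on the standard calendar, here is a sketch using only the standard library. The helper `noleap_days` is hypothetical; it just lists standard-calendar dates with every February 29th skipped.

```python
from datetime import date, timedelta

def noleap_days(start_year, nyears):
    """Daily standard-calendar dates with every Feb 29 skipped."""
    out = []
    d = date(start_year, 1, 1)
    end = date(start_year + nyears, 1, 1)
    while d < end:
        if not (d.month == 2 and d.day == 29):
            out.append(d)
        d += timedelta(days=1)
    return out

days = noleap_days(2035, 35)
steps = sorted({(b - a).days for a, b in zip(days, days[1:])})

print(len(days))  # 12775, the length reported above
print(steps)      # [1, 2]: Feb 28 -> Mar 1 is a 2-day step in leap years
```

Because the step is not constant once the dates are interpreted on the standard calendar, a frequency inference over a DatetimeIndex has nothing uniform to report.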

Deepak Cherian (Sep 13 2021 at 18:14):

You'll have to use CFTimeIndex in that case. So I would convert with ds["time"] = xr.CFTimeIndex(ds.time.data), or specify use_cftime=True in your open_dataset call, and then run infer_freq.

David Bailey (Sep 13 2021 at 19:01):

I guess I am confused. There is actually a warning about this:

python timecheck.py
['/glade/campaign/cesm/development/wawg/WACCM6-TSMLT-GEO/SAI1/b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-DEFAULT.001/atm/proc/tseries/month_1/b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-DEFAULT.001.cam.h0.TREFHT.203501-206912.nc']
timecheck.py:18: RuntimeWarning: Converting a CFTimeIndex with dates from a non-standard calendar, 'noleap', to a pandas.DatetimeIndex, which uses dates from the standard calendar. This may lead to subtle errors in operations that depend on the length of time between dates.

Does infer_freq only work on datetimeindex objects? Or are you saying that CFTimeIndex works as well?

David Bailey (Sep 13 2021 at 19:03):

Ok. I think I understand. This code works:

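import xarray as xr

ds = xr.open_mfdataset(files)

# Assign the converted index back; a bare call would discard its result.
ds["time"] = xr.CFTimeIndex(ds.time.data)
time = ds["time"]

print(xr.infer_freq(time))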

Deepak Cherian (Sep 13 2021 at 19:57):

Yes, you can pass use_cftime=True to open_mfdataset to do this from the start and avoid the second assignment to ds["time"].


Last updated: Jan 30 2022 at 12:01 UTC