Stream: python-questions
Topic: Check time axis
David Bailey (Sep 10 2021 at 19:36):
Hey all,
Does anyone have a quick way of checking a time axis for bad/missing values? For example, say I expect a cftime object that is:
<xarray.DataArray 'time' (time: 420)>
array([cftime.DatetimeNoLeap(2035, 2, 1, 0, 0, 0, 0),
cftime.DatetimeNoLeap(2035, 3, 1, 0, 0, 0, 0),
cftime.DatetimeNoLeap(2035, 4, 1, 0, 0, 0, 0), ...,
cftime.DatetimeNoLeap(2069, 11, 1, 0, 0, 0, 0),
cftime.DatetimeNoLeap(2069, 12, 1, 0, 0, 0, 0),
cftime.DatetimeNoLeap(2070, 1, 1, 0, 0, 0, 0)], dtype=object)
I want to make sure that no years/days are skipped here.
David Bailey (Sep 10 2021 at 19:58):
Kind of answered my own question. If I read in the time axis, but with decode_times=False, then I can use
print(np.all(np.diff(time) > 0))
Then I can also check the length of the time axis: there should be nyears*12 or nyears*365 entries for monthly and daily data, respectively. I think this should spot any issues.
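A rough sketch of that length check, assuming a 'noleap' calendar (365 days in every year) and the 2035 to 2069 span from the example in this thread. The variable names are made up for illustration.

```python
# Sketch of the length check, assuming a 'noleap' calendar and the
# 2035-2069 span from the example above. Names are illustrative only.
first_year, last_year = 2035, 2069
nyears = last_year - first_year + 1

expected_monthly = nyears * 12   # one entry per month
expected_daily = nyears * 365    # a noleap calendar has no Feb 29

print(nyears, expected_monthly, expected_daily)
```

The monthly figure agrees with the (time: 420) axis shown in the question. Comparing these numbers with len(time) only confirms the count, so it is best paired with a check on the spacing.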
Deepak Cherian (Sep 10 2021 at 21:06):
print(np.all(np.diff(time) > 0))
Hmm, isn't this checking monotonicity, not that no years or days are skipped? :thinking:
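A small sketch of the distinction, using a made-up undecoded daily axis (values in days since some reference date) with one day missing. The array and variable names are hypothetical.

```python
import numpy as np

# Hypothetical undecoded daily axis; the value 3.0 is missing.
time = np.array([0.0, 1.0, 2.0, 4.0, 5.0])

steps = np.diff(time)

# Monotonicity only says the values keep increasing.
is_monotonic = bool(np.all(steps > 0))

# Even spacing says every step equals the first one.
is_evenly_spaced = bool(np.allclose(steps, steps[0]))

# Positions of the steps that differ from the first step.
gap_after = np.flatnonzero(~np.isclose(steps, steps[0]))

print(is_monotonic, is_evenly_spaced, gap_after)
```

The axis passes the monotonicity test even though a day is missing. Note that the constant-step test only suits daily data: undecoded monthly steps vary with the length of each month.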
Deepak Cherian (Sep 10 2021 at 21:07):
I think you could try xr.infer_freq(time). If it returns None, you have a problem.
David Bailey (Sep 13 2021 at 16:25):
I don't think this works. Here is the output:
DatetimeIndex(['2035-01-01', '2035-01-02', '2035-01-03', '2035-01-04',
'2035-01-05', '2035-01-06', '2035-01-07', '2035-01-08',
'2035-01-09', '2035-01-10',
...
'2069-12-22', '2069-12-23', '2069-12-24', '2069-12-25',
'2069-12-26', '2069-12-27', '2069-12-28', '2069-12-29',
'2069-12-30', '2069-12-31'],
dtype='datetime64[ns]', length=12775, freq=None)
Deepak Cherian (Sep 13 2021 at 16:47):
freq=None
I think this means you might have a problem.
import pandas as pd
pd.DatetimeIndex(data=ds.time.data, freq="infer")
Does this also set freq=None? If so, there might be an issue. You could try removing roundoff error in the seconds with
ds["time"] = ds.time.dt.round("H")
which will discard sub-hourly information.
David Bailey (Sep 13 2021 at 18:11):
Ok. So, if I do this with the monthly-mean data it works fine and I get 'MS'. So, I believe the issue is the 'noleap' calendar with the daily mean. No February 29th. Is there a way to specify the 'noleap' calendar?
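One way to see why the daily case gives freq=None: once 'noleap' dates are placed on the standard calendar, there is a two-day step across each absent February 29. A minimal sketch with plain pandas, assuming a hypothetical 2035 to 2037 range (2036 is a leap year):

```python
import pandas as pd

# Standard-calendar daily index spanning one leap year (2036).
full = pd.date_range("2035-01-01", "2037-12-31", freq="D")

# Mimic a 'noleap' axis by dropping Feb 29.
is_feb29 = (full.month == 2) & (full.day == 29)
noleap_like = full[~is_feb29]

print(pd.infer_freq(full))           # regular daily spacing
print(pd.infer_freq(noleap_like))    # one two-day step breaks the inference
print(len(full) - len(noleap_like))  # number of dropped days
```

So a None here does not necessarily mean the file is missing records. It can simply reflect the calendar mismatch.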
Deepak Cherian (Sep 13 2021 at 18:14):
You'll have to use CFTimeIndex in that case. So I would convert with ds["time"] = xr.CFTimeIndex(ds.time.data), or specify use_cftime in your open_dataset call, and then run infer_freq.
David Bailey (Sep 13 2021 at 19:01):
I guess I am confused. There is actually a warning about this:
python timecheck.py
['/glade/campaign/cesm/development/wawg/WACCM6-TSMLT-GEO/SAI1/b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-DEFAULT.001/atm/proc/tseries/month_1/b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-DEFAULT.001.cam.h0.TREFHT.203501-206912.nc']
timecheck.py:18: RuntimeWarning: Converting a CFTimeIndex with dates from a non-standard calendar, 'noleap', to a pandas.DatetimeIndex, which uses dates from the standard calendar. This may lead to subtle errors in operations that depend on the length of time between dates.
Does infer_freq only work on DatetimeIndex objects? Or are you saying that CFTimeIndex works as well?
David Bailey (Sep 13 2021 at 19:03):
Ok. I think I understand. This code works:
ds = xr.open_mfdataset(files)
xr.CFTimeIndex(ds.time.data)
time = ds['time']
print(xr.infer_freq(time))
Deepak Cherian (Sep 13 2021 at 19:57):
Yes, you can provide use_cftime=True in open_mfdataset to do this from the start and avoid the second assignment to ds["time"].
Last updated: Jan 30 2022 at 12:01 UTC