Stream: python-questions
Topic: xr.concat: auto rechunking error
Matt Long (Jul 07 2021 at 19:59):
I am attempting to concatenate datasets along an existing dimension and getting the following error.
NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data
It's complaining about time, which holds cftime objects—but all datasets are identically chunked. Is there a way to avoid triggering the auto rechunking?
Matt Long (Jul 07 2021 at 20:15):
I've tracked this down to the time_bound variable (dropping it resolves the problem). Still not entirely sure why...
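A minimal sketch of that drop-the-variable workaround; the toy datasets below (with generic Python objects standing in for the cftime bounds in time_bound) are illustrative assumptions, not the original data:

```python
import numpy as np
import xarray as xr

def make_ds(offset):
    # toy stand-in for one dataset: chunked, with a 'time_bound'
    # variable holding Python objects (as cftime bounds do)
    return xr.Dataset(
        {
            "sst": (("time",), np.arange(3.0) + offset),
            "time_bound": (("time",), np.array([object()] * 3, dtype=object)),
        },
        coords={"time": np.arange(3) + offset},
    ).chunk({"time": 3})

ds_list = [make_ds(0), make_ds(3)]

# dropping the object-dtype variable before concatenating keeps dask
# from ever having to guess chunk sizes for it
combined = xr.concat([ds.drop_vars("time_bound") for ds in ds_list], dim="time")
```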
Anderson Banihirwe (Jul 07 2021 at 20:17):
Since you are using auto-rechunking, I don't think there is any workaround other than dropping the time_bound variable or eagerly loading it before initiating the concatenation. E.g.

ds['time_bound'].load()
xr.concat([.....], dim=...)
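Spelled out as a runnable sketch of the eager-load suggestion (the toy datasets, with strings standing in for cftime objects in time_bound, are assumptions for illustration):

```python
import numpy as np
import xarray as xr

# two identically chunked toy datasets; strings stand in for the
# cftime objects that give 'time_bound' its object dtype
ds_list = [
    xr.Dataset(
        {
            "sst": (("time",), np.arange(3.0) + offset),
            "time_bound": (("time",), np.array(["a", "b", "c"], dtype=object)),
        },
        coords={"time": np.arange(3) + offset},
    ).chunk({"time": 3})
    for offset in (0, 3)
]

# .load() swaps the dask-backed data for an in-memory numpy array,
# so the concatenation no longer routes 'time_bound' through dask
for ds in ds_list:
    ds["time_bound"].load()

combined = xr.concat(ds_list, dim="time")
```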
Matt Long (Jul 07 2021 at 20:17):
Where is auto-rechunking coming in? All the datasets are identically chunked.
Matt Long (Jul 07 2021 at 20:18):
do I need to switch a dask option to turn off auto-rechunking?
Anderson Banihirwe (Jul 07 2021 at 20:27):
Where is auto-rechunking coming in?
I was under the impression that you were specifying chunks='auto' somewhere in your workflow.
Not sure where the issue is coming from... The full traceback might be useful
Matt Long (Jul 07 2021 at 20:27):
nope
Matt Long (Jul 07 2021 at 20:28):
Here's the trace:
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<timed exec> in <module>

~/codes/ocean-metabolisms/notebooks/data_collections.py in to_dataset_dict(self, variable, compute, clobber, prefer_derived, refine_query, **kwargs)
    120         dsets[key] = xr.concat(
    121             [ds for ds in ds_list],
--> 122             dim='member_id', join='override', combine_attrs='override',
    123         )
    124         #dsets[key]['time_bound'] = ds_list[0].time_bound

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    237     )
    238     return f(
--> 239         objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs
    240     )
    241

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    506             except KeyError:
    507                 raise ValueError(f"{k!r} is not present in all datasets.")
--> 508             combined = concat_vars(vars, dim, positions, combine_attrs=combine_attrs)
    509             assert isinstance(combined, Variable)
    510             result_vars[k] = combined

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/variable.py in concat(variables, dim, positions, shortcut, combine_attrs)
   2860         return IndexVariable.concat(variables, dim, positions, shortcut, combine_attrs)
   2861     else:
-> 2862         return Variable.concat(variables, dim, positions, shortcut, combine_attrs)
   2863
   2864

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/variable.py in concat(cls, variables, dim, positions, shortcut, combine_attrs)
   1819         axis = first_var.get_axis_num(dim)
   1820         dims = first_var.dims
-> 1821         data = duck_array_ops.concatenate(arrays, axis=axis)
   1822         if positions is not None:
   1823             # TODO: deprecate this option -- we don't need it for groupby

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in concatenate(arrays, axis)
    310 def concatenate(arrays, axis=0):
    311     """concatenate() with better dtype promotion rules."""
--> 312     return _concatenate(as_shared_dtype(arrays), axis=axis)
    313
    314

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in f(*args, **kwargs)
     54             else:
     55                 wrapped = getattr(eager_module, name)
---> 56             return wrapped(*args, **kwargs)
     57
     58     else:

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in concatenate(seq, axis, allow_unknown_chunksizes)
   3454     from . import wrap
   3455
-> 3456     seq = [asarray(a) for a in seq]
   3457
   3458     if not seq:

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in <listcomp>(.0)
   3454     from . import wrap
   3455
-> 3456     seq = [asarray(a) for a in seq]
   3457
   3458     if not seq:

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in asarray(a, **kwargs)
   3726     elif not isinstance(getattr(a, "shape", None), Iterable):
   3727         a = np.asarray(a)
-> 3728     return from_array(a, getitem=getter_inline, **kwargs)
   3729
   3730

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in from_array(x, chunks, name, lock, asarray, fancy, getitem, meta)
   2761
   2762     chunks = normalize_chunks(
-> 2763         chunks, x.shape, dtype=x.dtype, previous_chunks=previous_chunks
   2764     )
   2765

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in normalize_chunks(chunks, shape, limit, dtype, previous_chunks)
   2467
   2468     if any(c == "auto" for c in chunks):
-> 2469         chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
   2470
   2471     if shape is not None:

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in auto_chunks(chunks, shape, limit, dtype, previous_chunks)
   2551     if dtype.hasobject:
   2552         raise NotImplementedError(
-> 2553             "Can not use auto rechunking with object dtype. "
   2554             "We are unable to estimate the size in bytes of object data"
   2555         )

NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data
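The bottom frames point at the root cause: dask's asarray falls through to from_array with the default chunks='auto', and auto_chunks refuses object dtype because it cannot estimate bytes per element. A minimal dask-only sketch of that behavior (the array here is synthetic):

```python
import numpy as np
import dask.array as da

obj_arr = np.empty(4, dtype=object)  # object dtype, like cftime bounds

# the default chunks='auto' needs a per-element byte estimate,
# which object dtype cannot provide
try:
    da.from_array(obj_arr)
    raised = None
except NotImplementedError as err:
    raised = err

# an explicit chunk size sidesteps the estimate entirely
explicit = da.from_array(obj_arr, chunks=4)
```

Explicit chunk sizes avoid the auto path, which is consistent with identically chunked inputs still failing once a variable is converted through asarray.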
Last updated: Jan 30 2022 at 12:01 UTC