Stream: python-questions

Topic: xr.concat: auto rechunking error


Matt Long (Jul 07 2021 at 19:59):

I am attempting to concatenate datasets along an existing dimension and getting the following error.

NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data

It's complaining about time, which holds cftime objects, but all datasets are identically chunked. Is there a way to avoid triggering the auto rechunking?

Matt Long (Jul 07 2021 at 20:15):

I've tracked this down to the time_bound variable (dropping it resolves the problem). Still not entirely sure why...

Anderson Banihirwe (Jul 07 2021 at 20:17):

Since you are hitting the auto-rechunking, I don't think there is any workaround other than dropping time_bound or eagerly loading it before initiating the concatenation. E.g.

ds['time_bound'].load()
xr.concat([.....], dim=...)
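For illustration, here is a self-contained sketch of that workaround on toy data. The dataset construction (variable names, sizes, the plain-string stand-ins for cftime objects) is invented for the example; only the load-before-concat pattern is the suggested fix.

```python
import numpy as np
import xarray as xr

def make_ds(member, chunk):
    # time_bound holds object-dtype values, standing in for cftime
    # timestamps in the real workflow
    tb = np.empty((2, 2), dtype=object)
    tb[:] = "t"
    ds = xr.Dataset(
        {
            "sst": (("member_id", "time"), np.random.rand(1, 2)),
            "time_bound": (("time", "d2"), tb),
        },
        coords={"member_id": [member], "time": [0, 1]},
    )
    # one dask-backed dataset is enough to route concat through dask
    return ds.chunk({"time": 1}) if chunk else ds

ds_list = [make_ds(0, chunk=True), make_ds(1, chunk=False)]

# eagerly load the object-dtype variable before concatenating ...
for ds in ds_list:
    ds["time_bound"].load()

combined = xr.concat(
    ds_list, dim="member_id", join="override", combine_attrs="override"
)

# ... or, alternatively, drop it up front and re-attach one copy after:
# ds_list = [ds.drop_vars("time_bound") for ds in ds_list]
```

Once time_bound is an in-memory NumPy array, the concatenation of that variable never reaches dask's auto-chunking code path.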

Matt Long (Jul 07 2021 at 20:17):

Where is auto-rechunking coming in? All the datasets are identically chunked.

Matt Long (Jul 07 2021 at 20:18):

Do I need to set a dask option to turn off auto-rechunking?

Anderson Banihirwe (Jul 07 2021 at 20:27):

Where is auto-rechunking coming in?

I was under the impression that you were specifying chunks='auto' somewhere in your workflow.

Not sure where the issue is coming from... The full traceback would be useful.

Matt Long (Jul 07 2021 at 20:27):

nope

Matt Long (Jul 07 2021 at 20:28):

Here's the trace:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<timed exec> in <module>

~/codes/ocean-metabolisms/notebooks/data_collections.py in to_dataset_dict(self, variable, compute, clobber, prefer_derived, refine_query, **kwargs)
    120             dsets[key] = xr.concat(
    121                 [ds for ds in ds_list],
--> 122                 dim='member_id', join='override', combine_attrs='override',
    123             )
    124             #dsets[key]['time_bound'] = ds_list[0].time_bound

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    237         )
    238     return f(
--> 239         objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs
    240     )
    241

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    506             except KeyError:
    507                 raise ValueError(f"{k!r} is not present in all datasets.")
--> 508             combined = concat_vars(vars, dim, positions, combine_attrs=combine_attrs)
    509             assert isinstance(combined, Variable)
    510             result_vars[k] = combined

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/variable.py in concat(variables, dim, positions, shortcut, combine_attrs)
   2860         return IndexVariable.concat(variables, dim, positions, shortcut, combine_attrs)
   2861     else:
-> 2862         return Variable.concat(variables, dim, positions, shortcut, combine_attrs)
   2863
   2864

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/variable.py in concat(cls, variables, dim, positions, shortcut, combine_attrs)
   1819             axis = first_var.get_axis_num(dim)
   1820             dims = first_var.dims
-> 1821             data = duck_array_ops.concatenate(arrays, axis=axis)
   1822             if positions is not None:
   1823                 # TODO: deprecate this option -- we don't need it for groupby

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in concatenate(arrays, axis)
    310 def concatenate(arrays, axis=0):
    311     """concatenate() with better dtype promotion rules."""
--> 312     return _concatenate(as_shared_dtype(arrays), axis=axis)
    313
    314

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in f(*args, **kwargs)
     54             else:
     55                 wrapped = getattr(eager_module, name)
---> 56             return wrapped(*args, **kwargs)
     57
     58     else:

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in concatenate(seq, axis, allow_unknown_chunksizes)
   3454     from . import wrap
   3455
-> 3456     seq = [asarray(a) for a in seq]
   3457
   3458     if not seq:

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in <listcomp>(.0)
   3454     from . import wrap
   3455
-> 3456     seq = [asarray(a) for a in seq]
   3457
   3458     if not seq:

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in asarray(a, **kwargs)
   3726     elif not isinstance(getattr(a, "shape", None), Iterable):
   3727         a = np.asarray(a)
-> 3728     return from_array(a, getitem=getter_inline, **kwargs)
   3729
   3730

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in from_array(x, chunks, name, lock, asarray, fancy, getitem, meta)
   2761
   2762     chunks = normalize_chunks(
-> 2763         chunks, x.shape, dtype=x.dtype, previous_chunks=previous_chunks
   2764     )
   2765

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in normalize_chunks(chunks, shape, limit, dtype, previous_chunks)
   2467
   2468     if any(c == "auto" for c in chunks):
-> 2469         chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
   2470
   2471     if shape is not None:

/glade/work/mclong/miniconda3/envs/metabolic/lib/python3.7/site-packages/dask/array/core.py in auto_chunks(chunks, shape, limit, dtype, previous_chunks)
   2551     if dtype.hasobject:
   2552         raise NotImplementedError(
-> 2553             "Can not use auto rechunking with object dtype. "
   2554             "We are unable to estimate the size in bytes of object data"
   2555         )

NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data
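The traceback shows where the "auto" comes from even though nobody requested it: when dask's concatenate wraps each input with asarray, any plain NumPy array is passed to dask.array.from_array, whose chunks parameter defaults to "auto". For object dtype (the cftime-backed time_bound), auto_chunks cannot estimate element sizes and raises. A minimal reproduction, assuming dask is installed:

```python
import numpy as np
import dask.array as da

# an object-dtype array, like decoded cftime timestamps
obj = np.empty(4, dtype=object)
obj[:] = "x"

# from_array defaults to chunks="auto"; dask cannot estimate the byte
# size of object elements, so auto-chunking raises NotImplementedError
try:
    da.from_array(obj)
except NotImplementedError as err:
    print(type(err).__name__)  # NotImplementedError

# explicit chunks sidestep auto_chunks entirely
arr = da.from_array(obj, chunks=2)
print(arr.chunks)  # ((2, 2),)
```

This is why loading (or dropping) the object-dtype variable fixes the concat: once no NumPy object array has to be re-wrapped by dask, the auto-chunking path is never taken.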

Last updated: Jan 30 2022 at 12:01 UTC