Stream: dask

Topic: memory usage


Ufuk Turuncoglu (Dec 15 2020 at 21:42):

I am trying to write a script to process a very-high-resolution dataset (GHRSST). When I instrument my code with memory_profiler, I see that the following statements add extra memory consumption, and I wonder whether it is possible to rewrite them to reduce the memory usage.
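For reference, the line-by-line figures below come from memory_profiler, presumably via its @profile decorator. A minimal sketch of that kind of instrumentation (build_mesh is just a placeholder name for the processing function) would be:

    from memory_profiler import profile

    @profile
    def build_mesh(ds):
        # placeholder body; the real GHRSST processing goes here, and
        # memory_profiler reports memory usage for every line of this function
        return ds

    # run with: python -m memory_profiler your_script.py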

Statement 1:

   97                                             # REPLACED: corner_pair_uniq = dd.from_dask_array(corner_pair).drop_duplicates().to_dask_array(lengths=True)
    98                                             # following reduces memory by 17%
    99  258.629 MiB    0.680 MiB           1       corner_pair_uniq = dd.from_dask_array(corner_pair).drop_duplicates().values
   100 1005.586 MiB  746.957 MiB           1       corner_pair_uniq.compute_chunk_sizes()

In this case I reduced the memory consumption by changing the calculation of corner_pair_uniq, but there might be another way to reduce it further.
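One rewrite I have not tested (a sketch only, assuming a distributed scheduler that can spill to disk): keep the deduplicated result as a lazy dask DataFrame, persist it once, and only then derive the array with known chunk lengths, instead of calling compute_chunk_sizes() on an already-converted array:

    import dask.dataframe as dd

    # Sketch: persist the deduplicated corners once, then ask for chunk
    # lengths from the persisted data rather than recomputing the graph.
    corner_pair_uniq = dd.from_dask_array(corner_pair).drop_duplicates().persist()
    corner_pair_uniq = corner_pair_uniq.to_dask_array(lengths=True)

Whether this actually lowers the peak depends on the scheduler; with the default threaded scheduler, persist() materializes everything in process memory, so it mainly avoids recomputation rather than saving RAM.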

Statement 2:

   113 1005.586 MiB    0.000 MiB           5       corners = dd.concat([dd.from_dask_array(c) for c in [corner_lon.T.reshape((-1,)).T, corner_lat.T.reshape((-1,)).T]], axis=1)
   114 1005.586 MiB    0.000 MiB           1       corners.columns = ['lon', 'lat']
   115 1789.883 MiB  784.297 MiB           1       elem_conn = corners.compute().groupby(['lon','lat'], sort=False).ngroup()+1
   116 1692.887 MiB  -96.996 MiB           1       elem_conn = da.from_array(elem_conn.to_numpy())

The calculation of elem_conn introduces another jump in memory. Any suggestions?
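One idea I have not tried on this dataset (a sketch, assuming the unique (lon, lat) pairs fit comfortably in a single partition): emulate pandas' groupby(...).ngroup() lazily by building a small lookup table of the unique corners and merging it back, so the full corners frame never has to be computed into pandas. Note that the numbering may not match ngroup()'s first-appearance order exactly, and the merge and set_index each introduce a shuffle:

    import dask.dataframe as dd

    # Sketch: assign an integer id to each unique (lon, lat) pair and merge it
    # back onto the full frame, instead of corners.compute().groupby(...).ngroup().
    uniq = corners.drop_duplicates().repartition(npartitions=1)
    uniq['elem_id'] = 1
    uniq['elem_id'] = uniq['elem_id'].cumsum()        # 1-based id per unique corner

    # carry an explicit row number so the original order can be restored after the merge
    indexed = corners.assign(row=1)
    indexed['row'] = indexed['row'].cumsum()
    elem_conn = (indexed.merge(uniq, on=['lon', 'lat'], how='left')
                        .set_index('row')['elem_id'])

If elem_conn is still needed as a dask array, elem_conn.to_dask_array(lengths=True) would avoid the da.from_array(elem_conn.to_numpy()) round trip through NumPy.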


Last updated: Jan 30 2022 at 12:01 UTC