Stream: dask
Topic: memory usage
Ufuk Turuncoglu (Dec 15 2020 at 21:42):
I am trying to write a script to process a very high-resolution dataset (GHRSST). When I instrument my code with memory_profiler, I see that the following statements add extra memory consumption, and I wonder whether it is possible to rewrite them to reduce the memory usage.
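For reference, the per-line numbers below come from memory_profiler's line-by-line output; a minimal sketch of that instrumentation, assuming the statements live inside a decorated function (the function and script names here are placeholders, not the real code), looks like this:

# Hypothetical sketch of the memory_profiler setup; build_mesh and my_script.py
# are placeholder names, and the function body is not the actual processing code.
from memory_profiler import profile

@profile
def build_mesh():
    # ... the statements profiled below sit inside a function like this ...
    pass

if __name__ == '__main__':
    build_mesh()

# run with:  python -m memory_profiler my_script.py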
Statement 1:
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    97                                          # REPLACED: corner_pair_uniq = dd.from_dask_array(corner_pair).drop_duplicates().to_dask_array(lengths=True)
    98                                          # the following reduces memory by 17%
    99  258.629 MiB    0.680 MiB           1    corner_pair_uniq = dd.from_dask_array(corner_pair).drop_duplicates().values
   100 1005.586 MiB  746.957 MiB           1    corner_pair_uniq.compute_chunk_sizes()
In this case I reduced the memory consumption by changing the calculation of corner_pair_uniq, but there might be another way to reduce it further.
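For context, a minimal self-contained sketch of the two variants compared above; corner_pair here is a small random stand-in for the real corner index pairs, so the numbers are illustrative only:

# Minimal sketch, assuming corner_pair is a 2-column dask array of candidate
# corner pairs; the random data is only a placeholder for the real arrays.
import numpy as np
import dask.array as da
import dask.dataframe as dd

corner_pair = da.from_array(np.random.randint(0, 100, size=(10000, 2)))

# original version: drop duplicates and return to a dask array with known chunk lengths
corner_pair_uniq = (dd.from_dask_array(corner_pair)
                      .drop_duplicates()
                      .to_dask_array(lengths=True))

# modified version: .values leaves the chunk sizes unknown, so they are computed afterwards
corner_pair_uniq = dd.from_dask_array(corner_pair).drop_duplicates().values
corner_pair_uniq.compute_chunk_sizes()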
Statement 2:
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   113 1005.586 MiB    0.000 MiB           5    corners = dd.concat([dd.from_dask_array(c) for c in [corner_lon.T.reshape((-1,)).T, corner_lat.T.reshape((-1,)).T]], axis=1)
   114 1005.586 MiB    0.000 MiB           1    corners.columns = ['lon', 'lat']
   115 1789.883 MiB  784.297 MiB           1    elem_conn = corners.compute().groupby(['lon','lat'], sort=False).ngroup()+1
   116 1692.887 MiB  -96.996 MiB           1    elem_conn = da.from_array(elem_conn.to_numpy())
The calculation of elem_conn introduces another jump in memory. Any suggestions?
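For completeness, a runnable stand-alone sketch of the elem_conn step; corner_lon and corner_lat below are small random stand-ins for the actual corner coordinate arrays:

# Self-contained sketch of Statement 2; corner_lon / corner_lat are small
# random placeholders for the real GHRSST corner coordinates.
import numpy as np
import dask.array as da
import dask.dataframe as dd

corner_lon = da.from_array(np.random.rand(4, 1000))
corner_lat = da.from_array(np.random.rand(4, 1000))

# stack the flattened corner coordinates into a two-column dask DataFrame
corners = dd.concat([dd.from_dask_array(c) for c in
                     [corner_lon.T.reshape((-1,)).T, corner_lat.T.reshape((-1,)).T]],
                    axis=1)
corners.columns = ['lon', 'lat']

# number each unique (lon, lat) pair; compute() materializes everything in
# pandas, which is where the large memory increment shows up
elem_conn = corners.compute().groupby(['lon', 'lat'], sort=False).ngroup() + 1
elem_conn = da.from_array(elem_conn.to_numpy())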