Stream: python-questions

Topic: Efficient (dask) array indexing


view this post on Zulip Rich Neale (Jul 25 2022 at 16:05):

So I have an nlat x nlon x ndays array, quite large (~3 GB), and I am trying to parallelize a particular indexing operation.
A bit tricky to explain, but at each lat/lon point I am indexing the time axis by day of year, days=(1,2,...,365), and rewriting it into an equally sized array using a randomized version of those days, days_rand=(45,365,...,1,3). Then my operation seems simple.
The key thing is that days and days_rand change for each ilat/ilon.

var_new(ilat,ilon,days) = var_old(ilat,ilon,days_rand)

I do this for a multi-year array, and days and days_rand are precomputed.
Dask will not let me compute like this, presumably because of the slicing restrictions described here:
https://docs.dask.org/en/stable/array-slicing.html
i.e., that I cannot index with an index array in more than one dimension (ilat, ilon are still slices, I'm presuming).

So I am then .load()-ing the array and operating on it with ilat, ilon for loops.
So, any ideas on parallelizing this, either with dask or numpy?
Thanks!
Rich
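For concreteness, the loop version Rich describes could be sketched like this. The array sizes, the random construction of days_rand, and all variable names are illustrative, not taken from the thread:

```python
import numpy as np

# Hypothetical small sizes; the real arrays are ~3 GB.
nlat, nlon, ndays = 4, 5, 365
rng = np.random.default_rng(0)

var_old = rng.standard_normal((nlat, nlon, ndays))

# days_rand: a different permutation of the day indices at each grid point.
days_rand = np.stack(
    [rng.permutation(ndays) for _ in range(nlat * nlon)]
).reshape(nlat, nlon, ndays)

# Loop version of var_new(ilat, ilon, days) = var_old(ilat, ilon, days_rand):
var_new = np.empty_like(var_old)
for ilat in range(nlat):
    for ilon in range(nlon):
        var_new[ilat, ilon, :] = var_old[ilat, ilon, days_rand[ilat, ilon, :]]
```

Each lat/lon point gets its own shuffle of the time axis, which is what makes a single fancy-indexing call tricky.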

view this post on Zulip Deepak Cherian (Jul 25 2022 at 16:17):

Does vindex work? https://docs.dask.org/en/stable/generated/dask.array.Array.vindex.html#dask.array.Array.vindex

If not I would chunk so that all timesteps are in one block; and then map_blocks your permutation.
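A minimal sketch of the map_blocks approach Deepak suggests, assuming the per-point permutation indices (days_rand) are available as an array with the same shape and chunking as the data. Sizes and names are illustrative:

```python
import numpy as np
import dask.array as da

nlat, nlon, ndays = 8, 10, 365
rng = np.random.default_rng(0)

var_np = rng.standard_normal((nlat, nlon, ndays))
days_rand_np = np.stack(
    [rng.permutation(ndays) for _ in range(nlat * nlon)]
).reshape(nlat, nlon, ndays)

# Chunk only over lat/lon so every block holds the full time axis.
chunks = (4, 5, ndays)
var_old = da.from_array(var_np, chunks=chunks)
days_rand = da.from_array(days_rand_np, chunks=chunks)

def permute_block(block, idx):
    # Within a block, apply each grid point's permutation along time.
    return np.take_along_axis(block, idx, axis=-1)

var_new = da.map_blocks(permute_block, var_old, days_rand, dtype=var_old.dtype)
result = var_new.compute()
```

Because each block contains all timesteps, the permutation never crosses chunk boundaries and each block can be processed independently in parallel.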

view this post on Zulip Deepak Cherian (Jul 25 2022 at 16:19):

Oh this might work: https://numpy.org/doc/stable/reference/generated/numpy.take_along_axis.html#numpy.take_along_axis
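np.take_along_axis does exactly this pattern in one vectorized call, with no explicit ilat/ilon loops. A small sketch with illustrative sizes:

```python
import numpy as np

nlat, nlon, ndays = 3, 4, 365
rng = np.random.default_rng(42)

var_old = rng.standard_normal((nlat, nlon, ndays))
days_rand = np.stack(
    [rng.permutation(ndays) for _ in range(nlat * nlon)]
).reshape(nlat, nlon, ndays)

# One call replaces the ilat/ilon double loop: for each (ilat, ilon),
# gather var_old[ilat, ilon, days_rand[ilat, ilon, :]] along the last axis.
var_new = np.take_along_axis(var_old, days_rand, axis=-1)
```

The index array must have the same number of dimensions as the data, which fits here since days_rand is already nlat x nlon x ndays.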


Last updated: May 16 2025 at 17:14 UTC