So I have an nlat x nlon x ndays array (quite large, ~3 GB) and I am trying to parallelize a particular indexing operation.
A bit tricky to explain, but at each lat/lon point I index the time axis by the days of the year, days = (1, 2, ..., 365), and I rewrite that with an equivalent-sized array that is a randomized version of those days, days_rand = (45, 365, ..., 1, 3). Then my operation seems simple.
The key thing is that days and days_rand change for each ilat/ilon.
var_new(ilat,ilon,days) = var_old(ilat,ilon,days_rand)
I do this for a multi-year array; days and days_rand are precomputed.
Dask will not let me compute like this, presumably because of the slicing restrictions described here:
https://docs.dask.org/en/stable/array-slicing.html
namely that I cannot index with an array along more than one dimension (ilat and ilon are still plain slices, I'm presuming).
So I am then .load()-ing the array and operating on it with ilat, ilon for loops.
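Roughly, the serial version looks like this (a minimal sketch; the function name and 0-based indexing are my own simplification):

```python
import numpy as np

def shuffle_days(var_old, days_rand):
    # var_old: (nlat, nlon, ndays); days_rand: same shape, holding a
    # 0-based permutation of the day indices at every grid point
    nlat, nlon, ndays = var_old.shape
    var_new = np.empty_like(var_old)
    for ilat in range(nlat):
        for ilon in range(nlon):
            # reorder the time axis at this grid point by its own permutation
            var_new[ilat, ilon, :] = var_old[ilat, ilon, days_rand[ilat, ilon, :]]
    return var_new
```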
So, any ideas on parallelizing this, either in dask or numpy?
Thanks!
Rich
Does vindex work? https://docs.dask.org/en/stable/generated/dask.array.Array.vindex.html#dask.array.Array.vindex
If not, I would chunk so that all timesteps are in one block, and then map_blocks your permutation.
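A rough sketch of that rechunk + map_blocks idea (the array sizes, chunking, and the 0-based days_rand are assumptions for illustration; days_rand is taken to be an in-memory NumPy array):

```python
import numpy as np
import dask.array as da

nlat, nlon, ndays = 180, 360, 365
var_old = da.random.random((nlat, nlon, ndays), chunks=(45, 90, ndays))
# per-grid-point permutations of the day indices (0-based here)
days_rand = np.argsort(np.random.rand(nlat, nlon, ndays), axis=2)

# keep the full time axis in one chunk so each block can be permuted locally
var_old = var_old.rechunk({0: "auto", 1: "auto", 2: -1})

def _permute_block(block, block_info=None):
    # which lat/lon slab of the global array does this block cover?
    (i0, i1), (j0, j1), _ = block_info[0]["array-location"]
    idx = days_rand[i0:i1, j0:j1, :]
    # plain NumPy advanced indexing within the block: each grid point's
    # time axis is reordered by its own permutation
    ii = np.arange(block.shape[0])[:, None, None]
    jj = np.arange(block.shape[1])[None, :, None]
    return block[ii, jj, idx]

var_new = var_old.map_blocks(_permute_block, dtype=var_old.dtype)
```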
Oh this might work: https://numpy.org/doc/stable/reference/generated/numpy.take_along_axis.html#numpy.take_along_axis
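For what it's worth, a small self-contained sketch of that last suggestion in pure NumPy (sizes and the 0-based days_rand are made up for illustration):

```python
import numpy as np

nlat, nlon, ndays = 4, 5, 365
var_old = np.random.rand(nlat, nlon, ndays)
# a different random permutation of the day indices at every grid point
days_rand = np.argsort(np.random.rand(nlat, nlon, ndays), axis=2)

# one vectorized call replaces the ilat/ilon double loop:
# var_new[i, j, t] == var_old[i, j, days_rand[i, j, t]]
var_new = np.take_along_axis(var_old, days_rand, axis=2)
```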