I am attempting to parallelize three calls to pcolormesh using the Python multiprocessing module. Here is some pseudo-code:
import multiprocessing
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

def setup_ax(label):
    ax = plt.subplot(111, projection=ccrs.LambertConformal(central_longitude=-97.5, central_latitude=38.5), label=label)
    ax.add_feature(cfeature.COASTLINE.with_scale('50m'), linewidth=0.5)
    ax.add_feature(cfeature.STATES, linewidth=0.5)
    ax.add_feature(cfeature.BORDERS, linewidth=0.5)
    return ax

def plot_comp(ds, ax, field, minval, cmap, norm):
    comp = ds[field].max(dim='z0')
    p = ax.pcolormesh(ds.lon0, ds.lat0, comp.where(comp > minval), transform=ccrs.PlateCarree(), cmap=cmap, norm=norm)
    return p
fig = plt.figure(1,figsize=(22,15))
ax1 = setup_ax('ax1')
ax2 = setup_ax('ax2')
ax3 = setup_ax('ax3')
ax = [ax1,ax2,ax3]
fn = ['f1','f2','f3']
mv = [0.0,0.0,0.0]
cm = [col1,col2,col3]
nm = [norm1,norm2,norm3]
mp = multiprocessing.Pool(max(multiprocessing.cpu_count() - 2, 1))
results = mp.starmap(plot_comp, [(ds, a, f, m, c, n) for a, f, m, c, n in zip(ax, fn, mv, cm, nm)])
I get an error that I am having trouble interpreting:
Traceback (most recent call last):
  File "plot_mdv64_field.py", line 136, in <module>
    results = mp.starmap(plot_comp,[(fileData,a,f,m,c,n) for a,f,m,c,n in tuple(zip(ax,fn,mv,cm,nm))])
  File "/home/dadriaan/.conda/envs/icicle/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/dadriaan/.conda/envs/icicle/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<matplotlib.collections.QuadMesh object at 0x7f7c5879ba90>]'. Reason: 'AttributeError("Can't pickle local object 'GeoAxes._pcolormesh_patched.<locals>.<lambda>'")'
I can only imagine this has something to do with figures or subplots, but I'm not quite sure in what way. I would expect results to just be a list of the three objects returned from pcolormesh in plot_comp(), but I must be missing something. Does anyone have any insight? Thank you!
@Daniel Adriaansen, I am not exactly sure what's going wrong, but in previous instances I've found that matplotlib is not thread-safe. I have used dask.delayed to successfully parallelize plotting. For example, see here.
@Daniel Adriaansen: @Matt Long's suggestion of using Dask for parallelism is a good one. Can I ask if this is "new code" that you have written and it is failing? Or is this an old script that used to work but now does not?
Thank you @Matt Long and @Kevin Paul! To answer Kevin's question: this is new code that I wrote, and it is failing. Are you mostly curious from a version standpoint (i.e., new versions breaking my old code)? Or is there something else that might be at play here?
I have not used Dask before, and frankly misunderstood it as only being useful for parallelizing problems with specific Pythonic data containers/objects like ndarray/DataArray and DataFrames. It turns out that Dask can be used by itself, and this opens up a whole new world. From the Dask documentation here https://examples.dask.org/delayed.html, I see: "Systems like Dask.dataframe are built with Dask.delayed. If you have a problem that is parallelizable, but isn't as simple as just a big array or a big dataframe, then dask.delayed may be the right choice for you."
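As a minimal illustration of that pattern (toy functions standing in for real work, not my plotting code, and assuming dask is installed):

```python
import dask

@dask.delayed
def process(x):
    # Stand-in for any independent unit of work
    return x * 2

# Calling a delayed function only builds a task graph; nothing runs yet
tasks = [process(i) for i in range(3)]

# compute() executes the graph, running independent tasks in parallel
results = dask.compute(*tasks)
print(results)  # (0, 2, 4)
```

The appeal is that the parallelism is declared separately from the work: you build up a graph of ordinary function calls and let Dask's scheduler decide how to run it.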
Thanks to @Matt Long for the example and suggestion!
Hi Daniel, note that matplotlib is constrained to have one process/thread produce the plot itself. Dask is best used to parallelize the data processing steps for the plot, but the process of constructing the plot itself cannot be easily parallelized.
Thanks @Brian Bonnlander. @Matt Long's example above shows using dask.delayed for constructing the plot itself, presumably in parallel. Again, I am very new to Dask, so I may not fully grasp what is going on in the example. Is using dask.delayed useful for calls to things like contourf and pcolormesh?
Hi Daniel, I just looked at the example, and it seems related to data processing, not plotting. I could be wrong, but everything I've read suggests that contourf and pcolormesh are non-parallelizable. Producing the data values for these plots is parallelizable using Dask, however.
OK thanks, I think I understand. What I am trying to do is get a single Python script to call three instances of pcolormesh simultaneously for three different plots (not the same plot). This would be equivalent to running three separate Python scripts at the same time, each calling pcolormesh. Separate resources are used for each; I just want to do it from a single script. I'm not actually interested in parallelizing the work that's done within pcolormesh, but rather in running multiple calls to it simultaneously on the same piece of hardware.
Ah, I see. It may be possible to do that, as long as the plots are completely distinct and not combined as separate subplots. Again though, this is more based on what I've read.
After much tinkering (including with dask a bit), I was ultimately successful with my original approach using multiprocessing:
import multiprocessing
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

def setup_ax(label):
    ax = plt.subplot(111, projection=ccrs.LambertConformal(central_longitude=-97.5, central_latitude=38.5), label=label)
    ax.add_feature(cfeature.COASTLINE.with_scale('50m'), linewidth=0.5)
    ax.add_feature(cfeature.STATES, linewidth=0.5)
    ax.add_feature(cfeature.BORDERS, linewidth=0.5)
    return ax

def plot_comp(ds, field, minval, cmap, norm):
    # Create the axes and save the figure inside the worker, so that
    # nothing unpicklable needs to be returned to the parent process
    ax = setup_ax(field)
    comp = ds[field].max(dim='z0')
    p = ax.pcolormesh(ds.lon0, ds.lat0, comp.where(comp > minval), transform=ccrs.PlateCarree(), cmap=cmap, norm=norm)
    fig.savefig(field + '.png')  # one output file per field

# New figure
fig = plt.figure(1, figsize=(22, 15))

# Items for parallelizing
fn = ['f1', 'f2', 'f3']
mv = [0.0, 0.0, 0.0]
cm = [col1, col2, col3]
nm = [norm1, norm2, norm3]

mp = multiprocessing.Pool(max(multiprocessing.cpu_count() - 2, 1))
mp.starmap(plot_comp, [(ds[fn], f, m, c, n) for f, m, c, n in zip(fn, mv, cm, nm)])
The major change, I think, was creating a new axis and saving the figure within plot_comp(); thus, nothing is returned from multiprocessing in this instance. Timing within Python shows roughly a 40-50% speedup taking this approach for three fields.
Nice @Daniel Adriaansen . This would make a great blogpost for the ESDS blog if you're up for contributing: https://ncar.github.io/esds/
Deepak Cherian said:
Nice Daniel Adriaansen . This would make a great blogpost for the ESDS blog if you're up for contributing: https://ncar.github.io/esds/
Thanks for the opportunity! I'd be happy to contribute this as an example. What's the best way to coordinate? Feel free to email me directly (I believe my email is visible in my profile, but if not reply here and I will send it directly).
Daniel Adriaansen said:
What's the best way to coordinate?
A pull request here would be best: https://github.com/NCAR/esds .
Last updated: May 16 2025 at 17:14 UTC