Stream: jupyterlab-hub

Topic: IOPub data rate exceeded error


Emma Hoffman (Jul 16 2024 at 17:28):

I'm getting this error when trying to plot a figure. I'm running plotting lines that work with other data, and as far as I can tell just my xarray dataset I regridded with xesmf throws this. Searching online for solutions suggested doing what the error message says and making a Jupyter notebook config file to raise the data rate limit, but I'm not sure exactly what that does, or whether I even can, so I'm hesitant to try. Here's the error message:

msgpack.exceptions.ExtraData: unpack(b) received extra data.
IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)

and the single line of plotting that caused the problem:

dsgr.oxy.isel(z_t=0).plot()

edit: the error also pops up when I run dsgr.compute() so the regridded data array must be too big, will try smaller chunks in dask
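For reference, the config change the error message points at would look roughly like this, assuming the single-user server picks up ~/.jupyter/jupyter_server_config.py (generated with jupyter server --generate-config); the new limit value here is only an example, not a recommendation:

c = get_config()  # noqa
# Raise the IOPub rate limit; 1.0e10 bytes/sec is an arbitrary "big enough" example
c.ServerApp.iopub_data_rate_limit = 1.0e10   # default is 1e6 bytes/sec, as shown in the error
c.ServerApp.rate_limit_window = 3.0          # seconds (same as the default shown in the error)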

Emma Hoffman (Jul 17 2024 at 23:16):

Alright, well, the chunk size doesn't seem to matter. I'm regridding a single selected variable that's already been averaged across time, so it's pretty reduced: 60 depths, going from a lat/lon of 145, 360 to 384, 320. The dask and xesmf GitHub issues I'm reading through suggest this shouldn't be a problem, and my attempts to generate and edit a config file to raise the rate limit manually don't seem to work. Not sure what else to try.
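For context, a minimal sketch of the regrid step as described; the file paths, variable names, regrid method, and chunking here are all placeholders, not taken from the actual notebook:

import xarray as xr
import xesmf as xe

# Placeholder inputs: the time-meaned O2 on the 145x360 source grid, and any
# dataset carrying the 384x320 target grid (assumes both expose lat/lon
# coordinates that xESMF can find).
ds_src = xr.open_dataset('o2_timemean_source.nc', chunks={'z_t': 10})
ds_tgt = xr.open_dataset('target_grid.nc')

regridder = xe.Regridder(ds_src, ds_tgt, 'bilinear')  # method is just an example
dsgr = regridder(ds_src['oxy'])                       # stays lazy if the input is dask-backed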

Michael Levy (Jul 18 2024 at 03:26):

Emma Hoffman said:

as far as I can tell just my xarray dataset I regridded with xesmf throws this

combined with

edit: the error also pops up when I run dsgr.compute() so the regridded data array must be too big, will try smaller chunks in dask

makes me wonder if you're running out of memory in the regrid step. Have you remapped other variables from this 145x360 grid to the 384x320 grid? Another possibility is running out of memory in the time-mean step: are you reading in multiple time levels and computing the time mean in your notebook? dask will do that lazily, so it might be worth doing the .compute() on the source-grid data before remapping it. (How many time levels are you reading in, and what dask resources are you requesting to parallelize the computation?)
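Roughly what that ordering would look like; ds_src, O2, and regridder are placeholder names standing in for whatever the notebook actually uses:

# Force the source-grid time mean into memory first, then hand the
# eager result to the regridder.
o2_mean = ds_src['O2'].mean('time').compute()
dsgr = regridder(o2_mean)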

Emma Hoffman (Jul 18 2024 at 03:49):

@Michael Levy
I did run compute on the time mean and saved it as a netcdf file, then read that back in to use in the regridding, so I know it's not coming from that step, at least. I just tried temperature instead of my desired oxygen, and the regridding worked fine: both compute() and plot() ran without errors and a plot came out. Hmm.

For dask, I was asking for 1 node for 20 minutes, with 100GB and 30 workers. The dashboard never even started to load, though; it didn't overload and kill workers or anything, as far as I can tell.
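As a point of reference, a generic dask-jobqueue version of that kind of request (the queue name and exact settings are placeholders, not what the notebook actually uses) might look like:

from dask.distributed import Client
from dask_jobqueue import PBSCluster

# Placeholder settings approximating "1 node, 20 minutes, 100GB, 30 workers"
cluster = PBSCluster(cores=30, processes=30, memory='100GB',
                     walltime='00:20:00', queue='casper')
cluster.scale(jobs=1)          # one PBS job
client = Client(cluster)
print(client.dashboard_link)   # check whether workers actually come up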

Emma Hoffman (Jul 18 2024 at 04:03):

Just tried oxygen again, and it seems to work? Plots are loading, at least. But when I try subtracting the regridded data from the data originally on that grid, which is the original goal, I get the IOPub data rate error when I run compute or plot on that. No idea what's going on.
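For reference, roughly the operation being described, with placeholder variable names (dsgr['oxy'] for the regridded field, ds_native['O2'] for the field already on the 384x320 grid):

# Placeholder names: the difference is lazy until compute/plot is called,
# which is where the error shows up.
diff = dsgr['oxy'] - ds_native['O2'].mean('time')
diff.compute()              # or diff.isel(z_t=0).plot()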

Michael Levy (Jul 18 2024 at 14:36):

Weird. Can you dump the regridded oxygen to netCDF and then read it back in? It definitely seems like a memory / resource limitation, but given your description there isn't really a resource-limited step. If you point me to your notebook, I'm happy to take a look and try to run it myself
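That is, something along these lines, with a placeholder output path:

# Placeholder path: writing forces the whole dask graph to evaluate, and
# reading it back gives a plain, file-backed dataset to plot from.
dsgr.to_netcdf('oxy_regridded_test.nc')
dsgr_check = xr.open_dataset('oxy_regridded_test.nc')
dsgr_check['oxy'].isel(z_t=0).plot()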

Emma Hoffman (Jul 18 2024 at 15:42):

@Michael Levy
Thank you so much for your help. This is the path to the notebook I'm running:
u/home/emmah/1_Variability_Ventilation_O2/exploration_setup.ipynb

When I tried to save the regridded file, it failed to serialize with a "0-dim memory has no length" type error.

Michael Levy (Jul 18 2024 at 22:29):

I started to poke around this afternoon, but couldn't find a conda environment to run it in (I tried cloning Yassir's TPAC_CO2 environment but that's missing geopy). What environment are you using?

Emma Hoffman (Jul 18 2024 at 22:46):

So sorry about that! I just updated the environment folder in the 1_Variability_Ventilation_O2 folder.

Michael Levy (Jul 19 2024 at 15:32):

I was able to install your environment and play with your notebook some. I don't have any definitive answers, but I do have a few suggestions:

  1. 10 minutes is pretty short for keeping your workers around - I would request 30 minutes (C=CLSTR(1,"00:30",100,30))
  2. The file /glade/derecho/scratch/yeddebba/FOSI/LR/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch.pop.h.O2.195801-202112.nc is a netCDF4 file, and it is already written in chunks - each time level is its own chunk:
$ ncdump -hs /glade/derecho/scratch/yeddebba/FOSI/LR/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch.pop.h.O2.195801-202112.nc | grep O2
netcdf g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch.pop.h.O2.195801-202112 {
    float O2(time, z_t, nlat, nlon) ;
        O2:long_name = "Dissolved Oxygen" ;
        O2:units = "mmol/m^3" ;
        O2:coordinates = "TLONG TLAT z_t " ;
        O2:grid_loc = "3111" ;
        O2:cell_methods = "time: mean" ;
        O2:_FillValue = 9.96921e+36f ;
        O2:missing_value = 9.96921e+36f ;
        O2:_Storage = "chunked" ;
        O2:_ChunkSizes = 1, 60, 384, 320 ;
        O2:_Shuffle = "true" ;
        O2:_DeflateLevel = 1 ;
        O2:_Endianness = "little" ;
        :history = "Wed Jun 21 08:40:54 2023: ncks -O -4 -L 1 /glade/scratch/kristenk/archive/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch/ocn/proc/tseries/month_1/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch.pop.h.O2.195801-202112.nc /glade/scratch/kristenk/archive/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch/ocn/proc/tseries/month_1/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch.pop.h.O2.195801-202112.nc\nnone" ;

I think that changing from chunks={'nlon':60,'nlat':60,'z_t':30} to chunks={'time':4} will avoid some re-chunking. I know the goal is to average over all the time values, so it seems beneficial to have each chunk contain all the time levels, but in my experience rechunking tends to be expensive and should be done sparingly.
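A minimal sketch of that suggestion, reusing the file path above:

# Chunk only along time so the dask chunks line up with the on-disk chunking
# (_ChunkSizes = 1, 60, 384, 320, i.e. one time level per chunk).
ds = xr.open_dataset(
    '/glade/derecho/scratch/yeddebba/FOSI/LR/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch.pop.h.O2.195801-202112.nc',
    chunks={'time': 4},
)
o2_mean = ds['O2'].mean('time')   # still lazy; compute when needed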

Also, it might be worthwhile to create your own copy of /glade/work/yeddebba/GOBAI_O2/GOBAI-O2-v2.1.nc with the longitudes sorted correctly -- that would be more efficient than using sortby every time you read the file (a one-time rewrite is sketched at the end of this message). Alternatively (or additionally), you could read each file once and then do the subsetting from memory -- e.g., your initial plotting cell could change to

dircsl='/glade/derecho/scratch/yeddebba/FOSI/LR/'
dsl=xr.open_mfdataset(dircsl+'*.O2.*.nc')
-dsl_150= xr.open_mfdataset(dircsl+'*.O2.*.nc',chunks={'z_t':20,'time':100}).sel(z_t=15000,method='nearest')
-dsl_300= xr.open_mfdataset(dircsl+'*.O2.*.nc',chunks={'z_t':20,'time':100}).sel(z_t=30000,method='nearest')
-dsl_500= xr.open_mfdataset(dircsl+'*.O2.*.nc',chunks={'z_t':20,'time':100}).sel(z_t=50000,method='nearest')
+dsl_150=dsl.sel(z_t=15000,method='nearest')
+dsl_300=dsl.sel(z_t=30000,method='nearest')
+dsl_500=dsl.sel(z_t=50000,method='nearest')

Or maybe dsl is already in memory from earlier in the notebook. If you want to chat about it, I'm happy to hop on a zoom call or google meet this afternoon (sometime between 2p and 4p Mountain Time?) or next week.
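For the sortby point above, the one-time rewrite could look roughly like this; the longitude coordinate name and the output filename are guesses, not taken from the actual file:

# Write a copy with longitudes already sorted so sortby isn't needed on every read.
# 'lon' and the output path are placeholders.
gobai = xr.open_dataset('/glade/work/yeddebba/GOBAI_O2/GOBAI-O2-v2.1.nc')
gobai.sortby('lon').to_netcdf('GOBAI-O2-v2.1_sorted.nc')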

