Stream: jupyterlab-hub

Topic: environment variables on jupyterhub


Allison Baker (Jan 25 2022 at 18:09):

I am using a specific version of netCDF that includes HDF5 filter plugins for compression on Cheyenne. I have to set the plugin path as follows:
setenv HDF5_PLUGIN_PATH /glade/work/haiyingx/hdf/h5pl-1.12.1-Linux/HDF_Group/HDF5/1.12.1/lib/plugin/
Now when I run a Python session directly on Cheyenne, as long as this path is set, the netCDF4-python package can read my netCDF file with the filter. So yay for that.
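For reference, a quick way to confirm the plugin path is picked up is simply to read a filtered variable (the filename and variable name below are hypothetical stand-ins):

import netCDF4

# With HDF5_PLUGIN_PATH set, reading a filtered variable should succeed;
# "compressed_output.nc" and "var" are placeholders for illustration
ds = netCDF4.Dataset("compressed_output.nc", "r")
data = ds.variables["var"][:]
ds.close()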

But now I would like to be able to do this on JupyterHub. I have tried setting the plugin path like this in my notebook:
os.environ["HDF5_PLUGIN_PATH"] = "/glade/work/haiyingx/hdf/h5pl-1.12.1-Linux/HDF_Group/HDF5/1.12.1/lib/plugin/"
but netCDF4-python does not seem to pick this up, as it can't find the filter definitions (i.e., I get the same message you get if you forget to set the filter path). I have never messed with my environment variables in JupyterHub before, so I must be missing some concept of how this works. Any help is appreciated!
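One possible reason the in-notebook assignment fails: HDF5 builds its plugin search path the first time it needs it, so if anything in the kernel touched an HDF5 file before the assignment ran, the variable may be ignored. Setting it at the very top of the notebook, before importing netCDF4 (or anything else that pulls in HDF5), is the safest ordering. A minimal sketch of that:

import os

# Set the plugin path before any HDF5-backed library is imported or used
os.environ["HDF5_PLUGIN_PATH"] = "/glade/work/haiyingx/hdf/h5pl-1.12.1-Linux/HDF_Group/HDF5/1.12.1/lib/plugin/"

import netCDF4  # imported only after the variable is in place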

Jared Baker (Jan 25 2022 at 18:28):

You can probably do this a number of ways, but ultimately the startup for Jupyter sessions from JupyterHub is done in bash and often picks up settings from either ~/.bashrc or ~/.bash_profile, depending on how one has structured their dot files. Alternatively, there is an optional ~/.jupyterhub file (which must use bash format) that could be used, but it will require you to restart your JupyterHub server before it takes effect:

(attachment: image.png)
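In bash format, that file might contain just the export line (using the plugin path from earlier in the thread):

export HDF5_PLUGIN_PATH=/glade/work/haiyingx/hdf/h5pl-1.12.1-Linux/HDF_Group/HDF5/1.12.1/lib/plugin/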

That might be required depending on how you're interacting with NetCDF, I suppose.

Allison Baker (Jan 25 2022 at 18:51):

Thanks - I'll try this...

Allison Baker (Jan 25 2022 at 20:46):

Thanks @Jared Baker - both of those options work (creating a .jupyterhub file or adding to my .bashrc)

Allison Baker (Jan 26 2022 at 19:31):

A follow-up question: I now have all my codes working with the netCDF filters, unless I use Dask :)

I am using dask-jobqueue via ncar_jobqueue, and I think the problem is that I somehow have to pass this HDF5_PLUGIN_PATH environment variable to the Dask cluster. I looked at the dask-jobqueue documentation, and it is not clear to me how to do this. My best guess is that I need to do something with the dask/jobqueue.yml file ...
Thanks!

This is how I start the cluster:

from dask.distributed import Client
from ncar_jobqueue import NCARCluster

# Create a cluster under the NTDD0004 project allocation
cluster = NCARCluster(project='NTDD0004')
# Scale adaptively between 1 and 30 batch jobs
cluster.adapt(minimum_jobs=1, maximum_jobs=30)
# Connect a client to the cluster
client = Client(cluster)

Anderson Banihirwe (Jan 26 2022 at 19:36):

I believe you can use the env_extra keyword for this:

cluster = NCARCluster(project='NTDD0004', env_extra=['export HDF5_PLUGIN_PATH="....."'])
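With the plugin path from earlier in the thread filled in, the call would look something like this (env_extra prepends each line to the generated batch job script, so the workers inherit the variable):

from ncar_jobqueue import NCARCluster

cluster = NCARCluster(
    project='NTDD0004',
    env_extra=['export HDF5_PLUGIN_PATH=/glade/work/haiyingx/hdf/h5pl-1.12.1-Linux/HDF_Group/HDF5/1.12.1/lib/plugin/'],
)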

Allison Baker (Jan 26 2022 at 22:06):

That fixed the problem - thanks so much!

