Stream: jupyter

Topic: slow pip install


Riley Brady (May 06 2020 at 17:12):

This is more of a general question for Cheyenne/Casper rather than jupyter specific. Does anyone have a sense of what causes such sluggish pip install times sometimes? I use conda for my environment, but when developing e.g. project-specific packages of code, I need to run pip install . --upgrade on the main package directory.

Some days this seems to be instantaneous. Other days it will take minutes to run. Any thoughts on this in general or how to speed it up? Today is a day that it's taking minutes. It seems to happen on the login node and compute nodes.

Anderson Banihirwe (May 06 2020 at 17:55):

It seems to happen on the login node and compute nodes.

This is likely due to some GLADE/filesystem issue

Anderson Banihirwe (May 06 2020 at 17:56):

Can you try running pip install with the --verbose option on to see if there is any useful information about the problem

pip install . --upgrade --verbose

Anderson Banihirwe (May 06 2020 at 17:59):

P.S.: Apparently the --verbose option is additive. Beware :grinning:

$ pip --help
  -v, --verbose               Give more output. Option is additive, and can be
                              used up to 3 times.

Riley Brady (May 06 2020 at 19:56):

@Anderson Banihirwe, the --verbose option doesn't give enlightening information. It hangs here for 30s-1min (sometimes longer):

$ pip install . --upgrade --verbose
Non-user install because site-packages writeable
Created temporary directory: /glade/scratch/rbrady/tmp/pip-ephem-wheel-cache-qljvs7i7
Created temporary directory: /glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56
Initialized build tracking at /glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56
Created build tracker: /glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56
Entered build tracker: /glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56
Created temporary directory: /glade/scratch/rbrady/tmp/pip-install-deditwzg
Processing /glade/work/rbrady/projects/carbonpathways
  Created temporary directory: /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe

Then cranks through this:

Processing /glade/work/rbrady/projects/carbonpathways
  Created temporary directory: /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe
  Added file:///glade/work/rbrady/projects/carbonpathways to build tracker '/glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56'
    Running setup.py (path:/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/setup.py) egg_info for package from file:///glade/work/rbrady/projects/carbonpathways
    Running command python setup.py egg_info
    running egg_info
    creating /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info
    writing /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/PKG-INFO
    writing dependency_links to /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/dependency_links.txt
    writing top-level names to /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/top_level.txt
    writing manifest file '/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/SOURCES.txt'
    reading manifest file '/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file '/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/SOURCES.txt'
  Source in /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe has version 0.1.0, which satisfies requirement carbonpathways==0.1.0 from file:///glade/work/rbrady/projects/carbonpathways
  Removed carbonpathways==0.1.0 from file:///glade/work/rbrady/projects/carbonpathways from build tracker '/glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56'
Building wheels for collected packages: carbonpathways
  Created temporary directory: /glade/scratch/rbrady/tmp/pip-wheel-8jdqqqq1
  Building wheel for carbonpathways (setup.py) ...   Destination directory: /glade/scratch/rbrady/tmp/pip-wheel-8jdqqqq1
  Running command /glade/work/rbrady/miniconda3/envs/carbonpathways/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/setup.py'"'"'; __file__='"'"'/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /glade/scratch/rbrady/tmp/pip-wheel-8jdqqqq1
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib
  creating build/lib/carbonpathways
  copying carbonpathways/memory.py -> build/lib/carbonpathways
  copying carbonpathways/parallel.py -> build/lib/carbonpathways
  copying carbonpathways/__init__.py -> build/lib/carbonpathways
  copying carbonpathways/regions.py -> build/lib/carbonpathways
  copying carbonpathways/subset.py -> build/lib/carbonpathways
  copying carbonpathways/preprocess.py -> build/lib/carbonpathways
  creating build/lib/carbonpathways/visualization
  copying carbonpathways/visualization/visualize.py -> build/lib/carbonpathways/visualization
  copying carbonpathways/visualization/__init__.py -> build/lib/carbonpathways/visualization
  creating build/lib/carbonpathways/data
  copying carbonpathways/data/make_dataset.py -> build/lib/carbonpathways/data
  copying carbonpathways/data/__init__.py -> build/lib/carbonpathways/data
  running egg_info
  creating carbonpathways.egg-info
  writing carbonpathways.egg-info/PKG-INFO
  writing dependency_links to carbonpathways.egg-info/dependency_links.txt
  writing top-level names to carbonpathways.egg-info/top_level.txt
  writing manifest file 'carbonpathways.egg-info/SOURCES.txt'
  reading manifest file 'carbonpathways.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'carbonpathways.egg-info/SOURCES.txt'
  copying carbonpathways/particle_test_file.nc -> build/lib/carbonpathways
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64
  creating build/bdist.linux-x86_64/wheel
  creating build/bdist.linux-x86_64/wheel/carbonpathways
  copying build/lib/carbonpathways/memory.py -> build/bdist.linux-x86_64/wheel/carbonpathways
  copying build/lib/carbonpathways/parallel.py -> build/bdist.linux-x86_64/wheel/carbonpathways
  copying build/lib/carbonpathways/__init__.py -> build/bdist.linux-x86_64/wheel/carbonpathways
  copying build/lib/carbonpathways/regions.py -> build/bdist.linux-x86_64/wheel/carbonpathways
  copying build/lib/carbonpathways/particle_test_file.nc -> build/bdist.linux-x86_64/wheel/carbonpathways
  copying build/lib/carbonpathways/subset.py -> build/bdist.linux-x86_64/wheel/carbonpathways
  creating build/bdist.linux-x86_64/wheel/carbonpathways/visualization
  copying build/lib/carbonpathways/visualization/visualize.py -> build/bdist.linux-x86_64/wheel/carbonpathways/visualization
  copying build/lib/carbonpathways/visualization/__init__.py -> build/bdist.linux-x86_64/wheel/carbonpathways/visualization
  creating build/bdist.linux-x86_64/wheel/carbonpathways/data
  copying build/lib/carbonpathways/data/make_dataset.py -> build/bdist.linux-x86_64/wheel/carbonpathways/data
  copying build/lib/carbonpathways/data/__init__.py -> build/bdist.linux-x86_64/wheel/carbonpathways/data
  copying build/lib/carbonpathways/preprocess.py -> build/bdist.linux-x86_64/wheel/carbonpathways
  running install_egg_info
  Copying carbonpathways.egg-info to build/bdist.linux-x86_64/wheel/carbonpathways-0.1.0-py3.8.egg-info
  running install_scripts
  adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
  creating build/bdist.linux-x86_64/wheel/carbonpathways-0.1.0.dist-info/WHEEL
  creating '/glade/scratch/rbrady/tmp/pip-wheel-8jdqqqq1/carbonpathways-0.1.0-py3-none-any.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
  adding 'carbonpathways/__init__.py'
  adding 'carbonpathways/memory.py'
  adding 'carbonpathways/parallel.py'
  adding 'carbonpathways/particle_test_file.nc'
  adding 'carbonpathways/preprocess.py'
  adding 'carbonpathways/regions.py'
  adding 'carbonpathways/subset.py'
  adding 'carbonpathways/data/__init__.py'
  adding 'carbonpathways/data/make_dataset.py'
  adding 'carbonpathways/visualization/__init__.py'
  adding 'carbonpathways/visualization/visualize.py'
  adding 'carbonpathways-0.1.0.dist-info/LICENSE'
  adding 'carbonpathways-0.1.0.dist-info/METADATA'
  adding 'carbonpathways-0.1.0.dist-info/WHEEL'
  adding 'carbonpathways-0.1.0.dist-info/top_level.txt'
  adding 'carbonpathways-0.1.0.dist-info/RECORD'
  removing build/bdist.linux-x86_64/wheel
done
  Created wheel for carbonpathways: filename=carbonpathways-0.1.0-py3-none-any.whl size=52801 sha256=0e3e90d90d10e3861cdd426b74b357a77785d693cbc22aae06885f3fc32983b0
  Stored in directory: /glade/scratch/rbrady/tmp/pip-ephem-wheel-cache-qljvs7i7/wheels/64/2a/a6/9dc322f41f7002c714ffef0f74000ba1384978c0591cfd84be
Successfully built carbonpathways
Installing collected packages: carbonpathways
  Attempting uninstall: carbonpathways
    Found existing installation: carbonpathways 0.1.0
    Uninstalling carbonpathways-0.1.0:
      Created temporary directory: /glade/work/rbrady/miniconda3/envs/carbonpathways/lib/python3.8/site-packages/~arbonpathways-0.1.0.dist-info
      Removing file or directory /glade/work/rbrady/miniconda3/envs/carbonpathways/lib/python3.8/site-packages/carbonpathways-0.1.0.dist-info/
      Created temporary directory: /glade/work/rbrady/miniconda3/envs/carbonpathways/lib/python3.8/site-packages/~arbonpathways
      Removing file or directory /glade/work/rbrady/miniconda3/envs/carbonpathways/lib/python3.8/site-packages/carbonpathways/
      Successfully uninstalled carbonpathways-0.1.0
  Created temporary directory: /glade/scratch/rbrady/tmp/pip-unpacked-wheel-7qxfk_no

Successfully installed carbonpathways-0.1.0
Cleaning up...
  Removing source in /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe
Removed build tracker: '/glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56'

This is about as lightweight a package as you can get. I notice that some days it will install on the order of seconds. Other days, minutes. Also, what does "additive" mean in this case? Just for my own knowledge.

Anderson Banihirwe (May 06 2020 at 20:07):

Anderson Banihirwe (May 06 2020 at 20:08):

Anderson Banihirwe (May 06 2020 at 20:09):

Brian Bonnlander (May 06 2020 at 20:12):

Additive might mean that --verbose --verbose will give you even more information.

Anderson Banihirwe (May 06 2020 at 20:13):

Additive might mean that --verbose --verbose will give you even more information.

Yep.. and the short version looks like: pip install -vvv ....

Riley Brady (May 06 2020 at 20:16):

  1. Version is pip 20.0.2
  2. Which directory? Scratch (where --verbose implies temporary things are being installed) is at 9% full. Work, where pip/conda installs to, is 57% full. The interior carbonpathways folder with python code is 252 KB, although the main folder is 66 GB since I have a ./data folder with some post-processed output for now. That's not included whatsoever in the setup.py file so I figured it ignored that kind of stuff.
  3. It takes 1 min 6s to install.
  4. I guess I'm not changing the version name, but I am adding code and modules. So I figured --upgrade overwrote the current installation. As opposed to doing uninstall then reinstall.

Also this is independent of cheyenne or casper node.

Riley Brady (May 06 2020 at 20:17):

I think even with that large /data folder I have some days where this sort of install works in a second or two.

Anderson Banihirwe (May 06 2020 at 20:20):

I guess I'm not changing the version name, but I am adding code and modules. So I figured --upgrade overwrote the current installation. As opposed to doing uninstall then reinstall.

The -e (or --editable) option might be a better alternative to --upgrade, i.e. pip install -e . (note that -e takes the path as its argument, so it comes before the dot). When using the -e option, pip will just link the package to the original location, meaning any changes to the original package are reflected directly in your environment.

Riley Brady (May 06 2020 at 20:24):

Hm. I need to read up on that. That installs instantaneously but isn't working with autoreload. In the pip install . --upgrade case, I don't have to restart my notebook. If I run pip install -e . with and without --upgrade I don't get updates to functions in my notebook.

Anderson Banihirwe (May 06 2020 at 20:26):

It takes 1 min 6s to install.

I am assuming this is the real (wall clock) time. When you get a chance, can you post the full output of time pip install . --upgrade? When I first asked for this, I was going for the user and sys times as well.

Anderson Banihirwe (May 06 2020 at 20:29):

Hm. I need to read up on that. That installs instantaneously but isn't working with autoreload. In the pip install . --upgrade case, I don't have to restart my notebook. If I run pip install -e . with and without --upgrade I don't get updates to functions in my notebook.

Ooooh I see... I didn't know you were using the autoreload magic as well.. %autoreload has some caveats...

Riley Brady (May 06 2020 at 20:34):

real    1m6.161s
user    0m1.715s
sys     1m2.335s
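
For reference, the three numbers reported by the shell's time can also be collected from Python. The sketch below is only an illustration, assuming a POSIX system where the resource module is available; timed_run is a hypothetical helper name, not something used in this thread. A run where sys accounts for most of real, as in the output above (about 62 s of 66 s), is consistent with the process spending its time in kernel calls such as file copies rather than in Python code.

```python
import resource
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd and return (real, user, sys) in seconds, like the shell's `time`."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - start
    # RUSAGE_CHILDREN covers child processes that have finished and been waited for.
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    sys_time = after.ru_stime - before.ru_stime
    return real, user, sys_time


if __name__ == "__main__":
    # Harmless placeholder command; substitute the install command being investigated.
    real, user, sys_time = timed_run([sys.executable, "-c", "pass"])
    print(f"real {real:.3f}s  user {user:.3f}s  sys {sys_time:.3f}s")
```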

Riley Brady (May 06 2020 at 20:35):

I'm using autoreload since I'm working with dask_jobqueue and don't want to have to kill and restart all my workers every time I update my local package.

Anderson Banihirwe (May 06 2020 at 20:55):

As I suspected, -e is way faster:

$ time pip install -e .
Obtaining file:///glade/scratch/abanihi/carbonpathways
Installing collected packages: carbonpathways
  Running setup.py develop for carbonpathways
Successfully installed carbonpathways

real    0m4.519s
user    0m1.941s
sys     0m0.639s

My takeaway from this is that you either have to trade pip install . --upgrade speed for the flexibility provided by %autoreload magic or go with pip install -e . at the expense of having to rerun your notebook from scratch :frown:

Riley Brady (May 06 2020 at 21:01):

Thanks for the input! Wasn't sure if I was missing something. I'll ping you in this thread if a day comes up soon where the timings are drastically different. If that doesn't happen for a while, maybe I'll try moving my data folder out of there to see if setup is somehow including it. Although I don't think that's the case.

Anderson Banihirwe (May 06 2020 at 21:02):

By the way, the data directory may be the culprit here....

Anderson Banihirwe (May 06 2020 at 21:02):

During pip install . --upgrade, you will notice that pip creates a temporary directory

Anderson Banihirwe (May 06 2020 at 21:03):

It then copies everything from carbonpathways main directory into this temporary directory
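
One way to see what such a copy would have to move is to total the bytes under each top-level entry of the project directory. This is a minimal sketch, not pip's own logic; tree_size and top_level_sizes are hypothetical helper names, and symlinks are deliberately not followed.

```python
import os
from pathlib import Path


def tree_size(path):
    """Total size in bytes of regular files under path, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, followlinks=False):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def top_level_sizes(root):
    """Map each top-level entry of root to its size in bytes, largest first."""
    sizes = {}
    for entry in Path(root).iterdir():
        if entry.is_symlink():
            sizes[entry.name] = 0  # a link itself is tiny; its target is not counted
        elif entry.is_dir():
            sizes[entry.name] = tree_size(entry)
        else:
            sizes[entry.name] = entry.stat().st_size
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for name, size in top_level_sizes(".").items():
        print(f"{size:>15,d}  {name}")
```

Run from the project root, this lists the largest entries first, which makes a bulky directory such as data easy to spot.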

Riley Brady (May 06 2020 at 21:03):

Hm, I'll move that tomorrow and see what happens. I'm following more of a cookiecutter repo format (https://github.com/bradyrx/cookiecutter-climate-science) to keep everything nice and together. Data isn't going up to git of course but is nice to have consolidated there rather than in scratch. So maybe I should forgo that for speed.

Anderson Banihirwe (May 06 2020 at 21:05):

Removing the data directory reduces the wall clock time to

real    0m5.541s
user    0m1.894s
sys     0m0.863s

Anderson Banihirwe (May 06 2020 at 21:06):

That's a huge improvement compared to the original

real    1m6.161s
user    0m1.715s
sys     1m2.335s

Riley Brady (May 06 2020 at 21:06):

Hm, okay. That's it then. I'm wondering if there's some flag or way to have pip ignore certain subdirectories. Because it is convenient to keep my post-proc data there for the project.

Anderson Banihirwe (May 06 2020 at 21:06):

So, you may want to move data somewhere else

Michael Levy (May 06 2020 at 21:07):

Data isn't going up to git of course but is nice to have consolidated there rather than in scratch

could you keep data elsewhere on /glade/work and then softlink it in your git clone?

Riley Brady (May 06 2020 at 21:08):

That's a good idea @Michael Levy. I'll just do a soft link into my repo. Thanks! Well, I'll check that pip doesn't try to copy that linked directory.

Michael Levy (May 06 2020 at 21:09):

Well, I'll check that pip doesn't try to copy that linked directory.

yeah, I don't know pip well enough to know what it'll do, but :fingers_crossed:
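
The soft-link idea above could be sketched as follows. This is a minimal illustration, assuming a POSIX filesystem; relocate_with_symlink is a hypothetical helper, and any paths passed to it are placeholders for wherever the data should actually live. Whether a given pip version follows the link when copying is not verified here.

```python
import os
import shutil


def relocate_with_symlink(src, dst):
    """Move directory src to dst, then leave a symlink at src pointing to dst."""
    src = os.path.abspath(src)
    dst = os.path.abspath(dst)
    if os.path.islink(src):
        raise ValueError(f"{src} is already a symlink")
    if os.path.exists(dst):
        raise FileExistsError(dst)
    shutil.move(src, dst)
    # The repository keeps a lightweight link; the bulky files now live at dst.
    os.symlink(dst, src, target_is_directory=True)
    return src
```

After the move, code that reads from the data path inside the repository keeps working, since the link resolves to the new location.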

Anderson Banihirwe (May 06 2020 at 21:10):

Another solution is to edit your MANIFEST.in file and add the following line

prune data*

I haven't tested this yet though

Anderson Banihirwe (May 06 2020 at 21:14):

It does not work

Anderson Banihirwe (May 06 2020 at 21:26):

Give python setup.py install a try

Anderson Banihirwe (May 06 2020 at 21:27):

You may not need to move your data directory after all

Riley Brady (May 06 2020 at 21:28):

I was thinking about that instead of pip! Will give that a try tomorrow and let you know.

Anderson Banihirwe (May 06 2020 at 21:36):

I ended up going down a rabbit hole, and I think it paid off... :grinning:

Good News:

If you upgrade to pip>=20.1, you should be good... It appears that this issue (https://github.com/pypa/pip/issues/2195) was addressed in https://github.com/pypa/pip/pull/7882

Anderson Banihirwe (May 06 2020 at 21:37):

I tested it, and here's what I got

Anderson Banihirwe (May 06 2020 at 21:37):

real    0m4.362s
user    0m1.857s
sys     0m0.553s

Anderson Banihirwe (May 06 2020 at 21:38):

That's it for me for today :grinning: .... I won't spam this stream/topic again at least for today...

Riley Brady (May 07 2020 at 14:00):

Works great, thanks so much @Anderson Banihirwe. The old update-the-package trick. Well, if you just do standard upgrade it only seems to go to 20.0.2 or so. So one does have to force >=20.1 in the conda environment.

real    0m1.833s
user    0m1.387s
sys     0m0.388s
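
A small check like the following can confirm that an environment really has pip 20.1 or newer before relying on the faster local install. It is a sketch assuming Python 3.8+ (for importlib.metadata); pip_at_least is a hypothetical helper, and the parsing only looks at the first two numeric components of the version string.

```python
import re
from importlib import metadata


def pip_at_least(version, minimum=(20, 1)):
    """Return True if a version string such as '20.0.2' is >= minimum (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("pip")
    except metadata.PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        status = "ok" if pip_at_least(installed) else "older than 20.1"
        print(f"pip {installed}: {status}")
```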

Last updated: Jan 30 2022 at 12:01 UTC