Stream: jupyter
Topic: slow pip install
Riley Brady (May 06 2020 at 17:12):
This is more of a general question for Cheyenne/Casper rather than jupyter specific. Does anyone have a sense of what causes such sluggish pip install times sometimes? I use conda for my environment, but when developing e.g. project-specific packages of code, I need to run pip install . --upgrade on the main package directory.
Some days this seems to be instantaneous. Other days it will take minutes to run. Any thoughts on this in general or how to speed it up? Today is a day that it's taking minutes. It seems to happen on the login node and compute nodes.
Anderson Banihirwe (May 06 2020 at 17:55):
It seems to happen on the login node and compute nodes.
This is likely due to some GLADE/filesystem issue
Anderson Banihirwe (May 06 2020 at 17:56):
Can you try running pip install with the --verbose option on to see if there is any useful information about the problem?
pip install . --upgrade --verbose
Anderson Banihirwe (May 06 2020 at 17:59):
P.S.: Apparently the --verbose option is additive. Beware :grinning:
$ pip --help
-v, --verbose Give more output. Option is additive, and can be used up to 3 times.
Riley Brady (May 06 2020 at 19:56):
@Anderson Banihirwe, the --verbose option doesn't give enlightening information. It hangs here for 30s-1min (sometimes longer):
$ pip install . --upgrade --verbose
Non-user install because site-packages writeable
Created temporary directory: /glade/scratch/rbrady/tmp/pip-ephem-wheel-cache-qljvs7i7
Created temporary directory: /glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56
Initialized build tracking at /glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56
Created build tracker: /glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56
Entered build tracker: /glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56
Created temporary directory: /glade/scratch/rbrady/tmp/pip-install-deditwzg
Processing /glade/work/rbrady/projects/carbonpathways
Created temporary directory: /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe
Then cranks through this:
Processing /glade/work/rbrady/projects/carbonpathways
Created temporary directory: /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe
Added file:///glade/work/rbrady/projects/carbonpathways to build tracker '/glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56'
Running setup.py (path:/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/setup.py) egg_info for package from file:///glade/work/rbrady/projects/carbonpathways
Running command python setup.py egg_info
running egg_info
creating /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info
writing /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/PKG-INFO
writing dependency_links to /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/dependency_links.txt
writing top-level names to /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/top_level.txt
writing manifest file '/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/SOURCES.txt'
reading manifest file '/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file '/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/pip-egg-info/carbonpathways.egg-info/SOURCES.txt'
Source in /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe has version 0.1.0, which satisfies requirement carbonpathways==0.1.0 from file:///glade/work/rbrady/projects/carbonpathways
Removed carbonpathways==0.1.0 from file:///glade/work/rbrady/projects/carbonpathways from build tracker '/glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56'
Building wheels for collected packages: carbonpathways
Created temporary directory: /glade/scratch/rbrady/tmp/pip-wheel-8jdqqqq1
Building wheel for carbonpathways (setup.py) ...
Destination directory: /glade/scratch/rbrady/tmp/pip-wheel-8jdqqqq1
Running command /glade/work/rbrady/miniconda3/envs/carbonpathways/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/setup.py'"'"'; __file__='"'"'/glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /glade/scratch/rbrady/tmp/pip-wheel-8jdqqqq1
running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/carbonpathways
copying carbonpathways/memory.py -> build/lib/carbonpathways
copying carbonpathways/parallel.py -> build/lib/carbonpathways
copying carbonpathways/__init__.py -> build/lib/carbonpathways
copying carbonpathways/regions.py -> build/lib/carbonpathways
copying carbonpathways/subset.py -> build/lib/carbonpathways
copying carbonpathways/preprocess.py -> build/lib/carbonpathways
creating build/lib/carbonpathways/visualization
copying carbonpathways/visualization/visualize.py -> build/lib/carbonpathways/visualization
copying carbonpathways/visualization/__init__.py -> build/lib/carbonpathways/visualization
creating build/lib/carbonpathways/data
copying carbonpathways/data/make_dataset.py -> build/lib/carbonpathways/data
copying carbonpathways/data/__init__.py -> build/lib/carbonpathways/data
running egg_info
creating carbonpathways.egg-info
writing carbonpathways.egg-info/PKG-INFO
writing dependency_links to carbonpathways.egg-info/dependency_links.txt
writing top-level names to carbonpathways.egg-info/top_level.txt
writing manifest file 'carbonpathways.egg-info/SOURCES.txt'
reading manifest file 'carbonpathways.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'carbonpathways.egg-info/SOURCES.txt'
copying carbonpathways/particle_test_file.nc -> build/lib/carbonpathways
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/carbonpathways
copying build/lib/carbonpathways/memory.py -> build/bdist.linux-x86_64/wheel/carbonpathways
copying build/lib/carbonpathways/parallel.py -> build/bdist.linux-x86_64/wheel/carbonpathways
copying build/lib/carbonpathways/__init__.py -> build/bdist.linux-x86_64/wheel/carbonpathways
copying build/lib/carbonpathways/regions.py -> build/bdist.linux-x86_64/wheel/carbonpathways
copying build/lib/carbonpathways/particle_test_file.nc -> build/bdist.linux-x86_64/wheel/carbonpathways
copying build/lib/carbonpathways/subset.py -> build/bdist.linux-x86_64/wheel/carbonpathways
creating build/bdist.linux-x86_64/wheel/carbonpathways/visualization
copying build/lib/carbonpathways/visualization/visualize.py -> build/bdist.linux-x86_64/wheel/carbonpathways/visualization
copying build/lib/carbonpathways/visualization/__init__.py -> build/bdist.linux-x86_64/wheel/carbonpathways/visualization
creating build/bdist.linux-x86_64/wheel/carbonpathways/data
copying build/lib/carbonpathways/data/make_dataset.py -> build/bdist.linux-x86_64/wheel/carbonpathways/data
copying build/lib/carbonpathways/data/__init__.py -> build/bdist.linux-x86_64/wheel/carbonpathways/data
copying build/lib/carbonpathways/preprocess.py -> build/bdist.linux-x86_64/wheel/carbonpathways
running install_egg_info
Copying carbonpathways.egg-info to build/bdist.linux-x86_64/wheel/carbonpathways-0.1.0-py3.8.egg-info
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/carbonpathways-0.1.0.dist-info/WHEEL
creating '/glade/scratch/rbrady/tmp/pip-wheel-8jdqqqq1/carbonpathways-0.1.0-py3-none-any.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'carbonpathways/__init__.py'
adding 'carbonpathways/memory.py'
adding 'carbonpathways/parallel.py'
adding 'carbonpathways/particle_test_file.nc'
adding 'carbonpathways/preprocess.py'
adding 'carbonpathways/regions.py'
adding 'carbonpathways/subset.py'
adding 'carbonpathways/data/__init__.py'
adding 'carbonpathways/data/make_dataset.py'
adding 'carbonpathways/visualization/__init__.py'
adding 'carbonpathways/visualization/visualize.py'
adding 'carbonpathways-0.1.0.dist-info/LICENSE'
adding 'carbonpathways-0.1.0.dist-info/METADATA'
adding 'carbonpathways-0.1.0.dist-info/WHEEL'
adding 'carbonpathways-0.1.0.dist-info/top_level.txt'
adding 'carbonpathways-0.1.0.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
done
Created wheel for carbonpathways: filename=carbonpathways-0.1.0-py3-none-any.whl size=52801 sha256=0e3e90d90d10e3861cdd426b74b357a77785d693cbc22aae06885f3fc32983b0
Stored in directory: /glade/scratch/rbrady/tmp/pip-ephem-wheel-cache-qljvs7i7/wheels/64/2a/a6/9dc322f41f7002c714ffef0f74000ba1384978c0591cfd84be
Successfully built carbonpathways
Installing collected packages: carbonpathways
Attempting uninstall: carbonpathways
Found existing installation: carbonpathways 0.1.0
Uninstalling carbonpathways-0.1.0:
Created temporary directory: /glade/work/rbrady/miniconda3/envs/carbonpathways/lib/python3.8/site-packages/~arbonpathways-0.1.0.dist-info
Removing file or directory /glade/work/rbrady/miniconda3/envs/carbonpathways/lib/python3.8/site-packages/carbonpathways-0.1.0.dist-info/
Created temporary directory: /glade/work/rbrady/miniconda3/envs/carbonpathways/lib/python3.8/site-packages/~arbonpathways
Removing file or directory /glade/work/rbrady/miniconda3/envs/carbonpathways/lib/python3.8/site-packages/carbonpathways/
Successfully uninstalled carbonpathways-0.1.0
Created temporary directory: /glade/scratch/rbrady/tmp/pip-unpacked-wheel-7qxfk_no
Successfully installed carbonpathways-0.1.0
Cleaning up...
Removing source in /glade/scratch/rbrady/tmp/pip-req-build-7vwn7mxe
Removed build tracker: '/glade/scratch/rbrady/tmp/pip-req-tracker-dzd7io56'
This is about as lightweight of a package as you can get. I notice that some days it will install on the order of seconds. Other days, minutes. Also, what does additive mean in this case? Just for my own knowledge.
Anderson Banihirwe (May 06 2020 at 20:07):
- What version of pip are you running?
- What is the size of your directory?
Anderson Banihirwe (May 06 2020 at 20:08):
- What is the output of time pip install . --upgrade?
Anderson Banihirwe (May 06 2020 at 20:09):
- Since you are installing the package from a local directory, do you need the --upgrade flag?
Brian Bonnlander (May 06 2020 at 20:12):
Additive might mean that --verbose --verbose will give you even more information.
Anderson Banihirwe (May 06 2020 at 20:13):
Additive might mean that --verbose --verbose will give you even more information.
Yep.. and the short version looks like: pip install -vvv ....
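A minimal sketch of what "additive" means for a flag like this, using Python's argparse. The parser, the `toy-install` program name, and the `verbosity` helper are hypothetical and only illustrate the counting behaviour; this is not pip's own option parser.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Toy parser with an additive -v/--verbose flag (not pip's parser)."""
    parser = argparse.ArgumentParser(prog="toy-install")
    # action="count" increments the stored value each time the flag appears,
    # so -v, -vv and -vvv yield 1, 2 and 3.
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="Give more output. Option is additive.")
    return parser


def verbosity(argv: List[str]) -> int:
    """Return how many times the verbose flag was given in argv."""
    return build_parser().parse_args(argv).verbose


if __name__ == "__main__":
    for argv in (["-v"], ["--verbose", "--verbose"], ["-vvv"]):
        print(argv, "->", verbosity(argv))
```

Repeating the long form and stacking the short form are equivalent here: both just raise the counter.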
Riley Brady (May 06 2020 at 20:16):
- Version is pip 20.0.2
- Which directory? Scratch (where --verbose implies temporary things are being installed) is at 9% full. Work, where pip/conda installs to, is 57% full. The interior carbonpathways folder with python code is 252 kb. Although the main folder is 66GB since I have a ./data folder with some post-processed output for now. That's not included whatsoever in the setup.py file so I figured it ignored that kind of stuff.
- It takes 1 min 6s to install.
- I guess I'm not changing the version name, but I am adding code and modules. So I figured --upgrade overwrote the current installation. As opposed to doing uninstall then reinstall.
Also this is independent of cheyenne or casper node.
Riley Brady (May 06 2020 at 20:17):
I think even with that large /data folder I have some days where this sort of install works in a second or two.
Anderson Banihirwe (May 06 2020 at 20:20):
I guess I'm not changing the version name, but I am adding code and modules. So I figured --upgrade overwrote the current installation. As opposed to doing uninstall then reinstall.
The -e, or --editable, option might be a better alternative to --upgrade, i.e. pip install -e . When using the -e option, pip will just link the package to the original location, meaning any changes to the original package would reflect directly in your environment.
Riley Brady (May 06 2020 at 20:24):
Hm. I need to read up on that. That installs instantaneously but isn't working with autoreload. In the pip install . --upgrade case, I don't have to restart my notebook. If I run pip install -e . with and without --upgrade I don't get updates to functions in my notebook.
Anderson Banihirwe (May 06 2020 at 20:26):
It takes 1 min 6s to install.
I am assuming this is the real (wall clock) time. When you get a chance, can you post the full output of time pip install . --upgrade? When I first asked for this, I was going for the user and sys times as well.
Anderson Banihirwe (May 06 2020 at 20:29):
Hm. I need to read up on that. That installs instantaneously but isn't working with autoreload. In the pip install . --upgrade case, I don't have to restart my notebook. If I run pip install -e . with and without --upgrade I don't get updates to functions in my notebook.
Ooooh I see... I didn't know you were using the autoreload magic as well.. %autoreload has some caveats...
Riley Brady (May 06 2020 at 20:34):
real 1m6.161s
user 0m1.715s
sys 1m2.335s
Riley Brady (May 06 2020 at 20:35):
I'm using autoreload since I'm working with dask_jobqueue and don't want to have to kill and restart all my workers every time I update my local package.
Anderson Banihirwe (May 06 2020 at 20:55):
As I suspected, -e is way faster:
$ time pip install -e .
Obtaining file:///glade/scratch/abanihi/carbonpathways
Installing collected packages: carbonpathways
Running setup.py develop for carbonpathways
Successfully installed carbonpathways
real 0m4.519s
user 0m1.941s
sys 0m0.639s
My takeaway from this is that you either have to accept the slower pip install . --upgrade to keep the flexibility provided by the %autoreload magic, or go with the faster pip install -e . at the expense of having to rerun your notebook from scratch :frown:
Riley Brady (May 06 2020 at 21:01):
Thanks for the input! Wasn't sure if I was missing something. I'll ping you in this thread if a day comes up soon where the timings are drastically different. If that doesn't happen for a while maybe I'll try moving my data folder out of there to see if setup is somehow including it. Although I don't think that's the case.
Anderson Banihirwe (May 06 2020 at 21:02):
By the way, the data directory may be the culprit here....
Anderson Banihirwe (May 06 2020 at 21:02):
During pip install . --upgrade, you will notice that pip creates a temporary directory
Anderson Banihirwe (May 06 2020 at 21:03):
It then copies everything from the carbonpathways main directory into this temporary directory
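To see how much data such a full copy of the project directory would have to move, one can total file sizes per top-level entry. This is a rough sketch under the assumption that everything under the project root gets copied; `size_by_top_level` is a hypothetical helper, not part of pip.

```python
import os
from collections import defaultdict
from typing import Dict


def size_by_top_level(root: str) -> Dict[str, int]:
    """Sum file sizes in bytes under root, grouped by top-level entry.

    Files sitting directly in root are grouped under ".".
    """
    totals: Dict[str, int] = defaultdict(int)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinked files
                totals[top] += os.path.getsize(path)
    return dict(totals)


if __name__ == "__main__":
    # Run from the project root; the largest entries are listed first.
    sizes = size_by_top_level(".")
    for entry, nbytes in sorted(sizes.items(), key=lambda kv: -kv[1]):
        print(f"{nbytes / 1e6:12.1f} MB  {entry}")
```

A single entry that dwarfs the package itself, such as a large data folder, is a likely candidate for the slow copy.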
Riley Brady (May 06 2020 at 21:03):
Hm I'll move that tomorrow and see what happens. I'm following more of a cookiecutter repo format (https://github.com/bradyrx/cookiecutter-climate-science) to keep everything nice and together. Data isn't going up to git of course but is nice to have consolidated there rather than in scratch. So maybe I should forgo that for speed.
Anderson Banihirwe (May 06 2020 at 21:05):
Removing the data directory reduces the wall clock time to
real 0m5.541s
user 0m1.894s
sys 0m0.863s
Anderson Banihirwe (May 06 2020 at 21:06):
That's a huge improvement compared to the original
real 1m6.161s
user 0m1.715s
sys 1m2.335s
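The two timings can be compared more directly by converting them to seconds and looking at how much of the wall-clock time is sys (kernel-mode CPU) time. A high share is consistent with time spent in file-system work such as copying, though it does not prove it. The helper names below are hypothetical; the timing values are the ones posted in this thread.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a shell `time` value such as '1m6.161s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def sys_share(real: str, sys_time: str) -> float:
    """Fraction of wall-clock (real) time accounted for by sys time."""
    return to_seconds(sys_time) / to_seconds(real)


if __name__ == "__main__":
    # (real, sys) pairs copied from the messages above.
    runs = {
        "with data directory": ("1m6.161s", "1m2.335s"),
        "data directory removed": ("0m5.541s", "0m0.863s"),
    }
    for label, (real, sys_time) in runs.items():
        print(f"{label}: sys is {sys_share(real, sys_time):.0%} of real")
```

With the data directory in place, sys time accounts for roughly 94% of the run; with it removed, roughly 16%.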
Riley Brady (May 06 2020 at 21:06):
Hm, okay. That's it then. I'm wondering if there's some flag or way to have pip ignore certain sub-directories. Because it is convenient to keep my post-proc data there for the project.
Anderson Banihirwe (May 06 2020 at 21:06):
So, you may want to move data somewhere else
Michael Levy (May 06 2020 at 21:07):
Data isn't going up to git of course but is nice to have consolidated there rather than in scratch
Could you keep data elsewhere on /glade/work and then softlink it in your git clone?
Riley Brady (May 06 2020 at 21:08):
That's a good idea @Michael Levy . I'll just do a soft link into my repo. Thanks! Well, I'll check that pip doesn't try to copy that linked directory.
Michael Levy (May 06 2020 at 21:09):
Well, I'll check that pip doesn't try to copy that linked directory.
yeah, I don't know pip well enough to know what it'll do, but :fingers_crossed:
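Whether a soft-linked data directory gets copied depends on how the copy is made. The sketch below uses Python's shutil.copytree on a throwaway toy project to show the two possible behaviours. Which mode a given pip version uses is not established in this thread, so treat this as a way to reason about the check, not as a description of pip. The directory and file names are made up for the example.

```python
import os
import shutil
import tempfile


def copy_and_check(symlinks: bool) -> bool:
    """Copy a toy project whose data dir is a symlink.

    Returns True if the copy still contains a symlink (data not duplicated),
    False if the linked contents were copied as a real directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        real_data = os.path.join(tmp, "real_data")  # data kept elsewhere
        project = os.path.join(tmp, "project")      # stands in for the git clone
        os.makedirs(real_data)
        os.makedirs(project)
        with open(os.path.join(real_data, "output.nc"), "wb") as fh:
            fh.write(b"\0" * 1024)
        # Soft link the data directory into the project.
        os.symlink(real_data, os.path.join(project, "data"))

        target = os.path.join(tmp, "copy")
        shutil.copytree(project, target, symlinks=symlinks)
        return os.path.islink(os.path.join(target, "data"))


if __name__ == "__main__":
    print("symlinks=True keeps the link:", copy_and_check(True))
    print("symlinks=False keeps the link:", copy_and_check(False))
```

With symlinks=True only the link is recreated, so the linked data is not duplicated; with the default symlinks=False the linked contents are copied in full.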
Anderson Banihirwe (May 06 2020 at 21:10):
Another solution is to edit your MANIFEST.in file and add the following line
prune data*
I haven't tested this yet though
Anderson Banihirwe (May 06 2020 at 21:14):
it does not work
Anderson Banihirwe (May 06 2020 at 21:26):
Give python setup.py install a try
Anderson Banihirwe (May 06 2020 at 21:27):
You may not need to move your data directory after all
Riley Brady (May 06 2020 at 21:28):
I was thinking about that instead of pip! Will give that a try tomorrow and let you know.
Anderson Banihirwe (May 06 2020 at 21:36):
I ended up going down a rabbit hole, and I think it paid off... :grinning:
Good News:
If you upgrade to pip>=20.1, you should be good... It appears that this issue (https://github.com/pypa/pip/issues/2195) was addressed in https://github.com/pypa/pip/pull/7882
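A small sketch for checking whether an environment's pip meets the 20.1 threshold mentioned above. The `release_tuple` and `has_fix` helpers are hypothetical and use a deliberately simple numeric comparison, not full version parsing, so a pre-release such as 20.1b1 is conservatively treated as older than 20.1.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '20.0.2' into (20, 0, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(pip_version: str, minimum: str = "20.1") -> bool:
    """True if pip_version is at least `minimum` (20.1 per this thread)."""
    return release_tuple(pip_version) >= release_tuple(minimum)


if __name__ == "__main__":
    from importlib import metadata  # available in Python 3.8+

    try:
        installed = metadata.version("pip")
    except metadata.PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        print(f"pip {installed}: at least 20.1 -> {has_fix(installed)}")
```

For the version reported earlier in the thread, has_fix("20.0.2") returns False.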
Anderson Banihirwe (May 06 2020 at 21:37):
I tested it, and here's what I got
Anderson Banihirwe (May 06 2020 at 21:37):
real 0m4.362s
user 0m1.857s
sys 0m0.553s
Anderson Banihirwe (May 06 2020 at 21:38):
That's it for me for today :grinning: .... I won't spam this stream/topic again at least for today...
Riley Brady (May 07 2020 at 14:00):
Works great, thanks so much @Anderson Banihirwe. The old update-the-package trick. Well, if you just do a standard upgrade it only seems to go to 20.0.2 or so. So one does have to force >=20.1 in the conda environment.
real 0m1.833s
user 0m1.387s
sys 0m0.388s
Last updated: Jan 30 2022 at 12:01 UTC