Design Choices
Many things can be accomplished in multiple ways using Spack, but experience has shown that some methods are fraught with issues for the NCAR HPC module tree. As a reminder, here are the basic design objectives we've tried to adhere to in order to match user expectations:
- Software stacks evolve over time, with new versions of packages being added as they become available.
- Much of the software stack should be roughly the same on both the HPC system and the analysis system.
Reuse vs Reproducibility
One of the biggest choices in Spack behavior is whether to encourage the
concretizer to reuse packages or to instead prefer "fresh" solutions. When
`reuse: true` is specified, the concretizer will be much more likely to use
existing builds for package dependencies: even if the version of the package
or its variants are not "ideal", the concretizer will typically use the
installed dependency anyway, as long as doing so does not violate a
requirement. If `reuse: false` is specified, the concretizer will always
create what it considers an "optimal" solution.
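For reference, this choice is expressed in the concretizer section of Spack's configuration; a minimal sketch, shown here as it would appear in an environment's spack.yaml (concretizer.yaml works as well):

```yaml
# Minimal sketch: concretizer reuse setting in an environment's spack.yaml
spack:
  concretizer:
    reuse: true   # set to false to request "fresh", optimal solutions instead
```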
Both approaches bring major advantages and severe disadvantages, and I have gone back and forth on which is preferable.
reuse: true
This is currently the preferred mode for ncar-spack!
The Good
- Significantly reduces package duplication with unnecessary additional versions and variants.
- Since v1.0, this mode is pretty much required if you want to mix a "core" GCC and a newer GCC or vendor compiler in the same spec.
The Bad
- The package solution depends on the order in which packages were installed before it. If a dependency already exists, the solution may look different than if Spack had to concretize that dependency from scratch.
- Because of the above, performing concurrent package installs is not advisable as the solution will be non-deterministic.
reuse: false
Note
This mode does not mean that existing packages cannot be "reused" as dependencies - but that will only happen if the existing install exactly matches what the concretizer would want to use anyway. In essence, this is analogous to a "strict matching" mode.
The Good
- The solution is repeatable, as long as you either keep the package repository unchanged, or track which commit of the package repository was used when each package was installed and use it consistently.
- Concretizer behavior is much less bewildering.
The Bad
- In Spack v1.0+, compiler mixing becomes very difficult or impossible when
using this mode. While vendor compilers and GCC can still be mixed via
requirements, different GCC versions cannot be mixed due to
`gcc-runtime` differences. Maybe this will change again in the future...
- If any change is made to a package recipe (`package.py`), the hash of the concretized solution will change even if the version and variants are the same. As a result, adding new versions to a package via upstream fetching breaks the continuity of the stack, and in practice cannot really be done.
These negatives are fatal for our design, and so reuse: false is not practical
to use. Any updates to the package repository would really require a whole new
deployment if we used this mode.
Reducing Package Duplication
Certain packages do not need to be installed uniquely on both systems - most
notably vendor software. Many of these packages are indeed quite bloated, so
having multiple installs is wasteful of space as well (e.g.,
intel-oneapi-compilers and nvhpc).
To avoid duplication, we use a third cluster called common, which contains
these packages. The packages are then exposed to the production clusters (i.e.
derecho and casper) as externals, which are defined with the help of the
add_constraints script and the constraints.cfg file.
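As a sketch of what such an external entry looks like in a cluster's packages.yaml - with a hypothetical version and prefix standing in for the values the add_constraints script would actually generate:

```yaml
# Hypothetical external entry exposing a package installed in common
packages:
  nvhpc:
    externals:
    - spec: nvhpc@24.7                    # hypothetical version
      prefix: /path/to/common/nvhpc/24.7  # hypothetical install prefix
```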
In theory, these packages could be installed into one of the cluster deployments and then exposed to the other cluster, removing the need for common, but this is not advisable for reasons described in the next section.
Historical Information
The original reason for the common environment was to constrain how the
concretizer would build packages by extensive use of externals. In old
versions of Spack, the require: specifier was not available. By putting
commonly-used dependencies into common, this mostly prevented Spack from
wildly choosing to build new versions and use undesired compilers.
Fortunately, this problem can be mitigated in cleaner ways now.
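As an illustration of the cleaner mitigation, the require: specifier can now pin such choices directly in packages.yaml; a hypothetical sketch (the compiler version is made up):

```yaml
# Hypothetical requirement pinning a default compiler for all packages
packages:
  all:
    require:
    - "%gcc@12.2.0"   # hypothetical version; prevents undesired compiler choices
```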
Externals vs Upstream Packages
In theory, the common environment could be exposed in one of two ways to cluster deployments:
- Add packages from common to clusters as externals
- Add the entire common deployment to each cluster as an upstream (or chained) installation of Spack
The second option has many advantages, but unfortunately is not workable at
present because Spack considers sles and opensuse to be distinct operating
systems, so if used as an upstream, the packages in common will only be used
on the system with the matching OS.
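For comparison, the upstream approach would be declared along these lines in upstreams.yaml (the name and install path here are hypothetical):

```yaml
# Hypothetical upstreams.yaml entry chaining the common deployment
upstreams:
  common-spack:
    install_tree: /path/to/common/spack/opt/spack
```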
Note
The Spack developers claim they want to eliminate the OS concept entirely
from specs. If this happens, or if os_compatible can be made to work, the
upstream approach would be recommended.
The Downside of Externals
The major downside of externals is that they don't have any dependency
information. As a result, if you try to use source-built packages from another
Spack installation as externals in your deployment, you may get build errors
since Spack does not know what dependencies it needs to link in at compile time
for the external you are using. This can also affect users, since module
`depends_on` settings may be missing.
This is why we focus on vendor software for use as externals from common.
The one exception currently is gcc. As of v1.1, Spack's compiler support is
still in flux and if you install a compiler into the environment, it can be hard
to track and customize via YAML how the compiler is defined. Using externals to
bring in compilers from the common deployment helps constrain behavior.
Non-buildable Externals
When you add an external, you can specify whether or not Spack can also build a
non-external version of the package using buildable: [true/false]. Most of the
time, we prefer non-buildable externals as this has the advantageous property of
constraining the concretizer behavior. Otherwise, it may decide to ignore your
external and reinstall a package, which is time consuming and annoying if the
package is something like nvhpc.
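In packages.yaml, marking an external non-buildable is a one-line setting; a hypothetical sketch (external entries would be defined alongside it):

```yaml
# Hypothetical snippet: force the concretizer to use the external install
packages:
  nvhpc:
    buildable: false   # never build from source; the external must be used
```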
User Packages in the View
On prior systems, we had HSG install many low-level packages into the image
directly using RPMs. With Derecho, HSG wanted to slim down the image and so many
fundamental packages (e.g., tmux, imagemagick, parallel) needed to be
installed with Spack by CSG instead.
These packages could all be modules, but it was decided instead to incorporate
them into a single directory using Spack's environment view capability, which
creates links (or if desired copies) of all requested package libraries,
binaries, man pages, etc in a single prefix. Essentially, we use it to mimic
what would traditionally go in /usr/local via RPM installs. The view is then
added to the user environment via ncarenv, so it should always be available
for the majority of users.
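In an environment's spack.yaml, such a view can be declared roughly as follows (the root path and link type here are illustrative, not the production values):

```yaml
# Illustrative view definition creating a single /usr/local-like prefix
spack:
  view:
    default:
      root: /path/to/view   # illustrative; binaries, libraries, man pages land here
      link_type: symlink    # "copy" is also possible if links are undesired
```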
The reasons for this choice, versus using modules exclusively, are as follows:
- A large module list is visually intimidating
- It imposes a burden on users to know what they might want, whereas with the view we can curate the "basic" experience
- Since many packages depend on libraries like `zlib`, the `depends_on` conflicts in module loads could be severe
- A simpler stack improves Lmod performance