CISL Cloud Pilot#

This is a Pilot project to determine long-term feasibility.


Warning

This website is in active development and is updated frequently. Also, this project is pre-production and user data stored on our services is not guaranteed. Please utilize code repositories such as GitHub and production storage systems such as GLADE, Stratus, or a public cloud option. This ensures your work is backed up while making it easier to reproduce it quickly in different environments.

Welcome to the NSF NCAR | CISL Cloud Pilot documentation page. Below is a collection of useful information and links to help utilize resources for interactive scientific analysis.

About this Project#

CISL has deployed a pilot on-premise cloud environment for computing and storage.

We also partnered with 2i2c to deploy a JupyterHub instance on AWS, and leveraged that instance's configuration to help build the on-prem JupyterHub.

What is an on-premise cloud?#

NSF NCAR | CISL runs compute, storage, and network hardware in robust data centers at multiple organizational facilities. An on-premise cloud lets users take advantage of those highly available, organizationally supported compute resources for approved use cases, including access to routable network space and the UCAR Domain Name System (DNS). Security standards set by the organization are implemented and controlled by administrators to ensure internal policies are followed. These resources supplement computing needs that aren't met by the HPC offering, the public cloud, or what is available to you locally.

Resources#

The following resources were deployed based on interactions with users in the scientific community and their requested use cases. A full outline of the high-level use cases can be found further down this page.

GitHub#

An Infrastructure as Code (IaC) approach is used in conjunction with Git version control to maintain a consistent and reproducible environment. This code is stored publicly on GitHub, which enables team collaboration while keeping configurations version-controlled and reusable. Sensitive information is encrypted with SOPS and age, allowing administrators to maintain security while providing all the information required to reproduce deployments. GitHub Actions builds container images on self-hosted runners deployed to the on-prem cloud and integrates changes to the configuration files used for Continuous Delivery (CD).

Kubernetes (K8s)#

Note

This page does not cover what Kubernetes is. For more information, see Kubernetes.

Kubernetes, often abbreviated as K8s (the 8 stands for the number of letters it replaces), is the industry standard for container orchestration. CISL operates a K8s cluster built on Linux, running on both virtual machines and bare metal. The cluster is architected with shared resources that provide quick and secure ways to expose applications, along with a variety of persistent storage options. It hosts applications such as Rancher, Harbor, Argo CD, JupyterHub, Binder, JupyterBook, and custom-built web applications for data science. More details about these applications can be found below or on other pages in this documentation.

The cluster is configured with an ingress controller, an API to create DNS entries, and a certificate manager to provide HTTPS access to user workloads via unique URLs. Documentation on how to use these features is available in the web app docs.

Documentation#

This documentation site is built with Jupyter Book and hosted via GitHub Pages and Kubernetes. We use Continuous Integration (CI) and Continuous Delivery (CD) with GitHub Actions and Argo CD to update the site when content is added or changed. The site customizes the Pythia Sphinx Theme to align with UCAR|NCAR branding. It is a valuable resource for discovering and utilizing the NSF NCAR | CISL Cloud services.

If any pages are unclear, missing information, or require edits, please open a GitHub issue using the GitHub icon at the top of each page. Please provide a detailed issue description to help us update the page appropriately.

Storage#

Rook#

Rook provides storage orchestration for K8s workloads. Rook uses Ceph as a distributed storage system to offer file, block, and object storage to workloads hosted on the K8s cluster.
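As a rough illustration of how a workload might request Rook-backed storage, here is a minimal sketch using the Kubernetes Python client; the storage class name, claim name, and namespace below are assumptions and will vary by deployment.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (in-cluster config also works).
config.load_kube_config()
core = client.CoreV1Api()

# Request a 10 GiB block volume from an assumed Rook/Ceph storage class.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="example-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="rook-ceph-block",  # assumed; check the cluster's storage classes
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```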

GLADE#

The Globally Accessible Data Environment (GLADE) is a centralized file service that uses high-performance GPFS shared file system technology. GLADE provides users with a common view of their data across the HPC, analysis, and visualization resources managed by CISL.

The Kubernetes cluster has read-only access to the GLADE /collections and /campaign directories. This setup allows us to mount these directories directly in JupyterHub environments, mimicking data access on the HPC JupyterHub. Additionally, GLADE can be mounted as a volume inside containers, enabling interactive visualizations without including the data in the container image, thus reducing image size.
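For example, data under the read-only GLADE mounts can be opened directly from a notebook on the hub; this is a minimal sketch, and the file path shown is a placeholder rather than a real dataset.

```python
import xarray as xr

# Placeholder path under the read-only /glade/collections mount;
# substitute a dataset you actually have access to.
ds = xr.open_dataset("/glade/collections/some-project/example.nc")
print(ds)
```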

Stratus Object Storage#

NSF NCAR | CISL provides an on-premise object storage solution called Stratus. Stratus can be accessed via API calls similar to Amazon S3. For more information, see the Introduction to Stratus.
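Because Stratus exposes an S3-compatible API, it can be reached with standard S3 tooling such as boto3. The sketch below assumes the endpoint URL, bucket name, and credentials; substitute your own values and the endpoint given in the Stratus documentation.

```python
import boto3

# Assumed endpoint URL and placeholder credentials/bucket -- replace with your own.
s3 = boto3.client(
    "s3",
    endpoint_url="https://stratus.ucar.edu",
    aws_access_key_id="YOUR_ACCESS_ID",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# List objects in a bucket you have access to.
response = s3.list_objects_v2(Bucket="example-bucket")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```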

You can always find a link to the Stratus Web UI on this documentation page under the Resources tab at the top. You can also access it directly below:

Log in to Stratus

Note

Stratus is hosted on UCAR internal networks, requiring VPN or internal network access for use.

Note

The Stratus Web UI requires an Access ID and Secret Key to browse. If you do not have these credentials, please see Access.

Web Hosting#

CISL provides the ability to host containerized web applications on Kubernetes. Using Kubernetes (K8s) to host these workloads offers advantages such as HTTPS URL access and highly available, redundant compute resources.

For more information, see Hosting web applications on Kubernetes (K8s)

Continuous Deployment#

We have an instance of Argo CD installed to manage Continuous Delivery (CD). With Argo CD, your application's Git repository can be configured so that any changes made to it are deployed automatically, without user or admin intervention. This allows users to deploy their applications to Kubernetes (K8s) without interacting with Kubernetes directly.

For more information, see Hosting web applications on Kubernetes (K8s).

Note

If you have an application that you want to integrate into our CD platform, see Create Issue on Jira. One of our administrators will contact you to assist with the deployment.

Harbor - Container Registry#

We use Harbor, an open-source container registry, to keep images close to the infrastructure that runs the containers. A local registry leverages the network bandwidth available between on-site hardware, speeding up image pushes and pulls. Harbor also includes an image scanner that reports any vulnerabilities in an image, allowing us to address security concerns directly.
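As an illustration, an image hosted in Harbor can be pulled programmatically with the Docker SDK for Python; the registry hostname, project, and credentials below are placeholders, not the actual Harbor address.

```python
import docker

# Placeholder registry hostname, project, and credentials -- use the values
# shown in the Harbor UI for your project.
client = docker.from_env()
client.login(registry="harbor.example.org", username="USER", password="TOKEN")

image = client.images.pull("harbor.example.org/my-project/my-image", tag="latest")
print(image.id)
```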

For more information, see Harbor - Container Registry.

A link to Harbor is available under the Resources tab at the top of this documentation. You can also access it directly below:

Log in to Harbor

JupyterHub#

A JupyterHub instance is deployed on Kubernetes using the Daskhub Helm chart. For more details, see Using the NSF NCAR K8s JupyterHub.

Dask Gateway is included in the Daskhub chart, enabling scalable parallel computing within JupyterHub.
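From a notebook on the hub, a Dask cluster can be requested through the standard dask-gateway client. This is a minimal sketch, assuming the gateway address is auto-configured in the JupyterHub environment.

```python
from dask_gateway import Gateway

# Connect to the Dask Gateway bundled with the Daskhub deployment.
gateway = Gateway()

# Start a small cluster and scale it to two workers.
cluster = gateway.new_cluster()
cluster.scale(2)

# Attach a client so subsequent Dask computations run on the cluster.
client = cluster.get_client()
print(client.dashboard_link)
```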

JupyterHub KubeSpawner creates single-user environments with access to shared and persistent personal storage. The spawned user environments come with various default resource sizes, including a GPU option. This spawner uses a customized Docker image with packages, kernels, and extensions tailored to the scientific research community, enhancing productivity in data analysis. The custom environment also provides users with read-only access to the campaign and collections directories on GLADE, as well as a shared directory for specific use cases.

You can always find a link to the NSF NCAR JupyterHub under the Resources tab at the top of this documentation page. You can also access it directly below:

Log in to NSF NCAR JupyterHub

Note

Access is currently granted to members of the NCAR/2i2c-cloud-users GitHub Team. For more information, see Requesting Access.

Binder#

Binder is a tool that enables sharing custom computing environments based on code repository contents. For example, if a code repository contains Jupyter Notebooks that demonstrate how to perform specific tasks, Binder can launch a compute instance via JupyterHub that is automatically configured to run the repository’s contents.

NSF NCAR Rancher#

NSF NCAR operates an instance of Rancher, an open-source container management platform, to support the community.

A link to the NSF NCAR Rancher instance is available under the Resources tab at the top of this documentation page. You can also access it directly below:

Log in to NSF NCAR Rancher

2i2c JupyterHub#

2i2c operates a JupyterHub instance for NSF NCAR on Amazon Web Services (AWS).

A link to the 2i2c JupyterHub is available under the Resources tab at the top of this documentation page. You can also access it directly below:

Log in to 2i2c JupyterHub

Note

Access is currently granted to members of the NCAR/2i2c-cloud-users GitHub Team. If you require access, please follow these instructions.

Storage#

Data storage for the 2i2c JupyterHub instance is provided by AWS Elastic File System (EFS). NSF NCAR internal data from GLADE and Stratus is not available from the 2i2c JupyterHub instance.

Virtual Machines (VMs)#

If containers aren’t suitable for your solution, virtual machines (VMs) can be provided. To request a VM, use the [NCAR/UCAR Service Desk](https://ithelp.ucar.edu/servicedesk/customer/portal/2/create/17). You can always find a link to the NCAR/UCAR Service Desk under the Resources tab at the top of this documentation page.

In your request, please specify the number of CPUs, amount of memory (GB), required disk size (GB), and the desired operating system. If you have any special requests or requirements, include them as well.

Agile Program Management#

Kanban Board

This project uses a hybrid Agile Project Management workflow. High-level project management follows Waterfall techniques, while daily tasks use Kanban to create a continuous flow of value to users.

Vision#

Provide and operate an on-premise cloud offering for the scientific community to supplement traditional HPC services and public cloud offerings while utilizing 2i2c to host a JupyterHub instance in the public cloud.

Goals#

  • Improve understanding of how the scientific community might use and benefit from an on-prem cloud

    • Which services fit on-prem better than traditional HPC and/or public cloud

  • Gain experience within CISL deploying and operating an on-prem cloud and associated services

  • Improve CISL’s ability to support interactive analysis workflows in environments where data is globally distributed

  • Increase user visibility into on-prem cloud offerings

  • Develop metrics to showcase project value & feasibility

  • Gain experience with Agile Project Management

Use Cases#

Numerous users in the scientific community were solicited for input on what they would find beneficial in a cloud-like environment. Here is a short, high-level outline of those requests.