CISL Cloud Pilot#

This is currently a pilot project with the objective of determining long-term feasibility.


Warning

This website is in active development and updated frequently. This project is also still in a pre-production stage, and user data stored on our services is not guaranteed to be preserved. Please utilize code repositories such as GitHub and production storage systems such as GLADE, Stratus, or a public cloud option. This ensures your work is backed up while making it easier to reproduce quickly in different environments.

Welcome to the NSF NCAR | CISL Cloud Pilot documentation page. Below is a collection of useful information and links to help utilize resources for interactive scientific analysis.

About this Project#

CISL has deployed a prototype on-premise cloud environment for compute and storage.

We partnered with 2i2c to deploy a JupyterHub instance in AWS as well, and we leveraged the configuration of that instance to build the on-prem JupyterHub.

What is an on-premise cloud?#

NSF NCAR | CISL runs compute, storage, and network hardware in robust data centers at multiple organizational facilities. An on-premise cloud offers users the ability to utilize those highly available, organizationally supported compute resources for approved use cases. This includes access to routable network space and UCAR Domain Name System (DNS) services. Security standards set by the organization are implemented and controlled by administrators to make sure internal policies are adhered to. These resources supplement computing needs that aren't fulfilled by the HPC offering, the public cloud, or what is available to you locally.

Resources#

The following resources were deployed based on interactions with users in the scientific community and their requested use cases. A full outline of the high-level use cases can be found further down this page.

GitHub#

An Infrastructure as Code (IaC) approach is used in conjunction with Git version control to maintain a consistent and reproducible environment. This code is stored publicly on GitHub, which enables team collaboration while keeping configurations version-controlled and reusable. Sensitive information is encrypted with SOPS and age, giving administrators the ability to maintain security while providing all the information required to reproduce deployments. GitHub Actions is used to build container images on self-hosted runners deployed to the on-prem cloud, and it can also apply changes to the configuration files used for Continuous Delivery (CD).

Kubernetes (K8s)#

Note

This page does not cover what Kubernetes is. If you would like to know more about Kubernetes, use this link to kubernetes.io.

Kubernetes, often referred to as K8s (the 8 stands for the number of letters it replaces), is the industry standard for container orchestration. CISL operates a K8s cluster built on Linux running on virtual machines and bare metal. The cluster is architected to include shared resources that enable quick and secure ways to expose applications while providing a variety of persistent storage options. This is all used to host applications such as Rancher, Harbor, Argo CD, JupyterHub, Binder, JupyterBook, and custom-built web applications for data science. More details about these applications can be found below or via the different pages included in this documentation.

The cluster is configured with an ingress controller, an API to create DNS entries, and a certificate manager to offer HTTPS access to user workloads via unique URLs. Documentation on how to utilize this is available at this link to web app docs.

Documentation#

This documentation site is built with Jupyter Book and is hosted via GitHub Pages and on K8s. We use CI/CD with GitHub Actions and Argo CD to update the site when content is added or changed. It customizes the Pythia Sphinx Theme to align with UCAR | NCAR branding. It should be a great resource to discover and utilize the services offered in the NCAR | CISL Cloud.

If any pages are unclear, are missing information, or require edits or updates, please open a GitHub issue by using the GitHub dropdown icon at the top of each page. Please be descriptive so the page can be updated appropriately.

Storage#

Rook#

Rook is used to provide storage orchestration for K8s workloads. Rook uses Ceph as a distributed storage system to provide file, block, and object storage to workloads hosted on the K8s cluster.

GLADE#

The K8s cluster has read-only access to the GLADE /collections and /campaign spaces. This allows us to mount them directly in JupyterHub environments so data access mimics the HPC JupyterHub. GLADE can also be mounted as a volume inside other containers, for example for interactive visualizations, so data does not have to be included in the container image, which reduces image size.
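Because the mount mirrors the familiar GLADE paths, notebooks can read data with ordinary file I/O. A minimal sketch, assuming xarray is available in the image; the dataset path is hypothetical, so substitute one you have access to:

```python
# Minimal sketch: read a dataset from the read-only GLADE mount inside a
# JupyterHub session. The path below is a hypothetical placeholder.
import xarray as xr

ds = xr.open_dataset("/glade/campaign/<project>/<path>/example.nc")  # placeholder path
print(ds)
```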

Stratus Object Storage#

NCAR has an on-premise object storage solution called Stratus. Stratus can be accessed via API calls in a similar fashion to Amazon S3; a short example using boto3 follows the notes below. The main documentation page for this service can be found at this link to internal Documentation.

A link to the Stratus Web UI can always be found on this documentation page under the Resources tab at the top. It can also be found below:

Login to Stratus

Note

Stratus is hosted on NCAR internal networks; VPN or internal network access is required.

Note

The Stratus Web UI requires an Access ID and Secret Key to browse. If you do not have an Access ID and Secret Key, please follow the access documentation.
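Since Stratus speaks an S3-compatible API, standard S3 tooling works once it is pointed at the Stratus endpoint. Below is a minimal sketch using boto3; the endpoint URL, bucket name, and credential values are placeholders, and the Access ID and Secret Key come from the access documentation above:

```python
# Minimal sketch: list objects in a Stratus bucket with boto3.
# Endpoint, bucket, and credentials below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://stratus.example.ucar.edu",  # placeholder Stratus endpoint
    aws_access_key_id="YOUR_ACCESS_ID",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

for obj in s3.list_objects_v2(Bucket="example-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```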

Web Hosting#

CISL provides the ability to host containerized web applications on K8s. Hosting these containerized workloads on K8s makes it straightforward to expose applications via HTTPS URLs while providing highly available and redundant compute resources.

The main documentation page on this service can be found by clicking this link to internal Web Hosting Documentation.
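As a rough illustration of the kind of workload this service targets, the sketch below is a hypothetical Flask app, not a CISL-provided template; it simply shows the shape of something that could be containerized and exposed behind the cluster's HTTPS ingress:

```python
# Hypothetical example of a small web app that could be containerized and
# hosted on the K8s cluster. Flask is used here purely for illustration.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from a containerized app on the CISL on-prem cloud!"

if __name__ == "__main__":
    # Listen on all interfaces so the container port can be exposed
    # through the cluster ingress.
    app.run(host="0.0.0.0", port=8080)
```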

Continuous Deployment#

We have an instance of Argo CD installed to handle Continuous Delivery (CD). This means that if your application's Git repository is set up in Argo CD, any changes made to that repository can be deployed automatically without intervention by the user or admins. This allows users to deploy their applications to K8s without having to interact directly with Kubernetes. For more information, see Hosting web applications on Kubernetes (K8s).

Note

If you have an application that you want to integrate into our CD platform, please submit a ticket here and one of our administrators will be in touch to work through deployment with you.

Harbor - Container Registry#

We utilize Harbor to provide an open-source container registry that sits close to the infrastructure running the containers. A local registry takes advantage of the network infrastructure and bandwidth available between hardware, increasing speed when pushing and pulling images locally. Harbor also includes an image scanner that reports any vulnerabilities an image contains, so we can address security concerns with images directly.

For more information, see Harbor - Container Registry.

A link to Harbor is present under the Resources tab at the top of this documentation. It can also be found below:

Link to Login to Harbor

JupyterHub#

A JupyterHub instance is deployed on K8s via the Daskhub Helm chart. The main documentation page for this service can be found by clicking this link to internal K8s JupyterHub Documentation.

Dask Gateway is also included in the Daskhub chart and enables scalable parallel computing within JupyterHub.
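From a notebook on the hub, a Dask cluster can be requested through the gateway client. A minimal sketch, assuming the hub supplies the gateway address and authentication defaults; worker counts and options are subject to the deployment's configured limits:

```python
# Minimal sketch: scale out from a JupyterHub session with Dask Gateway.
from dask_gateway import Gateway

gateway = Gateway()            # uses hub-provided defaults for address/auth
cluster = gateway.new_cluster()
cluster.scale(4)               # request 4 workers (subject to configured limits)
client = cluster.get_client()  # Dask client connected to the new cluster

# ... run Dask computations with `client` ...

cluster.shutdown()
```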

The JupyterHub KubeSpawner creates single-user environments with access to shared and persistent personal storage space. Spawned user environments come in different default resource sizes, including a GPU option. The spawner uses a customized Docker image that provides the packages, kernels, and extensions the scientific research community relies on for data analysis. The custom environment also gives users read-only access to the campaign and collections directories on GLADE, as well as a shared directory whose specific use case is still being fleshed out.

A link to the NCAR JupyterHub can always be found on this documentation page under the Resources tab at the top. It can also be found below:

Link to Login to NCAR JupyterHub

Note

Access is currently granted to members of the NCAR/2i2c-cloud-users GitHub Team. If you require access, please follow these instructions.

Binder#

Binder is a tool that enables sharing of custom computing environments from code repository contents. For instance, if there is a code repository that contains some Jupyter Notebooks that explain how to do something, Binder can be used to launch a compute instance via JupyterHub that is configured automatically to run the contents of the repository.
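BinderHub launch links follow a predictable pattern, so sharing an environment amounts to constructing a URL that points at the repository. A small sketch, with a placeholder Binder host and a hypothetical GitHub repository:

```python
# Small sketch: build a BinderHub launch URL for a GitHub repository.
# The Binder host is a placeholder; substitute the CISL Binder URL
# listed under the Resources tab.
binder_host = "https://binder.example.ucar.edu"    # placeholder host
org, repo, ref = "my-org", "my-notebooks", "HEAD"  # hypothetical repository

launch_url = f"{binder_host}/v2/gh/{org}/{repo}/{ref}"
print(launch_url)  # open this URL to launch the repo in a fresh environment
```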

NCAR Rancher#

NCAR operates an instance of Rancher to provide a container management platform to the community.

A link to the NCAR Rancher instance can always be found on this documentation page under the Resources tab at the top. It can also be found below:

Link to Login to NCAR Rancher

Note

Rancher is an open source container management platform built for organizations that deploy containers in production.

2i2c JupyterHub#

2i2c operates a JupyterHub instance for NCAR use on AWS.

A link to the 2i2c JupyterHub can always be found on this documentation page under the Resources tab at the top. It can also be found below:

Login to 2i2c JupyterHub

Note

Access is currently granted to members of the NCAR/2i2c-cloud-users GitHub Team. If you require access, please follow these instructions.

Storage#

Data Storage for the 2i2c JupyterHub instance is provided by AWS Elastic File System (EFS). NCAR internal data from GLADE and Stratus will not be available from the 2i2c JupyterHub instance.

Virtual Machines (VMs)#

Virtual machines can be provided if containers aren't the best fit for the solution. Right now the process to get a VM is to submit a ticket with the NCAR/UCAR Service Desk here. A link to the NCAR/UCAR Service Desk to request a VM can always be found on this documentation page under the Resources tab at the top.

In your request, please provide the number of CPUs, amount of memory (GB), required disk size (GB), and the operating system you want to run. If you have any other special requests or requirements, please include them in the ticket as well.

Agile Program Management#

Kanban Board

This project is implementing a hybrid Agile project management workflow. Waterfall techniques will be used for high-level project management. Kanban will be used for day-to-day tasks and creating a continuous flow of value to users.

Vision#

Provide and operate an on-premise cloud offering for the scientific community to supplement traditional HPC services and public cloud offerings while utilizing 2i2c to host a JupyterHub instance in the public cloud.

Goals#

  • Improve understanding of how the scientific community might use and benefit from an on-prem cloud

    • Which services fit on-prem better than traditional HPC and/or public cloud

  • Gain experience within CISL deploying and operating an on-prem cloud and associated services

  • Improve CISL's ability to support interactive analysis workflows in environments where data is globally distributed

  • Increase user visibility into on-prem cloud offerings

  • Develop metrics to showcase project value & feasibility

  • Gain experience with Agile Project Management

Use Cases#

Numerous users in the scientific community were solicited for input on what they would find beneficial in a cloud-like environment. Below is a short, high-level outline of those requests.