Data Access¶
This notebook illustrates how to make diagnostic plots using the dataset produced by the South America Affinity Group (SAAG) hosted on NCAR’s Geoscience Data Exchange (GDEX).
The data is open access and can be retrieved via three protocols:
POSIX (if you have access to NCAR’s HPC systems like Casper or Derecho),
HTTPS, or
OSDF,
all via intake-ESM catalogs.
Learn about intake-ESM: https://intake-esm.readthedocs.io/en/stable/
# Imports
import intake
import numpy as np
import pandas as pd
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt
import os

# import fsspec.implementations.http as fshttp
# from pelicanfs.core import PelicanFileSystem, PelicanMap, OSDFFileSystem
import dask
from dask_jobqueue import PBSCluster
from dask.distributed import Client
from dask.distributed import performance_report

cat_url = '/gdex/data/d616000/catalogs/d616000_catalog.json' # POSIX access on NCAR
# cat_url = 'https://osdf-data.gdex.ucar.edu/ncar/gdex/d616000/catalogs/d616000_catalog-http.json' #HTTPS access
# cat_url = 'https://osdf-data.gdex.ucar.edu/ncar/gdex/d616000/catalogs/d616000_catalog-osdf.json' #OSDF access
print(cat_url)
/gdex/data/d616000/catalogs/d616000_catalog.json
# Set up your scratch folder path
username = os.environ["USER"]
glade_scratch = "/glade/derecho/scratch/" + username
print(glade_scratch)
/glade/derecho/scratch/harshah
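Before launching the cluster below, it can help to make sure the Dask spill and log directories exist under your scratch folder. A minimal sketch (the helper name `ensure_dask_dirs` is an assumption, not part of this notebook):

```python
import os

def ensure_dask_dirs(base):
    # Hypothetical helper: create the spill and log directories that
    # the PBSCluster below will write into, if they are missing.
    paths = [os.path.join(base, "dask", sub) for sub in ("spill", "logs")]
    for p in paths:
        os.makedirs(p, exist_ok=True)
    return paths
```

For example, `ensure_dask_dirs(glade_scratch)` before creating the cluster.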
Create a PBS cluster¶
# Create a PBS cluster object
cluster = PBSCluster(
job_name = 'dask-wk25-hpc',
cores = 1,
memory = '8GiB',
processes = 1,
local_directory = glade_scratch+'/dask/spill/',
log_directory = glade_scratch + '/dask/logs/',
resource_spec = 'select=1:ncpus=1:mem=8GB',
queue = 'casper',
walltime = '5:00:00',
#interface = 'ib0'
interface = 'ext'
)

# Scale the cluster and display cluster dashboard URL
n_workers = 5
client = Client(cluster)
cluster.scale(n_workers)
client.wait_for_workers(n_workers = n_workers)
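If you are not on an NCAR system with PBS, one option is to swap in a `LocalCluster`, which exposes the same `Client` interface; this is a sketch for testing the rest of the notebook locally, not part of the SAAG workflow:

```python
from dask.distributed import Client, LocalCluster

# Sketch: a local stand-in for the PBS cluster above.
cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)
client.wait_for_workers(2)
n_workers_up = len(client.scheduler_info()["workers"])

client.close()
cluster.close()
```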
cluster
Load SAAG data from NCAR’s GDEX using an intake catalog¶
col = intake.open_esm_datastore(cat_url)
col
col.df turns the catalog object into a pandas DataFrame!
(More precisely, it accesses the dataframe attribute of the catalog object.)
col.df
Select data and plot¶
What if you don’t know the variable names?¶
Use pandas logic to print the variable short names and their long_name descriptions
col.df[['variable','long_name']]
Note that long_name is not available for some variables, such as ‘V’.
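To make the pandas step concrete, here is a self-contained sketch on a toy dataframe that mimics the catalog’s variable/long_name columns (the rows are made-up assumptions, not the real catalog contents):

```python
import pandas as pd

# Toy stand-in for col.df; the column names follow the catalog above,
# the rows are invented for illustration only.
df = pd.DataFrame({
    "variable": ["T2", "V", "PREC"],
    "long_name": ["2-m temperature", None, "precipitation"],
})

unique_vars = df["variable"].unique().tolist()
missing = df[df["long_name"].isna()]  # variables lacking a long_name
```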
In such cases, please look at the dataset documentation for additional information: https://gdex.ucar.edu/datasets/d616000/documentation/#
Temperature¶
Plot temperature for a random date
cat_temp = col.search(variable='T2')
cat_temp.df.head()
The data is organized in (virtual) Zarr stores, with one year’s worth of data per store.
Select a year. This is done by selecting the start time to be Jan 1st of that year or the end time to be Dec 31st of the same year.
This also means that if you want data for some other day, say Oct 1 of year YYYY, you must first load the data for the whole year YYYY and then select that particular day from it. This example is shown below.
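The load-a-year-then-select-a-day pattern can be sketched with a synthetic dataset (the variable name follows the notebook; the values and hourly frequency are assumptions for illustration, not the real SAAG data):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for one year of hourly T2 data.
times = pd.date_range("2020-01-01", "2020-12-31 23:00", freq="h")
ds_year = xr.Dataset(
    {"T2": ("Time", np.random.rand(times.size))},
    coords={"Time": times},
)

# Having "loaded" the year, select the single day of interest.
ds_day = ds_year.sel(Time="2020-10-01")
```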
date = "2020-01-01"
# year = "2021"
cat_temp_subset = cat_temp.search(start_time = date)
cat_temp_subset
Load data into xarray¶
# Load catalog entries for subset into a dictionary of xarray datasets, and open the first one.
dsets = cat_temp_subset.to_dataset_dict(zarr_kwargs={"consolidated": True})
print(f"\nDataset dictionary keys:\n {dsets.keys()}")
--> The keys in the returned dictionary of datasets are constructed as follows:
'variable.short_name'
# Load the first dataset and display a summary.
dataset_key = list(dsets.keys())[0]
# store_name = dataset_key + ".zarr"
print(dsets.keys())
ds = dsets[dataset_key]
ds = ds.T2
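A small aside on `ds.T2`: attribute access and bracket access return the same DataArray, but bracket access is the safer general form because a variable name can collide with a Dataset attribute or method. A self-contained sketch on synthetic data:

```python
import numpy as np
import xarray as xr

# Minimal sketch (synthetic data, not the SAAG catalog).
ds_demo = xr.Dataset({"T2": ("x", np.arange(3.0))})

da_attr = ds_demo.T2      # attribute-style access
da_item = ds_demo["T2"]   # item-style access; safer if the variable
                          # name collides with a Dataset attribute
```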
ds
%%time
desired_date = "2020-10-01"
ds_subset = ds.sel(Time=desired_date,method='nearest')
ds_subset
%%time
ds_subset.plot(cmap='inferno')

cluster.close()
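As a final note, the method='nearest' selection used above snaps the requested timestamp to the closest one available in the data. A self-contained illustration on synthetic 3-hourly data (all values here are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Eight 3-hourly timestamps starting at midnight: 00:00, 03:00, ...
times = pd.date_range("2020-10-01", periods=8, freq="3h")
da = xr.DataArray(np.arange(8.0), coords={"Time": times}, dims="Time")

# 04:00 is not on the 3-hourly grid; 'nearest' snaps it to 03:00.
nearest = da.sel(Time=pd.Timestamp("2020-10-01T04:00"), method="nearest")
```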