# Prestage model data

This notebook downloads data from the [NCAR DASH repository](https://doi.org/10.5065/fepv-0z52), where the modeling data for this study have been archived {cite:p}`Long2021-ak`. It also ensures that a dataset curated via [Intake](https://intake.readthedocs.io/en/latest/) is accessible; local caching of this dataset happens automatically behind the scenes.

First, we print the local storage locations used to support the calculation.

In [10]:
%load_ext autoreload
%autoreload 2


In [11]:
import os
from subprocess import Popen, PIPE
import tarfile

import xarray as xr
xr.set_options(display_style='text')

import config

## Print storage locations

In [12]:
config.get("project_tmpdir")

'/glade/work/mclong/so-co2-airborne-obs'

In [13]:
config.get("project_tmpdir_obs")

'/glade/work/mclong/so-co2-airborne-obs/obs-data'

In [14]:
config.get("model_data_dir_root")

'/glade/work/mclong/so-co2-airborne-obs/model-data'

In [15]:
config.get("model_data_dir")

'/glade/work/mclong/so-co2-airborne-obs/model-data/Long-etal-2021-SO-CO2-Science'

In [16]:
config.get("dash_asset_fname")

'Long-etal-2021-SO-CO2-Science.tar.gz'
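The printed values suggest that the paths are derived from a single project directory plus the archive name. The sketch below is a hypothetical stand-in for the project's `config` module that reproduces those relationships; the real module may compute or store these settings differently, and `_SETTINGS`/`_ARCHIVE_NAME` are illustrative names only.

```python
from pathlib import PurePosixPath

# Hypothetical settings: only the printed values above come from this notebook;
# the actual `config` module may derive them differently.
_PROJECT_TMPDIR = PurePosixPath("/glade/work/mclong/so-co2-airborne-obs")
_ARCHIVE_NAME = "Long-etal-2021-SO-CO2-Science"

_SETTINGS = {
    "project_tmpdir": _PROJECT_TMPDIR,
    "project_tmpdir_obs": _PROJECT_TMPDIR / "obs-data",
    "model_data_dir_root": _PROJECT_TMPDIR / "model-data",
    # the unpacked archive lives in a directory named after the archive
    "model_data_dir": _PROJECT_TMPDIR / "model-data" / _ARCHIVE_NAME,
    "dash_asset_fname": f"{_ARCHIVE_NAME}.tar.gz",
}


def get(key):
    """Return a setting as a string, mirroring how `config.get` is called above."""
    return str(_SETTINGS[key])
```

Keeping every path keyed off one root means relocating the project only requires changing `_PROJECT_TMPDIR`.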

## Get data from DASH repo

Use the DASH-provided `wget` script to download the modeling data from {cite:t}`Long2021-ak`. This will not work on machines without `wget` (e.g., macOS, which does not ship it by default).
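On machines without `wget`, one portable alternative is to fetch the archive with Python's standard library. The sketch below assumes you already have a direct URL for `Long-etal-2021-SO-CO2-Science.tar.gz` (for example, copied from the DASH wget script); the `download` helper is hypothetical and not part of this project.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest):
    """Stream `url` to the local file `dest` using only the standard library."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # write to a temporary name first so a partial transfer is never mistaken
    # for a complete archive
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as f:
        shutil.copyfileobj(response, f)
    tmp.replace(dest)  # expose the file only once the transfer has finished
    return dest
```

Streaming with `shutil.copyfileobj` avoids holding the whole archive in memory, which matters for large model output.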

In [17]:
if not os.path.isdir(config.get("model_data_dir")):
    # run the DASH-provided wget script to stage the data
    # TODO: support curl too
    cwd = os.getcwd()
    script = f'{cwd}/wget-dash-archive.sh'

    # make sure the download destination exists, then work from there
    os.makedirs(config.get("model_data_dir_root"), exist_ok=True)
    os.chdir(config.get("model_data_dir_root"))
    try:
        p = Popen(['bash', script], stdout=PIPE, stderr=PIPE)
        stdout, stderr = p.communicate()
        if p.returncode:
            print(stderr.decode('UTF-8'))
            print(stdout.decode('UTF-8'))
            raise OSError('data transfer failed')

        # untar the archive into model_data_dir_root
        asset = config.get("dash_asset_fname")
        assert os.path.isfile(asset), f'missing {asset}'
        with tarfile.open(asset, "r:gz") as tar:
            tar.extractall()
    finally:
        # always return to the original working directory, even on failure
        os.chdir(cwd)

os.listdir(config.get("model_data_dir"))

['TM5-Flux-mrf',
 'TM5-Flux-m0f',
 'CT2019B',
 's99oc_SOCCOM_v2020',
 's99oc_v2020',
 's99oc_ADJocI40S_v2020',
 'CAMSv20r1',
 'CT2017',
 'MIROC',
 'README.md',
 'CTE2018',
 'TM5-Flux-mmf',
 'TM5-Flux-mwf',
 'CTE2020']
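`tarfile.extractall` trusts the member names stored in the archive. A minimal sketch of a more defensive extraction, assuming a `.tar.gz` input, is shown below; it rejects members whose paths would resolve outside the destination and, where the running Python supports extraction filters (3.12+, and security backports of earlier versions), also applies the built-in `"data"` filter. The `safe_extractall` name is hypothetical.

```python
import os
import tarfile


def safe_extractall(archive, dest="."):
    """Extract a .tar.gz archive, refusing members that would land outside `dest`."""
    dest = os.path.realpath(dest)
    with tarfile.open(archive, "r:gz") as tar:
        # check every member name before writing anything to disk
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # built-in filter also guards against unsafe links and permissions
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```

The name check alone does not inspect symlink targets; the `"data"` filter covers that case when it is available.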

## Check on `intake` datasets

The `models` sub-package includes an [Intake](https://intake.readthedocs.io/en/latest/) catalog file providing access to the CO<sub>2</sub> air-sea flux product of {cite:t}`Landschutzer2016-wg`. Here, we simply request that dataset; `intake` is configured to cache the dataset locally.
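Intake's own caching machinery is configured in the catalog and is not reproduced here. As a rough illustration of the general idea only, the sketch below fetches a file on first access and reuses the local copy afterwards; `cached_file`, its `fetch` callback, and the file name used in the check are all hypothetical.

```python
import os
from pathlib import Path


def cached_file(cache_dir, name, fetch):
    """Return cache_dir/name, calling fetch(path) to create it only on first use."""
    path = Path(cache_dir) / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".tmp")
        fetch(tmp)  # the caller-supplied fetcher writes the file to `tmp`
        os.replace(tmp, path)  # atomically publish the completed file
    return path
```

A second call with the same `cache_dir` and `name` returns immediately without invoking `fetch`, which is why repeated `open_dataset()` calls need not re-download data.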

In [18]:
import models

ds = models.dataset_som_ffn.open_dataset()
ds