{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Writing multiple netCDF files in parallel with xarray and dask\n", "\n", "A typical xarray computation workflow consists of:\n", "\n", "1. reading one or more netCDF files into an xarray dataset backed by dask using `xr.open_mfdataset()` or `xr.open_dataset(chunks=...)`,\n", "2. applying some transformation to the input dataset, and\n", "3. saving the resulting output to disk as a netCDF file using `Dataset.to_netcdf()`.\n", "\n", "\n", "The last step (3) can easily produce a large netCDF file (10 GB or more). Because the write runs serially, this step can take a very long time to complete and may sometimes hang. To avoid these issues, one can use a lesser-used but helpful xarray capability: the [`xr.save_mfdataset()`](https://xarray.pydata.org/en/latest/generated/xarray.save_mfdataset.html) function, which writes multiple datasets to disk as netCDF files simultaneously. Its signature looks like this:\n", "\n", "```python\n", "xr.save_mfdataset(\n", "    datasets,\n", "    paths,\n", "    mode='w',\n", "    format=None,\n", "    groups=None,\n", "    engine=None,\n", "    compute=True,\n", ")\n", "# Docstring:\n", "# Write multiple datasets to disk as netCDF files simultaneously.\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Please show me the code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Package imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0.15.1'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import xarray as xr\n", "from distributed import Client, performance_report\n", "\n", "xr.__version__" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Client | Cluster\n", "
array([cftime.DatetimeNoLeap(1980-09-16 12:00:00),\n", " cftime.DatetimeNoLeap(1980-10-17 00:00:00),\n", " cftime.DatetimeNoLeap(1980-11-16 12:00:00),\n", " cftime.DatetimeNoLeap(1980-12-17 00:00:00),\n", " cftime.DatetimeNoLeap(1981-01-17 00:00:00),\n", " cftime.DatetimeNoLeap(1981-02-15 12:00:00),\n", " cftime.DatetimeNoLeap(1981-03-17 00:00:00),\n", " cftime.DatetimeNoLeap(1981-04-16 12:00:00),\n", " cftime.DatetimeNoLeap(1981-05-17 00:00:00),\n", " cftime.DatetimeNoLeap(1981-06-16 12:00:00),\n", " cftime.DatetimeNoLeap(1981-07-17 00:00:00),\n", " cftime.DatetimeNoLeap(1981-08-17 00:00:00),\n", " cftime.DatetimeNoLeap(1981-09-16 12:00:00),\n", " cftime.DatetimeNoLeap(1981-10-17 00:00:00),\n", " cftime.DatetimeNoLeap(1981-11-16 12:00:00),\n", " cftime.DatetimeNoLeap(1981-12-17 00:00:00),\n", " cftime.DatetimeNoLeap(1982-01-17 00:00:00),\n", " cftime.DatetimeNoLeap(1982-02-15 12:00:00),\n", " cftime.DatetimeNoLeap(1982-03-17 00:00:00),\n", " cftime.DatetimeNoLeap(1982-04-16 12:00:00),\n", " cftime.DatetimeNoLeap(1982-05-17 00:00:00),\n", " cftime.DatetimeNoLeap(1982-06-16 12:00:00),\n", " cftime.DatetimeNoLeap(1982-07-17 00:00:00),\n", " cftime.DatetimeNoLeap(1982-08-17 00:00:00),\n", " cftime.DatetimeNoLeap(1982-09-16 12:00:00),\n", " cftime.DatetimeNoLeap(1982-10-17 00:00:00),\n", " cftime.DatetimeNoLeap(1982-11-16 12:00:00),\n", " cftime.DatetimeNoLeap(1982-12-17 00:00:00),\n", " cftime.DatetimeNoLeap(1983-01-17 00:00:00),\n", " cftime.DatetimeNoLeap(1983-02-15 12:00:00),\n", " cftime.DatetimeNoLeap(1983-03-17 00:00:00),\n", " cftime.DatetimeNoLeap(1983-04-16 12:00:00),\n", " cftime.DatetimeNoLeap(1983-05-17 00:00:00),\n", " cftime.DatetimeNoLeap(1983-06-16 12:00:00),\n", " cftime.DatetimeNoLeap(1983-07-17 00:00:00),\n", " cftime.DatetimeNoLeap(1983-08-17 00:00:00)], dtype=object)
\n",
"