Overview

The VAPOR data analysis environment targets rectilinear or structured gridded data sets that are time-varying, multivariate, and of very high spatial resolution. It is not uncommon for the aggregate data generated by a single experiment to reach terabytes in size. To accommodate the needs of these large data sets, VAPOR defines its own mechanism for storing sampled data and their associated attributes (metadata). In the VAPOR environment, a collection of related data, typically produced by a single numerical simulation, is known as a VAPOR Data Collection (VDC).

A VDC is composed of two components: metadata and field data. Metadata are data that describe field data. Examples of metadata include the grid type, the spatial resolution, the names of the field variables, the number of time steps, and possibly user-defined attributes. Field data are the numerical outputs produced by the simulation (sampled 2D or 3D functions), such as the components of a velocity field or a temperature field.
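To make the split between metadata and field data concrete, here is a minimal sketch in Python. The class name `CollectionMetadata`, its fields, and the example values are all hypothetical, chosen only to mirror the metadata examples listed above; this is not VAPOR's actual metadata schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CollectionMetadata:
    """Hypothetical stand-in for the metadata half of a data collection."""
    grid_type: str                        # e.g. "structured"
    dimensions: Tuple[int, int, int]      # spatial resolution (nx, ny, nz)
    variable_names: Tuple[str, ...]       # names of the field variables
    num_time_steps: int
    user_attributes: Dict[str, str] = field(default_factory=dict)

    def field_sample_count(self) -> int:
        """Number of samples in one 3D variable at one time step."""
        nx, ny, nz = self.dimensions
        return nx * ny * nz

    def total_sample_count(self) -> int:
        """Number of samples across all variables and all time steps."""
        return (self.field_sample_count()
                * len(self.variable_names)
                * self.num_time_steps)


# Illustrative values only; not taken from any real data set.
meta = CollectionMetadata(
    grid_type="structured",
    dimensions=(512, 512, 512),
    variable_names=("u", "v", "w", "temperature"),
    num_time_steps=100,
)
bytes_total = meta.total_sample_count() * 4  # assuming 4-byte floats
```

With these made-up numbers the field data alone come to roughly 215 GB, which shows how quickly a modest simulation approaches the terabyte range, while the metadata describing it remain a few short values.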

The VDC model differs from more traditional scientific data representations, such as netCDF and HDF, in two important ways:

  1. Field data are stored as wavelet-transformed coefficients. That is, field data undergo a user-defined number (and type) of wavelet transforms before they are written to a file. Inverse wavelet transforms can be applied efficiently to the stored coefficients to reconstruct the original field data. Furthermore, the data need not be reconstructed at their original grid resolution. The user may elect to access the data progressively, reconstructing a coarsened approximation of the original data in order to reduce memory requirements, processing time, and so on.
  2. VDC data (metadata and field data) are not stored in a single file, as is commonly done with other scientific data formats. Instead, metadata, individual field data time steps, variables, and wavelet coefficients are all stored in separate files. Distributing the pieces in this manner is essential for effectively managing terabyte-sized data collections.
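The idea of progressive access described in the first point can be sketched with a one-dimensional Haar transform. This is an illustration only: the Haar wavelet, the unnormalized averaging scheme, and the function names `haar_forward` and `haar_inverse` are assumptions made for the example, and say nothing about which wavelets or storage layout VAPOR actually uses.

```python
def haar_forward(signal, levels):
    """Apply `levels` passes of an unnormalized Haar transform.

    Returns (approx, details). details[0] holds the finest-scale
    coefficients and details[-1] the coarsest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])  # pairwise differences
        approx = [(a + b) / 2 for a, b in pairs]         # pairwise averages
    return approx, details


def haar_inverse(approx, details, skip_finest=0):
    """Invert the transform, ignoring the `skip_finest` finest detail levels.

    skip_finest=0 recovers the original signal; each skipped level
    halves the length of the reconstructed result.
    """
    out = list(approx)
    # Apply the coarsest remaining details first, then the finer ones.
    for level in reversed(details[skip_finest:]):
        rebuilt = []
        for avg, diff in zip(out, level):
            rebuilt.extend((avg + diff, avg - diff))
        out = rebuilt
    return out


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_forward(signal, levels=2)

full = haar_inverse(approx, details)                   # original resolution
half = haar_inverse(approx, details, skip_finest=1)    # one level coarser
```

Here `full` reproduces `signal` exactly, while `half` is a four-sample approximation built without ever reading the finest coefficients. For a dyadic transform like this one applied along each axis of a 3D grid, every skipped level cuts the number of samples by about a factor of eight, which is where the savings in memory and processing time come from.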

Before analyzing your gridded data with VAPOR, you must convert them to a VDC. VAPOR provides a number of tools for performing this conversion. The remainder of this document discusses your options.

Note: VAPOR also supports the direct import of some data formats without prior conversion to a VDC. However, VAPOR's progressive data access capabilities are not available when data are imported directly.