Populating a VDC

Note: The capabilities discussed in this document require proper user environment setup. Please see the User Environment Setup discussion appropriate for your platform and installation type in the installation documentation.

After a .vdf metafile has been created with vdfcreate or another utility, the VDC must be populated with variable field data. This document describes the process of populating a VDC using the command line tools included in the VAPOR package. Which command line tool to use depends on the source of your data, as shown in the table below. Alternatively, the VDCWizard GUI may be used to populate a VDC; the table also shows which file formats VDCWizard supports.

Tools for populating a VDC

Data file source      Command line tool    Supported by VDCWizard
--------------------  -------------------  ----------------------
WRF-ARW               wrf2vdf              yes
MOM4                  mom2vdf              yes
POP                   mom2vdf              yes
ROMS                  roms2vdf             yes
GRIB                  grib2vdf             yes
CAM                   cam2vdf              yes
Generic netCDF files  ncdf2vdf             no
Raw or other data     raw2vdf              no
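
When scripting conversions for several kinds of data, the table above amounts to a simple lookup. The following Python sketch encodes it as a dictionary; the names CONVERTERS and converter_for are hypothetical and are not part of VAPOR.

```python
# Hypothetical encoding of the table: data source -> (command line tool, VDCWizard support)
CONVERTERS = {
    "WRF-ARW": ("wrf2vdf", True),
    "MOM4": ("mom2vdf", True),
    "POP": ("mom2vdf", True),
    "ROMS": ("roms2vdf", True),
    "GRIB": ("grib2vdf", True),
    "CAM": ("cam2vdf", True),
    "generic netCDF": ("ncdf2vdf", False),
    "raw": ("raw2vdf", False),
}


def converter_for(source):
    """Return (tool name, supported by VDCWizard) for a data source."""
    try:
        return CONVERTERS[source]
    except KeyError:
        raise ValueError(f"unknown data source: {source!r}") from None


print(converter_for("POP"))
```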

Populating a VDC from WRF-ARW data

If your input data arises from a WRF-ARW simulation and your .vdf file was created with, for example, the command:

wrfvdfcreate wrfout_2005-08-29_00 wrfout_2005-08-29_01 wrfout_2005-08-29_02 mywrfdata.vdf

then you can convert the variables found in the wrfout files into the VDC associated with mywrfdata.vdf using the command:

wrf2vdf mywrfdata.vdf wrfout_2005-08-29_00 wrfout_2005-08-29_01 wrfout_2005-08-29_02

Note that the wrfvdfcreate command specifies the name of the .vdf file after the names of all the wrfout files, while the wrf2vdf command specifies the .vdf file before the wrfout files.

The above command would convert all variables that are found in both the wrfout files and the .vdf file. If only a subset of the variables is desired, the list can be limited with the -vars option. For example:

wrf2vdf -vars U:V:W mywrfdata.vdf wrfout_2005-08-29_00

would convert only the U, V, and W variables from the first wrfout file. Note: this is not strictly true, because any VDC requires a small subset of WRF variables for proper handling of geo-referenced data. These required variables are converted in addition to those specified by the -vars option.
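
Because wrfvdfcreate expects the .vdf file last while wrf2vdf expects it first, and because -vars takes a colon-separated list, it can be convenient to build these argument lists in a script. The Python sketch below relies only on the argument order and the -vars option shown above; the helper names are hypothetical. It only constructs the argument lists, which could then be passed to subprocess.run on a system where VAPOR is installed.

```python
from typing import List, Optional, Sequence


def wrfvdfcreate_args(wrfout_files: Sequence[str], vdf: str) -> List[str]:
    # wrfvdfcreate: wrfout files first, .vdf file last
    return ["wrfvdfcreate", *wrfout_files, vdf]


def wrf2vdf_args(vdf: str, wrfout_files: Sequence[str],
                 variables: Optional[Sequence[str]] = None) -> List[str]:
    # wrf2vdf: options first, then the .vdf file, then the wrfout files
    args = ["wrf2vdf"]
    if variables:
        # -vars takes a colon-separated list, e.g. U:V:W
        args += ["-vars", ":".join(variables)]
    return args + [vdf, *wrfout_files]


files = ["wrfout_2005-08-29_00", "wrfout_2005-08-29_01", "wrfout_2005-08-29_02"]
print(" ".join(wrfvdfcreate_args(files, "mywrfdata.vdf")))
print(" ".join(wrf2vdf_args("mywrfdata.vdf", files[:1], ["U", "V", "W"])))
```

The two printed lines reproduce the wrfvdfcreate command and the wrf2vdf -vars U:V:W command from the examples above.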

You can also convert netCDF files that are not WRF output, as long as they have the same dimensions as the WRF output for which the .vdf file was created. See "Populating a VDC from netCDF files" below.

Populating a VDC from MOM4, POP, ROMS, CAM, or GRIB data

If your data were generated by the MOM4, POP, or ROMS ocean models, use mom2vdf (for MOM4 and POP) or roms2vdf (for ROMS) to populate the VDC. These commands are described in detail in the reference manual. The converters for GRIB and CAM data are described on the grib2vdf and cam2vdf manual pages.

Populating a VDC from netCDF files

Data stored in the netCDF file format may also be used to populate a VDC with the ncdf2vdf command. However, netCDF is a very flexible file format, and not every variable stored in a netCDF file is laid out in a way that ncdf2vdf can convert. A 2D or 3D variable stored in a netCDF file can be converted by ncdf2vdf only if its fastest-varying dimensions exactly match the dimensions of the VDC. For example, if the netCDF ncdump command shows the variables of a file defined as:

netcdf file.nc {
dimensions:
    Time = UNLIMITED ; // (20 currently)
    west_east = 200 ;
    south_north = 100 ;
    bottom_top = 35 ;
variables:
    float T(Time, bottom_top, south_north, west_east) ;
}

then, assuming the .vdf file had been created with ncdfvdfcreate, the following ncdf2vdf command would convert time step 0 of the input file, file.nc, and store it as time step 0 of the VDC:

ncdf2vdf -timedim Time -ts 0 -vars T  file.nc mydata.vdf
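
The dimension requirement can be checked before running ncdf2vdf. netCDF lists a variable's dimensions slowest first, so the fastest-varying dimensions are the last ones in the declaration; for T above they are west_east (200), south_north (100), and bottom_top (35). The Python sketch below compares them with the VDC dimensions, assuming those are given as NX x NY x NZ with X varying fastest. The function is hypothetical, takes dimension sizes copied from ncdump output, and does not read any files.

```python
from typing import Sequence, Tuple


def fastest_dims_match(var_dim_sizes: Sequence[int],
                       vdc_dims: Tuple[int, ...]) -> bool:
    """Check whether a netCDF variable's fastest-varying dimensions match the VDC.

    var_dim_sizes lists sizes in netCDF declaration order (slowest first,
    fastest last), as printed by ncdump. vdc_dims is assumed to be ordered
    fastest first, i.e. (NX, NY) for 2D or (NX, NY, NZ) for 3D.
    """
    n = len(vdc_dims)
    if len(var_dim_sizes) < n:
        return False
    # Keep the last n dimensions and reverse them so the fastest comes first
    fastest = tuple(reversed(list(var_dim_sizes)[-n:]))
    return fastest == tuple(vdc_dims)


# T(Time, bottom_top, south_north, west_east) with Time currently 20
t_dims = [20, 35, 100, 200]
print(fastest_dims_match(t_dims, (200, 100, 35)))  # True
print(fastest_dims_match(t_dims, (100, 200, 35)))  # False
```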

Populating a VDC with raw2vdf

The raw2vdf command reads a single variable, at a single time step, from a file on disk stored as a block of floats (a contiguous 3D array of unformatted, 32-bit binary floating-point values with no header or trailer information), transforms the data into wavelet space, and stores it in a VDC. Assuming the .vdf file was created with the vdfcreate command:

vdfcreate -dimension 512x512x512 -numts 100 -level 3 -vars3d vx mydata.vdf

then the following command:

raw2vdf -ts 0 -varname vx mydata.vdf rawvx.000.float

would transform the variable stored in the file rawvx.000.float and write it into the VDC associated with the mydata.vdf metafile, with time step 0 and variable name vx. The volume contained in rawvx.000.float must have a resolution of 512^3, as defined in the .vdf file. Furthermore, because the .vdf file was created with -level 3, the volume will undergo three wavelet transformations, each halving the resolution along every axis, resulting in a coarsest resolution of 512/2^3 = 64, that is, 64^3.
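
The raw format that raw2vdf reads can be produced from Python with only the standard library. The sketch below writes a small stand-in volume (8^3 rather than 512^3) as contiguous 32-bit floats with no header or trailer, and computes the coarsest resolution for a given number of wavelet transformations. It assumes that X varies fastest in the file and that native byte order is acceptable; check both against the raw2vdf reference documentation. The helper names are hypothetical.

```python
import array
import os
import tempfile


def write_raw_volume(path, nx, ny, nz, value_fn):
    """Write an nx*ny*nz volume as contiguous 32-bit floats, no header or trailer.

    Assumes x varies fastest, then y, then z, and uses native byte order.
    """
    data = array.array("f")  # "f" is a C float, 4 bytes on common platforms
    if data.itemsize != 4:
        raise RuntimeError("array typecode 'f' is not 32-bit on this platform")
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                data.append(value_fn(x, y, z))
    with open(path, "wb") as f:
        data.tofile(f)


def coarsest_resolution(dims, levels):
    # Each wavelet transformation halves every axis (exact for power-of-two sizes)
    return tuple(d // 2 ** levels for d in dims)


print(coarsest_resolution((512, 512, 512), 3))  # (64, 64, 64)

# Small stand-in for rawvx.000.float; a real 512^3 volume would be 512 MiB
path = os.path.join(tempfile.mkdtemp(), "rawvx.000.float")
write_raw_volume(path, 8, 8, 8, lambda x, y, z: float(x + y + z))
print(os.path.getsize(path))  # 2048 (8 * 8 * 8 values * 4 bytes)
```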

Later time steps can be transformed by incrementing the integer argument to the -ts option. For example:

raw2vdf -ts 1 -varname vx mydata.vdf rawvx.001.float
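
Converting a long time series this way is easy to script. The Python sketch below builds one raw2vdf command per time step and, by default, only prints them. The file-naming pattern rawvx.NNN.float is an assumption taken from the examples above, and the helper names are hypothetical.

```python
import shlex
import subprocess


def raw2vdf_commands(vdf, varname, num_steps, pattern="raw{var}.{ts:03d}.float"):
    """Build one raw2vdf argument list per time step, starting at 0."""
    return [
        ["raw2vdf", "-ts", str(ts), "-varname", varname, vdf,
         pattern.format(var=varname, ts=ts)]
        for ts in range(num_steps)
    ]


def run_all(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Raises CalledProcessError at the first conversion that fails
            subprocess.run(cmd, check=True)


run_all(raw2vdf_commands("mydata.vdf", "vx", 3))
```

With dry_run left at True, the last line prints the commands for time steps 0 through 2, for example raw2vdf -ts 1 -varname vx mydata.vdf rawvx.001.float. Setting dry_run=False runs them sequentially.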

Parallel data conversion on compute clusters

For exceptionally large data sets, converting the VDC contents in parallel can save a great deal of time. VAPOR 2.5 adds Python scripts for this purpose in the $(VAPOR_HOME)/share/examples/parallelDataConversion/ directory. There, users will find vaporLSF.py and vaporSGE.py, scripts tailored to the LSF and SGE batch schedulers, respectively. These scripts divide the input files among a series of processes that run in parallel under the aforementioned schedulers.

These scripts can be adapted for use with other schedulers that support "Array Job" processing. Further documentation can be found here.
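
The general idea behind array-job conversion is that each task in the array handles its own slice of the input files. The Python sketch below illustrates only that idea; it is not the logic of vaporLSF.py or vaporSGE.py. It assumes the task index is read from the SGE_TASK_ID (SGE) or LSB_JOBINDEX (LSF) environment variable and is 1-based; the function names and file list are hypothetical.

```python
import os
from typing import List, Sequence


def files_for_task(files: Sequence[str], task_index: int, num_tasks: int) -> List[str]:
    """Return the contiguous slice of files handled by one array task.

    task_index is 1-based. Files are split as evenly as possible; the first
    len(files) % num_tasks tasks each receive one extra file.
    """
    if not 1 <= task_index <= num_tasks:
        raise ValueError("task_index must be in 1..num_tasks")
    base, extra = divmod(len(files), num_tasks)
    i = task_index - 1
    start = i * base + min(i, extra)
    end = start + base + (1 if i < extra else 0)
    return list(files[start:end])


def current_task_index(default: int = 1) -> int:
    # SGE exports SGE_TASK_ID and LSF exports LSB_JOBINDEX for array tasks
    for name in ("SGE_TASK_ID", "LSB_JOBINDEX"):
        value = os.environ.get(name)
        if value and value.isdigit():
            return int(value)
    return default


# Hypothetical usage inside an array task: select only this task's files
wrfouts = [f"wrfout_2005-08-29_{h:02d}" for h in range(24)]
print(files_for_task(wrfouts, current_task_index(), num_tasks=4))
```

Contiguous slices keep each task's time steps together; an interleaved split (files[i::num_tasks]) would also work if the per-file cost varies along the series.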

Release availability: version 2.5 and beyond

Learning more

Complete documentation on all of the VAPOR command line utilities is available in the reference manual.

Invoking any of these commands with the -help option will generate a listing of all available command line options, with a terse description of each.

Information on various programmatic APIs for converting data to a VDC may be found here [need a link]