PIO  2.5.4
Describing decompositions

One of the biggest challenges to working with PIO is setting up the decomposition of the data (Fortran users see Define a Decomposition, C users Initialize a Decomposition). The user must properly describe how the data within each MPI tasks memory should be placed or retrieved from disk.

Compmap

When initializing a new decomposition, each task calling PIOc_init_decomp() or PIO_initdecomp().

Rearrangers

PIO provides two methods to rearrange data from compute tasks to IO tasks.

Box rearrangement

In this method data is rearranged from compute to IO tasks such that the arrangement of data on the IO tasks optimizes the call from the IO tasks to the underlying (NetCDF) IO library. In this case each compute task will transfer data to one or more IO tasks.

Subset rearrangement

In this method each IO task is associated with a unique subset of compute tasks so that each compute task will transfer data to one and only one IO task. Since this technique does not guarantee that data on the IO node represents a contiguous block of data on the file it may require multiple calls to the underlying (NetCDF) IO library.

As an example suppose we have a global two dimensional grid of size 4x5 decomposed over 5 tasks. We represent the two dimensional grid in terms of offset from the initial element ie

     0  1  2  3 
     4  5  6  7 
     8  9 10 11
    12 13 14 15
    16 17 18 19 

Now suppose this data is distributed over the compute tasks as follows:

0: {   0  4 8 12  } 
1: {  16 1 5 9  } 
2: {  13 17 2 6  } 
3: {  10 14 18 3  } 
4: {   7 11 15 19  } 

If we have 2 io tasks the Box rearranger would give:

0: { 0  1  2  3  4  5  6  7  8  9  }
1: { 10 11 12 13 14 15 16 17 18 19 }

While the subset rearranger would give:

0: { 0  1  4  5  8  9  12 16 }
1: { 2  3  6  7  10 11 13 14 15 17 18 19 }

Note that while the box rearranger gives a data layout which is well balanced and well suited for the underlying io library, it had to communicate with every compute task to do so. On the other hand the subset rearranger communicated with only a portion of the compute tasks but requires more work on the part of the underlying io library to complete the operation.

Also note if every task is an IO task then the box rearranger will need to do an alltoall communication, while the subset rearranger does none. In fact using the subset rearranger with every compute task an IO task provides a measure of what you might expect the performance of the underlying IO library to be if it were used without PIO.