FD4 Library Hands-on Training: Developing an Advection Simulation
10 November 2014, Developer School for HPC applications in Earth Sciences, ICTP, Trieste, Italy
Matthias Lieber, ZIH, TU Dresden, Germany, matthias.lieber@tu-dresden.de
Building FD4 should be as easy as:
% wget http://wwwpub.zih.tu-dresden.de/~mlieber/fd4/fd4-2014-11-05.tar.gz
% tar xzf fd4-2014-11-05.tar.gz
% cd fd4-2014-11-05
% make -j
Download the task package and unpack it in the fd4-2014-11-05 directory. If you unpack it anywhere else, the path to the FD4 library needs to be modified in the Makefiles.
% wget http://wwwpub.zih.tu-dresden.de/~mlieber/fd4/handson-ictp-2014.tar.gz
% tar xzf handson-ictp-2014.tar.gz
% cd handson/02_domain
% make
% mpirun -np 4 ./simulation
Now make the following modifications:
Declare an FD4 NetCDF Communicator and, after allocating the blocks of the domain, open a file "out.nc" and write all 3 variables to the file. Do not forget to close the file after that.
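A minimal sketch of this sequence is given below. The type and routine names (fd4_netcdf4_comm, fd4_netcdf4_open, fd4_netcdf4_write, fd4_netcdf4_close) and especially their argument lists are assumptions here, written from memory; check the FD4 API documentation for the exact signatures.

! Sketch only: routine names and argument lists are assumptions, see the FD4 docs
type(fd4_netcdf4_comm) :: nfcomm

! ... after the blocks of the domain have been allocated ...
call fd4_netcdf4_open(nfcomm, domain, 'out.nc', 3, (/varC,varU,varV/), (/1,1,1/), err)  ! open file for the 3 variables
call fd4_netcdf4_write(nfcomm, err)   ! write the current state of the variables
call fd4_netcdf4_close(nfcomm, err)   ! do not forget to close the file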
Then, after running with multiple MPI ranks, use ncview to take a look at the data:
% mpirun -np 4 ./simulation
% ncview out.nc
Declare an FD4 Block Iterator and implement a loop over all blocks. Within that loop, loop over all grid cells (x and y dimensions) of the block and initialize the three variables using the functions provided by the Fortran module advection.
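A sketch of the iterator pattern is shown below; the iterator and block member names (iter%cur, firstindex, lastindex) are assumptions and may differ in the actual FD4 interface, and the init functions of the advection module are referenced only in a comment.

! Sketch only: member names are assumptions, see the FD4 docs
type(fd4_iter) :: iter
integer :: x, y

call fd4_iter_init(domain, iter)
do while(associated(iter%cur))   ! loop over all blocks owned by this rank
  do y = iter%cur%firstindex(2), iter%cur%lastindex(2)
    do x = iter%cur%firstindex(1), iter%cur%lastindex(1)
      ! initialize varC, varU, varV at cell (x,y) using the
      ! init functions from the advection module
    end do
  end do
  call fd4_iter_next(iter)       ! advance to the next block
end do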
Hints:
Run the code again and open the new NetCDF file with ncview. The pictures below show two of the variables:
Implement the time stepping loop over 2000 time steps with ghost communication. The body of the time stepping loop should contain (in this order; a sketch of the loop skeleton follows the hints below):
Hints:
!! Simulation setup
real(rkind), parameter :: dt = 0.2        ! time step size
integer, parameter :: nsteps = 2000       ! number of time steps to compute
integer, parameter :: output_steps = 100  ! number of steps between output
[...]
! buffer array for iter_get_ghost
integer :: bext(0:3)
real(rkind), allocatable :: buf(:,:,:,:)
! FD4 ghost communication
type(fd4_ghostcomm) :: ghostcomm(2)
! time step loop
integer :: now, new                       ! time step indicators for varC
integer :: step                           ! current time step
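The following sketch shows one way to organize the loop around these declarations; the FD4 call names (fd4_ghostcomm_exch in particular) and their argument lists are assumptions here, so check the FD4 documentation for the exact interface.

! Sketch only: routine names and arguments are assumptions
now = 1
new = 2
do step = 1, nsteps
  ! exchange ghost cells of the current time level
  call fd4_ghostcomm_exch(ghostcomm(now), err)
  ! loop over all blocks and compute the advection time step from
  ! time level 'now' to 'new' (using iter_get_ghost and the compute
  ! routine of the advection module)
  ! ...
  ! write output every output_steps steps (nfcomm from the previous task)
  if (mod(step, output_steps) == 0) call fd4_netcdf4_write(nfcomm, err)
  ! swap the time step indicators (1 <-> 2)
  now = 3 - now
  new = 3 - new
end do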
Run the code again and open the new NetCDF file with ncview. Observe how the concentration changes over time. The pictures below show the concentration at two different time steps:
First, compile and execute the program for this task. A check for mass conservation of the concentration variable has been added. The output shows that the total mass is not constant:
step  100 mass: 56.5475
[...]
step 2000 mass: 56.5523

This is due to problems with the boundary conditions of the grid: periodic boundary conditions are used (for simplicity), but the wind variables are not consistent at the domain boundaries, e.g. uWind at the leftmost face does not have the same value as at the rightmost face (similar for vWind).
To solve the problem, zero-gradient boundary conditions should be used. This is straightforward with FD4: Before computing the next time step on a block, call fd4_boundary_zerograd_block to set the boundaries for all input variables for that time step. This routine will do nothing if the block is not at the domain boundary. Of course, you also need to disable periodic boundary conditions when calling fd4_domain_create.
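In code this could look roughly as follows; only the routine name fd4_boundary_zerograd_block is given above, so its argument list and the periodic flag of fd4_domain_create are assumptions here.

! Sketch only: argument lists are assumptions, see the FD4 docs
! when creating the domain, disable periodic boundary conditions, e.g.:
!   periodic = (/.false.,.false.,.false./)
call fd4_iter_init(domain, iter)
do while(associated(iter%cur))
  ! set zero-gradient boundaries for the input variables of this time step
  ! (does nothing if the block is not at the domain boundary)
  call fd4_boundary_zerograd_block(domain, iter%cur, (/varC,varU,varV/), (/now,1,1/), err)
  ! ... compute the advection time step on this block ...
  call fd4_iter_next(iter)
end do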
Run the program again to ensure that the mass is constant. In ncview, the concentration at the end of the simulation now looks like this:
In this program the load of the advection computation is artificially increased and small concentrations are eliminated. This leads to mass loss (which we accept for the sake of this artificial example) and strong load imbalances between regions where the concentration is larger than zero and regions where the concentration is zero.
Insert calls to measure the computation time of a block and set the block weight (see the sketch below). Call the FD4 load balancing routine fd4_balance_readjust at the end of each time step. Print the last measured load balance (from the opt_stats argument of fd4_balance_readjust) at the output time steps. Run the program again and compare the run time to the original program without load balancing. Also take a look at the NetCDF output: you can see how blocks are migrated between processes in the "blocks" variable and the workload per block in the "weight" variable:
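A sketch of the measurement and the readjustment call is given below; the block weight member and the contents of the statistics type are assumptions (only fd4_balance_readjust and its opt_stats argument are given above).

! Sketch only: weight/stats member names are assumptions
double precision :: t0
! (declare 'stats' with FD4's load balance statistics type)
call fd4_iter_init(domain, iter)
do while(associated(iter%cur))
  t0 = MPI_Wtime()
  ! ... compute the advection time step on this block ...
  iter%cur%weight = real(MPI_Wtime() - t0, rkind)  ! measured time as block weight
  call fd4_iter_next(iter)
end do
! at the end of the time step: rebalance based on the new weights
call fd4_balance_readjust(domain, err, opt_stats=stats)
if (mod(step, output_steps) == 0 .and. rank == 0) then
  print *, 'step', step, 'load balance:', stats%last_measure  ! member name assumed
end if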
Now explore some load balancing parameters. Use fd4_balance_params (before allocating all blocks) to test the following:
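As an illustration of the call itself (the keyword and constant names below are assumptions; the available parameters are listed in the FD4 documentation):

! Sketch only: keyword/constant names are assumptions
call fd4_balance_params(domain, opt_method=FD4_BALANCE_RCB)  ! e.g. select RCB partitioning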
Another thing to try out is to increase the number of blocks, e.g. from 64 blocks of 8x8 cells to 256 blocks of 4x4 cells. This makes load balancing more efficient for higher numbers of ranks (because of the finer granularity).
This program uses FD4's model coupling techniques to initialize the Concentration and to update the wind variables each time step. The wind variables are "computed" by a toy model that has a different spatial decomposition than the one FD4 uses. This artificial set-up is just to demonstrate model coupling with as few source code lines as possible. These pictures show the partitioning in FD4 and for the "wind model" using 4 MPI ranks:
The sequence within a time step now looks like this:
Take a look at the code. It contains a bug: only windU (varU) is added to the FD4 couple context, but not windV (varV). Your task is to add windV such that the computations are correct. In ncview, vWind and Concentration look like this after 1200 steps:
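As for the fix itself: since windV has to be treated exactly like windU, it amounts to duplicating the varU-related coupling calls for varV. As a sketch (the routine name below is a placeholder; copy whatever call registers varU in the provided code):

! Sketch only: placeholder routine name, mirror the existing varU line
call fd4_couple_add_var(couple, varV, err)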
This is just a demo of the adaptive block mode of FD4. The pictures show that FD4 dynamically allocates blocks only where they are required (i.e. where the variable Concentration exceeds a specified threshold):
The following things are changed in comparison to the solution of Task 8:
The pictures above were created with an increased number of blocks (16x16), 8 ranks and RCB partitioning.