FD4: Four-Dimensional Distributed Dynamic Data structures

What is FD4?

FD4 was originally developed for the parallelization of spectral bin cloud models and their coupling to atmospheric models. However, FD4 is not restricted to this kind of application and can be used in other areas that benefit from its features.

Keywords that apply to FD4: high performance computing, multiphase modeling, multiphysics, high scalability, Fortran 95, MPI-2, model coupling, dynamic data structures, dynamic load balancing, Hilbert space-filling curve, ParMETIS, Vis5D, NetCDF

The development of FD4 was funded by the DFG through the project "Parallel coupling framework and advanced time integration methods for detailed cloud processes in atmospheric models", a cooperation between the Leibniz Institute for Tropospheric Research and ZIH, TU Dresden.

FD4 Videos

Dynamic Load Balancing of Detailed Cloud Simulations with Visualization of Hilbert SFC Partitioning and Vampir Performance Analysis.

Comparison of Graph Partitioning and SFC Partitioning for Dynamic Load Balancing of Detailed Cloud Simulations.

Vampir Message Statistics of a Hilbert SFC partitioning benchmark with massively changing communication pattern.


The FD4 manual is available as HTML and PDF. The manual contains:

FD4 Library Hands-on Training

The best way to learn FD4 is to work through the material from the hands-on training held at ICTP on November 10, 2014.


FD4 is free software. It is available under the GNU GPLv3 license.
Copyright 2010-2016 Matthias Lieber, ZIH, TU Dresden, Germany

Download: fd4-2016-06-09.tar.gz

Older FD4 versions can be downloaded here.
Additional input files for the 1D partitioning benchmark are here.

If you are using FD4, I would appreciate it if you dropped me a line.

Reproducing measurements of FD4's scalable high-quality 1D partitioning

1D partitioning is the core problem for SFC-based partitioners. FD4 comes with a hierarchical 1D partitioning algorithm that scales like simple heuristics yet achieves nearly the optimal load balance of (serial) exact methods. FD4's algorithm runs about 270 times faster than an exact method. The algorithm is described in the FGCS paper and, in much more detail, in my PhD thesis.
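To make the 1D partitioning problem concrete, here is a small Python sketch. It is not FD4's hierarchical algorithm and not taken from the FD4 sources; it only contrasts a generic prefix-sum heuristic with a generic serial exact method (binary search on the bottleneck load with a greedy feasibility check). All function names are hypothetical, and the exact method as written assumes integer weights.

```python
from bisect import bisect_left
from itertools import accumulate


def heuristic_partition(weights, parts):
    """Simple heuristic: cut where the prefix sum reaches k * total / parts.

    Returns the parts - 1 cut positions (exclusive end index of each part).
    """
    prefix = list(accumulate(weights))
    total = prefix[-1]
    cuts = []
    for k in range(1, parts):
        # First item whose prefix sum reaches the ideal boundary for part k.
        idx = bisect_left(prefix, k * total / parts)
        cuts.append(idx + 1)
    return cuts


def probe(weights, parts, bottleneck):
    """Greedy check: do `parts` consecutive parts with load <= bottleneck suffice?

    Returns the cut positions if feasible, otherwise None.
    """
    cuts, load = [], 0
    for i, w in enumerate(weights):
        if w > bottleneck:
            return None  # a single item already exceeds the limit
        if load + w > bottleneck:
            cuts.append(i)  # close the current part before item i
            load = 0
        load += w
    if len(cuts) > parts - 1:
        return None
    # Pad with empty trailing parts so that exactly parts - 1 cuts are returned.
    return cuts + [len(weights)] * (parts - 1 - len(cuts))


def exact_partition(weights, parts):
    """Serial exact method (integer weights): smallest feasible bottleneck."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(weights, parts, mid) is not None:
            hi = mid
        else:
            lo = mid + 1
    return probe(weights, parts, lo)


def max_load(weights, cuts):
    """Load of the most heavily loaded part for the given cut positions."""
    bounds = [0] + list(cuts) + [len(weights)]
    return max(sum(weights[a:b]) for a, b in zip(bounds, bounds[1:]))


if __name__ == "__main__":
    # Task weights in SFC order (made-up numbers for illustration only).
    weights = [1, 1, 1, 1, 6, 1, 1, 6]
    for name, fn in (("heuristic", heuristic_partition), ("exact", exact_partition)):
        cuts = fn(weights, 3)
        print(name, cuts, "max load:", max_load(weights, cuts))
```

For these made-up weights and three parts, the heuristic ends up with a maximum load of 10, while the exact method reaches 7. The heuristic needs only one prefix sum and parts - 1 lookups, which is why such methods parallelize easily; the exact method repeatedly scans the whole weight sequence, which illustrates why it is harder to scale.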

You can compare the methods on your own system with the 1D partitioning benchmark contained in the FD4 package. See README.partbench for how to run the benchmark. You also need to download the input files.

New in Nov 2016: a comparison to Zoltan and a new benchmark data set with about 10 times as many tasks. The data set includes instructions for compiling and running the 1D partitioning benchmark partbench_zz. This benchmark can also run Zoltan as FD4's partitioner, so that FD4's own methods can be compared with other methods.

FD4 Publications

FD4 Talks

FD4 Posters

Oct 2009
Nov 2009
Jan 2011

Funding / Related Projects