ASPCS
 
Paper: Massive Scientific Workloads - Lessons Learned From Petaflop-Scale Weather Simulations
Volume: 521, Astronomical Data Analysis Software and Systems XXVI
Page: 577
Authors: Pierfederici, F.
Abstract: Weather forecasts run at the European Centre for Medium-Range Weather Forecasts (ECMWF) are complex workloads that use tens of thousands of CPU cores on two of the most powerful supercomputers in the world (top twenty of the TOP500 list). They can run for weeks on end and process hundreds of millions of observation datasets. Each of these forecast simulations is a heterogeneous mix of hybrid MPI-OpenMP Fortran/C/C++ numerical code surrounded by a host of Python and shell scripts that stage data in and out of databases, create high-level products, perform sanity checks on inputs and outputs, etc. When running on an HPC cluster, each simulation spawns tens of thousands of jobs in a very deep dependency graph. Monitoring, profiling, and debugging these complex workloads and their dependency rules is a herculean task, made more difficult by the fact that the tools one can use to analyze compiled executables (e.g. Darshan and Allinea MAP) lose much of their power or are completely unusable when dealing with scripts. Important issues of machine over-subscription and CPU power management are also left unaddressed. Lessons learned at ECMWF in the approach to whole-workload profiling of weather simulations will be presented. Their applicability to present and future astronomy processing needs will be investigated as well.