Large-Scale Compute-Intensive Analysis via a Combined In-Situ and Co-Scheduling Workflow Approach
CCS-7, Los Alamos National Lab, Los Alamos, NM 87545
International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’15), 2015
Large-scale simulations can produce hundreds of terabytes to petabytes of data, complicating and limiting the efficiency of workflows. Traditionally, outputs are stored on the file system and analyzed in post-processing. With the rapidly increasing size and complexity of simulations, this approach faces an uncertain future. Emerging techniques perform the analysis in-situ, utilizing the same resources as the simulation, and/or off-load subsets of the data to a compute-intensive analysis system. We introduce an analysis framework developed for HACC, a cosmological N-body code, that uses both in-situ and co-scheduling approaches for handling petabyte-scale outputs. We compare different analysis setups ranging from purely off-line, to purely in-situ, to combined in-situ/co-scheduling. The analysis routines are implemented using the PISTON/VTK-m framework, allowing a single implementation of an algorithm to target a variety of GPU, multicore, and many-core architectures.
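The split between in-situ analysis and off-loading can be illustrated with a minimal Python sketch. It is not the HACC framework or the PISTON/VTK-m API: the `Subset` and `HybridWorkflow` classes, the `cost` estimate, and the `cost_threshold` routing rule are all hypothetical names and assumptions chosen for illustration, since the abstract does not state how data is divided between the two paths.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Subset:
    """A chunk of simulation output (hypothetical stand-in)."""
    subset_id: int
    cost: int  # assumed estimate of how expensive this subset is to analyze


@dataclass
class HybridWorkflow:
    """Routes each subset to in-situ analysis or to an off-load queue.

    Assumption: a single cost threshold decides the route. The real
    framework may use entirely different criteria.
    """
    cost_threshold: int
    in_situ_results: List[Tuple[int, int]] = field(default_factory=list)
    offload_queue: List[Subset] = field(default_factory=list)

    def analyze_in_situ(self, subset: Subset) -> int:
        # Placeholder for a cheap analysis run on the simulation's own resources.
        return subset.cost * 2

    def process_step(self, subsets: List[Subset]) -> None:
        # Called once per simulation output step.
        for subset in subsets:
            if subset.cost > self.cost_threshold:
                # Too expensive to run alongside the simulation: defer it.
                self.offload_queue.append(subset)
            else:
                result = self.analyze_in_situ(subset)
                self.in_situ_results.append((subset.subset_id, result))

    def drain_offload_queue(self) -> List[Subset]:
        # Hand deferred subsets to the co-scheduled analysis system
        # and leave the queue empty for the next step.
        batch, self.offload_queue = self.offload_queue, []
        return batch


if __name__ == "__main__":
    wf = HybridWorkflow(cost_threshold=100)
    wf.process_step([Subset(0, 10), Subset(1, 500), Subset(2, 100)])
    print("in-situ:", wf.in_situ_results)
    print("off-loaded:", [s.subset_id for s in wf.drain_offload_queue()])
```

In this sketch a purely in-situ setup corresponds to a very large `cost_threshold`, and a purely off-line setup to a threshold of zero or below, with every subset deferred.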
December 12, 2015 by hgpu