Task-based FMM for heterogeneous architectures

Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, Toru Takahashi

INRIA, Universite de Bordeaux, CNRS, UMR5800, CERFACS

hal-00974674, (7 April 2014)

@techreport{agullo:hal-00974674,
  hal_id={hal-00974674},
  url={http://hal.inria.fr/hal-00974674},
  title={Task-based FMM for heterogeneous architectures},
  author={Agullo, Emmanuel and Bramas, B{\'e}renger and Coulaud, Olivier and Darve, Eric and Messner, Matthias and Takahashi, Toru},
  keywords={Fast multipole methods, graphics processing unit, heterogeneous architectures, runtime system, scheduling, pipeline},
  language={Anglais},
  affiliation={HiePACS – INRIA Bordeaux – Sud-Ouest, Laboratoire Bordelais de Recherche en Informatique – LaBRI, Mechanical Engineering Department, Institute for Computational and Mathematical Engineering – iCME, Department of Mechanical Science and Engineering},
  pages={29},
  type={Rapport de recherche},
  institution={INRIA},
  number={RR-8513},
  collaboration={plafrim},
  year={2014},
  month={Apr},
  pdf={http://hal.inria.fr/hal-00974674/PDF/RR-8513.pdf}
}

A high-performance fast multipole method (FMM) is crucial for the numerical simulation of many physical problems. In a previous study, we showed that task-based FMM provides the flexibility required to process a wide spectrum of particle distributions efficiently on multicore architectures. In this paper, we show how such an approach can be extended to fully exploit heterogeneous platforms. To that end, we design highly tuned GPU versions of the two dominant operators (P2P and M2L) as well as a scheduling strategy that dynamically decides what proportion of subsequent tasks is processed on regular CPU cores and what proportion on GPU accelerators. We assess our method with the StarPU runtime system, which executes the resulting task flow on an Intel X5650 Nehalem multicore processor optionally augmented with one, two or three Nvidia Fermi M2070 or M2090 GPUs. A detailed experimental study on two 30-million-particle distributions (a cube and an ellipsoid) shows that the resulting software consistently achieves high performance across architectures.
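For readers unfamiliar with the P2P operator named in the abstract, the following is a minimal sketch of a direct particle-to-particle evaluation in plain Python. It is not the authors' tuned GPU kernel. The function name `p2p`, the tuple-based particle layout, and the 1/r kernel are assumptions made here for illustration only; the abstract does not specify them.

```python
import math


def p2p(targets, sources, charges):
    """Direct near-field sum: potential at each target from all sources.

    Assumes a 1/r kernel (hypothetical choice for illustration).
    targets, sources: lists of (x, y, z) tuples; charges: list of floats.
    """
    potentials = []
    for tx, ty, tz in targets:
        phi = 0.0
        for (sx, sy, sz), q in zip(sources, charges):
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            if r > 0.0:  # skip the self-interaction when target == source
                phi += q / r
        potentials.append(phi)
    return potentials


# One target at the origin, two sources at distances 1 and 2.
result = p2p([(0.0, 0.0, 0.0)],
             [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
             [1.0, 4.0])
```

The double loop costs O(N·M) for N targets and M sources, which is why FMM restricts this direct evaluation to neighboring cells and handles far-field contributions through operators such as M2L.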

Tags: Computer science, CUDA, Fast multipole method, Heterogeneous systems, Numerical simulation, nVidia, Tesla M2070, Tesla M2090

April 9, 2014 by hgpu
