Resource Centered Computing delivering high parallel performance


Jens Gustedt, Stéphane Vialle, Patrick Mercier

ALGORILLE (INRIA Nancy – Grand Est / LORIA), INRIA – CNRS: UMR7503 – Université de Lorraine

hal-00921128, (19 December 2013)

@techreport{gustedt:hal-00921128,
  hal_id={hal-00921128},
  url={http://hal.inria.fr/hal-00921128},
  title={Resource Centered Computing delivering high parallel performance},
  author={Gustedt, Jens and Vialle, St{\'e}phane and Mercier, Patrick},
  keywords={resource centered computing; read-write locks; clusters; accelerators; GPU; experiments; performance},
  language={Anglais},
  affiliation={ALGORILLE – INRIA Nancy – Grand Est / LORIA, Laboratoire des sciences de l'ing{\'e}nieur, de l'informatique et de l'imagerie – ICube, Georgia Tech – CNRS – UMI2958, SUPELEC-Campus Metz},
  type={Rapport de recherche},
  institution={INRIA},
  number={RR-8433},
  collaboration={Aladin, Grid5000, MULTICORE},
  year={2013},
  month={Dec},
  pdf={http://hal.inria.fr/hal-00921128/PDF/RR-8433.pdf}
}


Modern parallel programming requires a combination of different paradigms, expertise and tuning that correspond to the different levels of today’s hierarchical architectures. To cope with this inherent difficulty, ORWL (ordered read-write locks) presents a new paradigm and toolbox centered around local or remote resources, such as data, processors or accelerators. ORWL programmers describe their computation in terms of access to these resources during critical sections. Exclusive or shared access to the resources is granted through FIFOs with read-write semantics. ORWL partially replaces a classical runtime and offers a new API for resource-centric parallel programming. We successfully ran an ORWL benchmark application on different parallel architectures (a multicore CPU cluster, a NUMA machine, a CPU+GPU cluster). When processing large data, we achieved scalability and performance similar to a reference code built on top of MPI+OpenMP+CUDA. Integrating optimized kernels from scientific computing libraries (ATLAS and cuBLAS) was almost effortless, and we were able to increase performance by using both the CPU and GPU cores of our hybrid hierarchical cluster simultaneously. We aim to make ORWL a new, easy-to-use and efficient programming model and toolbox for parallel developers.
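The abstract's central mechanism, access to a resource granted through a FIFO with read-write semantics, can be illustrated with a small toy model. The sketch below is not ORWL's API (ORWL itself is a separate library, and none of its calls are reproduced here); the class `OrderedRWLock` and its methods `request`, `acquire` and `release` are hypothetical names chosen for illustration. It assumes one FIFO per resource, that a writer is served only when it reaches the head of the queue, and that consecutive readers at the head may share the resource.

```python
import threading
from collections import deque


class Ticket:
    """One queued request; compared by identity (hypothetical helper)."""

    def __init__(self, exclusive):
        self.exclusive = exclusive


class OrderedRWLock:
    """Toy FIFO-ordered read-write lock for a single resource."""

    def __init__(self):
        self._cond = threading.Condition()
        self._queue = deque()  # pending and active requests, in request order

    def request(self, exclusive):
        """Append a request to the FIFO and return its ticket."""
        ticket = Ticket(exclusive)
        with self._cond:
            self._queue.append(ticket)
        return ticket

    def _granted(self, ticket):
        # A writer is granted only at the head of the FIFO.
        # A reader is granted if every request ahead of it is also a reader.
        for ahead in self._queue:
            if ahead is ticket:
                return True
            if ahead.exclusive or ticket.exclusive:
                return False
        return False

    def acquire(self, ticket):
        """Block until the ticket's turn has come."""
        with self._cond:
            self._cond.wait_for(lambda: self._granted(ticket))

    def release(self, ticket):
        """Leave the FIFO and wake up the waiting requests."""
        with self._cond:
            self._queue.remove(ticket)
            self._cond.notify_all()


def demo():
    lock = OrderedRWLock()
    log = []
    # Requests are queued up front, in program order.
    plan = [("w1", True), ("r1", False), ("r2", False), ("w2", True)]
    tickets = [(name, lock.request(exclusive)) for name, exclusive in plan]

    def worker(name, ticket):
        lock.acquire(ticket)
        log.append(name)  # critical section on the shared resource
        lock.release(ticket)

    # Threads start in reverse order; the FIFO still decides who goes first.
    threads = [threading.Thread(target=worker, args=t) for t in reversed(tickets)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return log
```

In this sketch `demo()` always logs the first writer, then the two readers in either order, then the second writer, regardless of thread start order. The access order is fixed by the order of the requests rather than by scheduling, which is the property the abstract attributes to FIFO-based access.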

Tags: Computer science, CUBLAS, CUDA, GPU cluster, Hierarchical clustering, MPI, nVidia, nVidia GeForce GTX 580

December 22, 2013 by hgpu


* * *

high performance computing on graphics processing units: hgpu.org