Improving Synchronization and Data Access in Parallel Programming Models

Ettore Speziale
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano
Politecnico di Milano, 2013

@phdthesis{speziale2013improving,
   title={Improving synchronization and data access in parallel programming models},
   author={Speziale, Ettore},
   year={2013},
   school={Politecnico di Milano},
   address={Italy}
}

Today, parallel architectures are the main vector for exploiting available die area. The shift from architectures tuned for sequential programming models to ones optimized for parallel processing follows from the inability to further enhance sequential performance, due to the power and memory walls. On the other hand, exploiting parallel computing units efficiently is a hard task. Indeed, obtaining performance improvements requires careful tuning of applications, as years of High Performance Computing with MPI have shown. To lower the burden of parallel programming, parallel programming models expose a simplified view of the hardware by relying on abstract parallel constructs, such as parallel loops or tasks. These constructs are mapped onto parallel processing units by a mix of optimizing compilers and run-time techniques. However, because a huge number of very different parallel architectures is available, hiding low-level details often prevents performance from matching that of hand-tuned code. This dissertation aims to analyze inefficiencies related to the usage of parallel computing units and to address them from the runtime perspective. In particular, we analyze the optimization of reduction computations when they are performed together with barrier synchronizations. Moreover, we show how runtime techniques can exploit the affinity between data and computations to limit, as much as possible, the performance penalty hidden in NUMA architectures, in both the OpenMP and MapReduce settings. We then observe how a lightweight JIT compilation approach could enable better exploitation of parallel architectures. Lastly, we analyze the resilience of synchronization primitives, a basic building block of all parallel programs, to induced faults.
* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors