Posts

Nov, 10

Towards a Portable and Future-proof Particle-in-Cell Plasma Physics Code

We present the first reported OpenCL implementation of EPOCH3D, an extensible particle-in-cell plasma physics code developed at the University of Warwick. We document the challenges and successes of this porting effort, and compare the performance of our implementation executing on a wide variety of hardware from multiple vendors. The focus of our work is on […]
Nov, 6

Computer Graphics: From Pixels to Programmable Graphics Hardware

Computer Graphics: From Pixels to Programmable Graphics Hardware explores all major areas of modern computer graphics, starting from basic mathematics and algorithms and concluding with OpenGL and real-time graphics. It gives students a firm foundation in today’s high-performance graphics. UP-TO-DATE TECHNIQUES, ALGORITHMS, AND API: The book includes mathematical background on vectors and matrices as well […]
Oct, 29

Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters

SkePU (Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems) is a parallel computing framework developed by Johan Enmyren and Christoph Kessler at Linköpings Universitet. This C++ template library provides a simple and unified interface for specifying data-parallel computations with the help of skeletons and is targeted to multiple backends, e.g. for a sequential CPU, […]
Oct, 27

MPI Parallelization of GPU-based Lattice Boltzmann Simulations

In this thesis, an MPI-parallelized LBM code for a Multi-GPU platform has been designed and implemented. The primary goal of the thesis is research on efficient and scalable Multi-GPU LBM code that exploits advanced features of modern GPUs and adopts optimization techniques like overlapping of work and communication in heterogeneous CPU-GPU clusters. In […]
Oct, 25

Online Performance Projection for Clusters with Heterogeneous GPUs

We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which […]
Oct, 21

Concurrent kernel execution on Graphic Processing Units

General Purpose Graphics Processing Units (GPGPUs) are now used in high performance computing (HPC) for their massively parallel computing capabilities. These devices integrate hundreds of computing units (computing cores). Usually, such a level of parallelism is used to solve simulation problems (heat transfer, …) because of the numerical representation of the simulated environment (matrices). […]
Oct, 21

Energy Efficiency Studies of Mont Blanc Applications

In this thesis, the performance and energy efficiency of four different implementations of matrix multiplication, written in OmpSs and OpenCL, are tested and evaluated. The benchmarking is done using an Intel Ivy Bridge Core i7 3770K. The results are evaluated and discussed with regard to different optimization configurations, like vectorization and multi-threading. Energy measurements are […]
Oct, 18

OpenACC-based Snow Simulation

In recent years, the GPU platform has risen in popularity in high performance computing due to its cost effectiveness and the high computing power offered through its many parallel cores. The GPU's computing power can be harnessed using the low-level GPGPU programming APIs CUDA and OpenCL. While both CUDA and OpenCL give the programmer fine-grained control […]
Oct, 13

Contributions to parallel stochastic simulation: Application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte-Carlo simulations

The race for computing power intensifies every day in the simulation community. A few years ago, scientists started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only does the simulation model implementation have to be ported to the new parallel platform, but all […]
Oct, 10

A Parallel Intermediate Representation for Embedded Languages

This thesis presents a parallel intermediate representation for embedded languages called PIRE, and its incorporation into the Feldspar language. The original Feldspar backend translates the parallel loops of Feldspar to ordinary for loops, meaning that they are not actually parallel in the generated code. We create an alternate backend for the Feldspar project, where the […]
Oct, 5

Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs

General purpose computing on GPUs (GPGPU) can enable significant performance and energy improvements for certain classes of applications. However, current GPGPU programming models, such as CUDA and OpenCL, are only accessible by systems experts through low-level C/C++ APIs. In contrast, large numbers of programmers use high-level languages, such as Java, due to their productivity advantages […]
Oct, 5

Parametric GPU Code Generation for Affine Loop Programs

Partitioning a parallel computation into finitely sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space where each partition is subject to a fine-grained distribution of resources […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
