Posts
Nov, 27
Accelerated Primality Testing Using GPUs
The aim of this project was to port the FFT routines of LLRP to CUDA, which was done successfully. This success is quantified as the FFT portions of the program executing in a much shorter time than the FFTW transforms. The project shows that GPUs are certainly viable for use in numerical codes such as […]
Nov, 27
Autotuning of Pattern Runtimes for Accelerated Parallel Systems
Parallel architectures with node-level accelerators promise significant performance improvements over conventional homogeneous systems. To cope with the increased complexity of programming such systems, various pattern-based programming libraries have become available. In this paper we present our work on providing autotuning capabilities for two runtime libraries that provide parallel programming patterns on state-of-the-art heterogeneous hardware. We […]
Nov, 27
Evaluating the Performance and Energy Efficiency of N-Body Codes on Multi-Core CPUs and GPUs
N-body simulations are computation-intensive applications that calculate the motion of a large number of bodies under pair-wise forces. Although different versions of n-body codes have been widely used in many scientific fields, the performance and energy efficiency of various n-body codes have not been comprehensively studied, especially when they are running on newly released multi-core […]
Nov, 27
Performance Analysis of GPU-based SAR and Interferometric SAR image processing
Modern SAR and Interferometric SAR image processing make intensive use of computer hardware resources to cope with the computational power needed to process complex images. Increasing interest in this field is being given to new approaches based on General-Purpose computing on Graphics Processing Units (GPGPU). In this paper we evaluate the performance of three […]
Nov, 27
Regression Modelling of Power Consumption for Heterogeneous Processors
This thesis is composed of two parts that relate to both parallel and heterogeneous processing. The first describes DistCL, a distributed OpenCL framework that allows a cluster of GPUs to be programmed like a single device. It uses programmer-supplied meta-functions that associate work-items to memory. DistCL achieves speedups of up to 29x using 32 peers. […]
Nov, 26
Efficient Multi-GPU Algorithm for All-Pairs Shortest Paths
The shortest-path problem is a fundamental computer science problem with applications in diverse areas such as transportation, robotics, network routing, and VLSI design. The problem is to find paths of minimum weight between pairs of nodes in edge-weighted graphs, where the weight of a path p is defined as the sum of the weights of […]
Nov, 26
Enabling OpenCL on a Configurable, VLIW Chip-Multiprocessor
The slowdown in Moore’s law and ever-increasing computation requirements in the scientific, as well as consumer, domains have required a shift in computer system architectures and subsequent programming paradigms. In the last decade we have moved from single-core CPUs to multicore system-on-chips (SoCs), with the use of many-core accelerators becoming more commonplace. This new […]
Nov, 26
Evaluation and Improvement of GPU Ray Tracing with a Thread Migration Technique
Ray tracing is a computer graphics rendering technique. Unlike the traditional rasterization algorithm, the ray tracing algorithm simulates the real vision process. Being able to deliver highly realistic graphics effects, it has been considered the fundamental graphics rendering mechanism for high-end applications and is also likely to be adopted as the workhorse of future […]
Nov, 26
Exploring GPGPU Acceleration of Process-Oriented Simulations
This paper reports on our experiences of using commodity GPUs to speed up the execution of fine-grained concurrent simulations. Starting with an existing process-oriented ‘boids’ simulation, we explore a variety of techniques aimed at improving performance, gradually refactoring the original code. Successive improvements lead to a 10-fold improvement in performance, which we believe can still be […]
Nov, 26
PG-PuReMD: A Parallel-GPU Reactive Molecular Dynamics Package
We present a parallel/GPU implementation of our open-source reactive molecular dynamics code, PG-PuReMD (Parallel GPU-Purdue Reactive Molecular Dynamics). Using a variety of innovative algorithms and optimizations, PG-PuReMD achieves over 350x speedup compared to a single CPU implementation on a cluster of 36 state-of-the-art GPUs. This is a significant development, since it enables […]
Nov, 25
Diagrammatic Determinantal Quantum Monte Carlo Calculations on GPUs
The Diagrammatic Determinantal Quantum Monte Carlo (DDQMC) algorithm [11, s. III] is used to solve quantum impurity models such as the Anderson model [13]. The topic of this dissertation is the efficient porting of an existing implementation of DDQMC to CUDA in order to use GPUs as accelerators. The main characteristics of quantum impurity models […]
Nov, 25
Investigating the use of GPUs with a Monte Carlo Astrophysical Simulation
For a given simulation, the most expensive subroutine in the astrophysics code, MOCCA (MOnte Carlo Cluster SimulAtor), has been ported to run as a kernel on a GPU (Graphics Processing Unit). The code was accelerated using the CUDA programming model, implemented with PGI CUDA Fortran. The GPU code was run with varying problem […]