Posts

Nov, 29

Benchmarking Parallel Performance on Many-Core Processors

With the emergence of many-core processor architectures onto the HPC scene, concerns arise regarding the performance and productivity of numerous existing parallel-programming tools, models, and languages. As these devices begin augmenting conventional distributed cluster systems in an evolving age of heterogeneous supercomputing, proper evaluation and profiling of many-core processors must occur in order to understand […]
Nov, 28

The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?

Recently, parallel programming has become necessary in order to obtain performance gains, primarily due to power limitations. However, parallel architectures differ substantially from each other, often require specialized knowledge, and typically necessitate reimplementation and fine tuning of application code. These slow tasks frequently result in situations where most of the time is spent reimplementing old […]
Nov, 28

The Use of Automated Search in Deriving Software Testing Strategies

Testing a software artefact using every one of its possible inputs would normally cost too much, and take too long, compared to the benefits of detecting faults in the software. Instead, a testing strategy is used to select a small subset of the inputs with which to test the software. The criterion used to select […]
Nov, 28

American Basket Option Pricing on a multi GPU Cluster

This article presents a multi GPU adaptation of a specific Monte Carlo and classification based method for pricing American basket options, due to Picazo [1]. The first part relates how to combine fine and coarse grained parallelization to price American basket options. In order to benefit from different GPU devices, a dynamic strategy of kernel […]
Nov, 28

Hybrid Programming using OpenSHMEM and OpenACC

With high performance systems exploiting multicore and accelerator-based architectures on a distributed shared memory system, heterogeneous hybrid programming models are the natural choice to exploit all the hardware made available on these systems. Previous efforts looking into hybrid models have primarily focused on using OpenMP directives (for shared memory programming) with MPI (for inter-node programming […]
Nov, 27

Accelerated Primality Testing Using GPUs

The aim of this project was to port the FFT routines of LLRP to CUDA, which was done successfully. This success is quantified as the FFT portions of the program executing in a much shorter time than the FFTW transforms. The project shows that GPUs are certainly viable for use in numerical codes such as […]
Nov, 27

Autotuning of Pattern Runtimes for Accelerated Parallel Systems

Parallel architectures with node-level accelerators promise significant performance improvements over conventional homogeneous systems. To cope with the increased complexity of programming such systems various pattern-based programming libraries have become available. In this paper we present our work on providing autotuning capabilities for two runtime libraries that provide parallel programming patterns on state-of-the-art heterogeneous hardware. We […]
Nov, 27

Evaluating the Performance and Energy Efficiency of N-Body Codes on Multi-Core CPUs and GPUs

N-body simulations are computation-intensive applications that calculate the motion of a large number of bodies under pair-wise forces. Although different versions of n-body codes have been widely used in many scientific fields, the performance and energy efficiency of various n-body codes have not been comprehensively studied, especially when they are running on newly released multi-core […]
Nov, 27

Performance Analysis of GPU-based SAR and Interferometric SAR image processing

Modern SAR and Interferometric SAR image processing make intensive usage of computer hardware resources to cope with the computational power needed to process complex images. An increasing interest in this field is being given to new approaches based on General-Purpose computing on Graphics Processing Units (GPGPU). In this paper we evaluate the performance of three […]
Nov, 27

Regression Modelling of Power Consumption for Heterogeneous Processors

This thesis is composed of two parts, both of which relate to parallel and heterogeneous processing. The first describes DistCL, a distributed OpenCL framework that allows a cluster of GPUs to be programmed like a single device. It uses programmer-supplied meta-functions that associate work-items to memory. DistCL achieves speedups of up to 29x using 32 peers. […]
Nov, 26

Efficient Multi-GPU Algorithm for All-Pairs Shortest Paths

The shortest-path problem is a fundamental computer science problem with applications in diverse areas such as transportation, robotics, network routing, and VLSI design. The problem is to find paths of minimum weight between pairs of nodes in edge-weighted graphs, where the weight of a path p is defined as the sum of the weights of […]
Nov, 26

Enabling OpenCL on a Configurable, VLIW Chip-Multiprocessor

The slowdown in Moore’s law and ever-increasing computation requirements in the scientific, as well as consumer, domains have required a shift in computer system architectures and subsequent programming paradigms. In the last decade we have moved from single-core CPUs, to multicore system-on-chips (SoCs), with the use of many-core accelerators becoming more commonplace. This new […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
