Posts

Aug, 22

Load Balancing versus Occupancy Maximization on Graphics Processing Units: The Generalized Hough Transform as a Case Study

Programs developed under the Compute Unified Device Architecture achieve the highest performance when the exploitation of hardware resources on a Graphics Processing Unit (GPU) is maximized. To this end, load balancing among threads and a high value of processor occupancy, i.e., the ratio of active threads, are indispensable. However, in certain […]
Aug, 22

Top ten ways to make formal methods for HPC practical

Almost all fundamental advances in science and engineering crucially depend on the availability of extremely capable high performance computing (HPC) systems. Future HPC systems will increasingly be based on heterogeneous multi-core CPUs, and their programming will involve multiple concurrency models, with the message passing interface (MPI) serving as the dominant model for many years. These […]
Aug, 22

The VRE volume rendering engine

We present the extendable volume rendering engine VRE which provides an open and flexible environment for both experimental and production level implementation of a wide range of volume visualisation algorithms, including various CPU and GPU based ones. We identify parts of renderer functionality suitable for isolation in logical units and propose various types of plugins. […]
Aug, 22

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations

A trend that has materialized, and has attracted much attention, is that of increasingly heterogeneous computing platforms. Presently, it has become very common for a desktop or a notebook computer to come equipped with both a multi-core CPU and a GPU. Capitalizing on the maximum computational power of such architectures (i.e., by simultaneously […]
Aug, 22

Reusable software components for accelerator-based clusters

The emerging accelerator-based heterogeneous clusters, comprising specialized processors such as the IBM Cell and GPUs, have exhibited an excellent price-to-performance ratio as well as high energy efficiency. However, developing and maintaining software for such systems is fraught with challenges, especially for modern high-performance computing (HPC) applications that can benefit the most from leveraging accelerators. If […]
Aug, 22

Improving programmability of heterogeneous many-core systems via explicit platform descriptions

In this paper we present ongoing work towards a programming framework for heterogeneous hardware- and software environments. Our framework aims at improving programmability and portability for heterogeneous many-core systems via a Platform Description Language (PDL) for expressing architectural patterns and platform information. We developed a prototypical code generator that takes as input an annotated serial […]
Aug, 22

Accelerating Haskell array codes with multicore GPUs

Current GPUs are massively parallel multicore processors optimised for workloads with a large degree of SIMD parallelism. Good performance requires highly idiomatic programs, whose development is work intensive and requires expert knowledge. To raise the level of abstraction, we propose a domain-specific high-level language of array computations that captures appropriate idioms in the form of […]
Aug, 22

A programming model for GPU-based parallel computing with scalability and abstraction

In this paper, we present a multi-level programming model for recent GPU-based high performance computing systems. Involving cooperative stream threads and symmetric multiprocessing threads, our model gives a computational framework that scales through multi-GPU environments to GPU-cluster systems. Instead of hiding the execution environment from the programmer using compiler extensions or metaprogramming techniques, we aim […]
Aug, 21

Compiling Python to a hybrid execution environment

A new compilation framework enables the execution of numerically intensive applications, written in Python, on a hybrid execution environment formed by a CPU and a GPU. This compiler automatically computes the set of memory locations that need to be transferred to the GPU, and produces the correct mapping between the CPU and the GPU address spaces. […]
Aug, 21

A declarative API for particle systems

Recent trends in computer-graphics APIs and hardware have made it practical to use high-level functional languages for real-time graphics applications. Thus we have the opportunity to develop new approaches to computer graphics that take advantage of the high-level features of functional languages. This paper describes one such project that uses the techniques of functional programming […]
Aug, 21

Software architecture and system validation of an open, unified model for accelerated multicore computing

For systems that use hardware accelerators to combine multicore and multiprocess technology with libraries and computational kernels, the drawbacks are the complexity of the programming model and the corresponding verification of the software and validation of the system performance capabilities. In this paper, we describe a software approach to utilizing the compute power of the […]
Aug, 21

Mind the gap!: bridging the dichotomy of design and implementation

This paper presents a revamping of a sparse linear algebra design pattern, targeting parallelization within scientific and engineering applications. A proof of concept implementation is developed to compare actual software practices and optimizations with those described in the original design pattern. The case study reveals that the design pattern did not tightly coincide with the […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
