19424

Posts

Jan, 12

Static Analysis and Dynamic Adaptation of Parallelism

Scientific applications have an increasing need of resources and many grand scientific challenges require exascale compute capabilities to be addressed. One major concern to achieve exascale is programmability. New automatic methods are required to fill the gap between developers of scientific applications and HPC experts. In addition, as scientific applications are becoming more and more […]
Dec, 29

Automatic Performance Optimisation of Parallel Programs for GPUs via Rewrite Rules

Graphics Processing Units (GPUs) are now commonplace in computing systems and are the most successful parallel accelerators. Their performance is orders of magnitude higher than traditional Central Processing Units (CPUs) making them attractive for many application domains with high computational demands. However, achieving their full performance potential is extremely hard, even for experienced programmers, as […]
Dec, 29

Accelerating Molecular Docking by Parallelized Heterogeneous Computing – A Case Study of Performance, Quality of Results, and Energy-Efficiency using CPUs, GPUs, and FPGAs

Molecular Docking (MD) is a key tool in computer-aided drug design that aims to predict the binding pose between a small molecule and a macromolecular target. At its core, MD calculates the strength of possible binding poses, and searches for the energetically-stronger ones among those generated during simulation. Automatic Docking (AutoDock) is a widely-used MD […]
Dec, 8

GPU Computing with Python: Performance, Energy Efficiency and Usability

In this work, we examine the performance, energy efficiency and usability when using Python for developing HPC codes running on the GPU. We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between GPU generations; and between low-end, mid-range and high-end GPUs. Our findings show that the impact of using Python is […]
Nov, 10

Framework for Parallel Kernels Auto-tuning

The result of this thesis is a framework for auto-tuning of parallel kernels which are written in either OpenCL or CUDA language. The framework includes advanced functionality such as support for composite kernels and online auto-tuning. The thesis describes API and internal structure of the framework and presents several examples of its utilization for kernel […]
Oct, 13

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations […]
Oct, 13

hlslib: Software Engineering for Hardware Design

High-level synthesis (HLS) tools have brought FPGA development into the mainstream, by allowing programmers to design architectures using familiar languages such as C, C++, and OpenCL. While the move to these languages has brought significant benefits, many aspects of traditional software engineering are still unsupported, or not exploited by developers in practice. Furthermore, designing reconfigurable […]
Sep, 29

Futhark Vulkan Backend

This paper describes the effort, challenges, and limitations involved in the implementation of a Futhark compiler variant using the Vulkan API version 1.1 for compiling Futhark programs targeting GPUs. Compared to the existing OpenCL backend with the same purpose, the more modern Vulkan API could offer some performance benefits and may extend the scope of […]
Sep, 29

Heterogeneous Resource-Elastic Management for FPGAs: Concepts, Theory and Implementation

Despite deployment of FPGAs at the edge and cloud data centers due to their performance and energy advantage, FPGA runtime systems commonly tend to support only one-application-at-a-time and cannot adapt to dynamic workloads with reasonable response times. Therefore, this paper proposes the concepts and theory of resource elasticity for FPGA systems to allow a task […]
Sep, 15

Characterizing and Predicting Scientific Workloads for Heterogeneous Computing Systems

The next-generation of supercomputers will feature a diverse mix of accelerator devices. The increase in heterogeneity is explained by the nature of supercomputing workloads – certain devices offer acceleration, or a shorter time to completion, for particular application programs. Certain characteristics of these programs are fixed and impose fundamental limitations on the workloads regardless of […]
Sep, 8

Compilers for Portable Programming of Heterogeneous Parallel & Approximate Computing Systems

Programming heterogeneous systems such as the System-on-chip (SoC) processors in modern mobile devices can be extremely complex because a single system may include multiple different parallelism models, instruction sets, memory hierarchies, and systems use different combinations of these features. This is further complicated by software and hardware approximate computing optimizations. Different compute units on an […]
Aug, 18

High Performance Computing via High Level Synthesis

As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High-Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: