high performance computing on graphics processing units: hgpu.org

Posts

Jul, 12

Scalable Techniques for Scheduling and Mapping DSP Applications onto Embedded Multiprocessor Platforms

A variety of multiprocessor architectures has proliferated even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone and re-usability of […]

CUDA

Jul, 10

Evaluating different Java bindings for OpenCL

The traditional CPU is able to run only a few complex threads concurrently. By contrast, a GPU (Graphics Processing Unit) allows the concurrent execution of hundreds or thousands of simpler threads. The GPU was originally designed for computer graphics, but nowadays it is being used for general-purpose computation using a GPGPU (General Purpose GPU) […]

OpenCL

Jul, 10

Modelling sea water intrusion in coastal aquifers using heterogeneous computing

The objective of this PhD research program is to investigate numerical methods for simulating variably-saturated flow and sea water intrusion in coastal aquifers in a high-performance computing environment. The work is divided into three overlapping tasks: to develop an accurate and stable finite volume discretisation and numerical solution strategy for the variably-saturated flow and salt […]

CUDA

Jul, 10

Meshfree/GFEM in hardware-efficiency prospective

A fundamental trend of processor architecture evolving towards exaflops is rapidly increasing floating point performance (so-called "free" flops) accompanied by much more slowly increasing memory and network bandwidth. In order to fully enjoy the "free" flops, a numerical algorithm for PDEs should request more flops per byte, or equivalently increase its arithmetic intensity. A meshfree/GFEM approximation can be […]

CUDA

Jul, 10

DistCL: A Framework for the Distributed Execution of OpenCL Kernels

GPUs are used to speed up many scientific computations; however, to use several networked GPUs concurrently, the programmer must explicitly partition work and transmit data between devices. We propose DistCL, a novel framework that distributes the execution of OpenCL kernels across a GPU cluster. DistCL makes multiple distributed compute devices appear to be a single […]

OpenCL

Jul, 10

Exploiting Data Parallelism in the yConvex Hypergraph Algorithm for Image Representation using GPGPUs

To define and identify a region-of-interest (ROI) in a digital image, the shape descriptor of the ROI has to be described in terms of its boundary characteristics. To address the generic issues of contour tracking, the yConvex Hypergraph (yCHG) model was proposed by Kanna et al. [1]. In this work, we propose a parallel approach […]

CUDA

Jul, 9

Hybrid Scheduling for Event-driven Simulation over Heterogeneous Computers

In this work we propose a new scheduling approach designed from scratch to maximize both the usage of heterogeneous computers and the event processing flow. The scheduler is built on three fundamental concepts which introduce a new vision of discrete event simulation: 1) events are clustered according to their potential time parallelism on […]

CUDA

Jul, 9

Parallelization Strategies for Local Search Algorithms on Graphics Processing Units

The purpose of this paper is to propose effective parallelization strategies for Local Search algorithms on Graphics Processing Units (GPU). We consider the distribution of the 3-opt neighborhood structure embedded in the Iterated Local Search framework. Three resulting approaches are evaluated and compared on both speedup and solution quality on a state-of-the-art Fermi GPU architecture. […]

CUDA

Jul, 9

Discontinuous Galerkin Methods on Graphics Processing Units for Nonlinear Hyperbolic Conservation Laws

We present an implementation of the discontinuous Galerkin (DG) method for hyperbolic conservation laws in two dimensions on graphics processing units (GPUs) using NVIDIA’s Compute Unified Device Architecture (CUDA). Both flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces entirely element-local approximations. High performance scientific computing suits GPUs well, […]

CUDA

Jul, 9

Computation of the Isogeometric Analysis Stiffness Matrix on GPU

Due to the high regularity across mesh elements in isogeometric analysis, this new method achieves higher accuracy per degree of freedom and improved spectrum properties, among others, compared to finite element analysis. However, this inherent feature of isogeometric analysis reduces the sparsity of the stiffness matrix and requires more elaborate numerical integration schemes for its computation. […]

CUDA

Jul, 9

Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture

Query co-processing on graphics processors (GPUs) has become an effective means to improve the performance of main memory databases. However, the relatively low bandwidth and high latency of the PCI-e bus are usually bottleneck issues for co-processing. Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the […]

OpenCL

Jul, 8

A Smart GPU Implementation of an Elliptic Kernel for an Ocean Global Circulation Model

In this paper, the preconditioning technique of an elliptic Laplace problem in a global circulation ocean model is analyzed. We suggest an inverse preconditioning technique in order to efficiently compute the numerical solution of the elliptic kernel. Moreover, we show how the convergence rate and the performance of the solver are strictly linked to the […]

CUDA
