high performance computing on graphics processing units: hgpu.org

Posts

Jan, 25

Parallel Algorithm Design and Implementation of Regular/Irregular Problems: An In-depth Performance Study on Graphics Processing Units

Recently, interest in the Graphics Processing Unit (GPU) for general purpose parallel applications development and research has grown. Much of the current research on the GPU focuses on the acceleration of regular problems, as irregular problems typically do not provide the same level of performance on the hardware. We explore the potential of the GPU […]

CUDA

Jan, 25

PyCOOL – a Cosmological Object-Oriented Lattice code written in Python

There are a number of different phenomena in the early universe that have to be studied numerically with lattice simulations. This paper presents a graphics processing unit (GPU) accelerated Python program called PyCOOL that solves the evolution of scalar fields in a lattice with very precise symplectic integrators. The program has been written with the […]

CUDA

Jan, 25

Realtime scheduling using GPUs – proof of feasibility

This paper reports our evaluation of OpenCL as a platform for hard realtime scheduling. Specifically, we have evaluated which types of tasks are faster on GPGPU than on CPU. We have investigated computational tasks, memory intensive tasks (especially tasks using low latency GDDR memory) and disk intensive tasks. This study is the first […]

OpenCL

Jan, 25

GPU algorithms for comparison-based sorting and for merging based on multiway selection

Sorting and merging are two important kernels which are used as subroutines in numerous algorithms, whose performance depends on the efficiency of these primitives. Databases use sort and merge primitives extensively. Computational biology, search engines, realtime rendering and geographical information systems are other fields where sorting and merging large amounts of data is indispensable. Even […]

CUDA, OpenCL

Jan, 25

Computational Fluid Dynamics using OpenCL – a Practical Introduction

The main aim of Computational Fluid Dynamics (CFD) simulations is to reconstruct the reality of fluid motion and behaviour as accurately as possible in order to better understand the natural phenomena under specified conditions. Ideally, general solutions can also cover different scales and geometric configurations. Unfortunately, due to expensive algorithms, classic CFD codes most […]

OpenCL

Jan, 25

Solving Bivariate Polynomial Systems on a GPU

We present a CUDA implementation of dense multivariate polynomial arithmetic based on Fast Fourier Transforms over finite fields. Our core routine computes on the device (GPU) the subresultant chain of two polynomials with respect to a given variable. This subresultant chain is encoded by values on an FFT grid and is manipulated from the host […]

CUDA

Jan, 24

The GPU Enhanced Parallel Computing for Large Scale Data Clustering

Analyzing and clustering large-scale data sets is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of data clustering is its O(n^2) complexity. As the number of data and feature dimensions grows, it becomes increasingly difficult to generate results […]

CUDA

Jan, 24

GPApriori: GPU-Accelerated Frequent Itemset Mining

In this paper we describe GPApriori, a GPU-accelerated implementation of Frequent Itemset Mining (FIM). We tested our implementation with an Nvidia Tesla T10 graphics processor and demonstrate up to 100x speedup as compared with several state-of-the-art FIM algorithms on a CPU. In order to map the Apriori algorithm onto the SIMD execution model, […]

CUDA

Jan, 24

Designing Fast LTL Model Checking Algorithms for Many-Core GPUs

Recent technological developments have made various many-core hardware platforms widely accessible. These massively parallel architectures have been used to significantly accelerate many computation-demanding tasks. In this paper, we show how the algorithms for LTL model checking can be redesigned in order to accelerate LTL model checking on many-core GPU platforms. Our detailed experimental evaluation demonstrates […]

CUDA

Jan, 24

Real-Time Ultrasound Biomicroscopy with Optoacoustic Arrays

Optical techniques are a promising technology to realize high frequency ultrasound arrays. High sensitivity and broad bandwidth have been demonstrated with optoacoustic sensors based on thin film etalons. A thin film etalon consists of a transparent layer (e.g. photoresist or parylene) with gold coatings on a glass substrate. One-dimensional (1-D) data acquisition is realized by […]

CUDA

Jan, 24

Real-Time Photon Mapping on GPU

This paper presents a hybrid photon-mapping approach for global illumination. It represents a significant improvement over a previously described approach, both with respect to speed and accuracy. Using OptiX for ray tracing provides a considerable improvement in the speed of ray tracing and would keep synchronization to a minimum by using texture memory to cache […]

CUDA

Jan, 24

Multipattern String Matching On A GPU

We develop GPU adaptations of the Aho-Corasick string matching algorithm for the case when all data reside initially in the GPU memory and the results are to be left in this memory. We consider several refinements to a base GPU implementation and measure the performance gain from each refinement. Experiments conducted on an NVIDIA […]

CUDA

