high performance computing on graphics processing units: hgpu.org

Posts

Jan, 25

Realtime scheduling using GPUs – proof of feasibility

This paper reports our evaluation of OpenCL as a platform for hard realtime scheduling. Specifically, we evaluated which types of tasks run faster on a GPGPU than on a CPU. We investigated computational tasks, memory-intensive tasks (especially tasks using low-latency GDDR memory) and disk-intensive tasks. This study is the first […]

OpenCL

Jan, 25

GPU algorithms for comparison-based sorting and for merging based on multiway selection

Sorting and merging are two important kernels which are used as subroutines in numerous algorithms, whose performance depends on the efficiency of these primitives. Databases use sort and merge primitives extensively. Computational biology, search engines, realtime rendering and geographical information systems are other fields where sorting and merging large amounts of data is indispensable. Even […]

CUDA • OpenCL

Jan, 25

Computational Fluid Dynamics using OpenCL – a Practical Introduction

The main aim of the Computational Fluid Dynamics (CFD) simulations is to reconstruct the reality of fluid motion and behaviour as accurately as possible in order to better understand the natural phenomena under specified conditions. Ideally, general solutions can also cover different scales and geometric configurations. Unfortunately, due to expensive algorithms, classic CFD codes most […]

OpenCL

Jan, 25

Solving Bivariate Polynomial Systems on a GPU

We present a CUDA implementation of dense multivariate polynomial arithmetic based on Fast Fourier Transforms over finite fields. Our core routine computes on the device (GPU) the subresultant chain of two polynomials with respect to a given variable. This subresultant chain is encoded by values on a FFT grid and is manipulated from the host […]

CUDA

Jan, 24

The GPU Enhanced Parallel Computing for Large Scale Data Clustering

Analyzing and clustering large-scale data sets is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of data clustering is its O(n^2) complexity. As the number of data points and feature dimensions grows, it becomes increasingly difficult to generate results […]

CUDA

Jan, 24

GPApriori: GPU-Accelerated Frequent Itemset Mining

In this paper we describe GPApriori, a GPU-accelerated implementation of Frequent Itemset Mining (FIM). We tested our implementation with an Nvidia Tesla T10 graphics processor and demonstrate up to 100x speedup compared with several state-of-the-art FIM algorithms on a CPU. In order to map the Apriori algorithm onto the SIMD execution model, […]

CUDA
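
For readers unfamiliar with the Apriori algorithm named in the entry above, the following is a minimal CPU sketch in Python of its level-wise join, prune, and count loop. It is not the GPApriori implementation and says nothing about how that work maps the algorithm to the GPU; the function name `apriori` and the `min_support` parameter are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    Plain CPU reference of the textbook Apriori loop; the name and
    signature are hypothetical and not taken from GPApriori.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Count the transactions that contain every item of the candidate.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        counted = {c: support(c) for c in level}
        kept = {c: n for c, n in counted.items() if n >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(apriori(data, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

The support-counting step touches every transaction for every candidate, which is the part that dominates the running time in this sketch.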

Jan, 24

Designing Fast LTL Model Checking Algorithms for Many-Core GPUs

Recent technological developments have made various many-core hardware platforms widely accessible. These massively parallel architectures have been used to significantly accelerate many computationally demanding tasks. In this paper, we show how the algorithms for LTL model checking can be redesigned in order to accelerate LTL model checking on many-core GPU platforms. Our detailed experimental evaluation demonstrates […]

CUDA

Jan, 24

Real-Time Ultrasound Biomicroscopy with Optoacoustic Arrays

Optical techniques are a promising technology to realize high frequency ultrasound arrays. High sensitivity and broad bandwidth have been demonstrated with optoacoustic sensors based on thin film etalons. A thin film etalon consists of a transparent layer (e.g. photoresist or parylene) with gold coatings on a glass substrate. One-dimensional (1-D) data acquisition is realized by […]

CUDA

Jan, 24

Real-Time Photon Mapping on GPU

This paper presents a hybrid photon-mapping approach for global illumination. It represents a significant improvement over a previously described approach, both with respect to speed and accuracy. Using OptiX for ray tracing provides a considerable improvement in the speed of ray tracing and would keep synchronization to a minimum by using texture memory to cache […]

CUDA

Jan, 24

Multipattern String Matching On A GPU

We develop GPU adaptations of the Aho-Corasick string matching algorithm for the case when all data reside initially in the GPU memory and the results are to be left in this memory. We consider several refinements to a base GPU implementation and measure the performance gain from each refinement. Experiments conducted on an NVIDIA […]

CUDA
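
As background for the entry above, here is a minimal sequential sketch in Python of the Aho-Corasick algorithm it adapts: a trie with failure links, scanned once over the text. It is a CPU reference only and does not reproduce the paper's GPU adaptations or refinements; the names `build_automaton` and `find_all` are illustrative assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists.

    Hypothetical helper for illustration; states are integer indices,
    with state 0 as the root.
    """
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # indices of patterns that end at each state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(idx)
    # Breadth-first pass: depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return hits


if __name__ == "__main__":
    print(sorted(find_all("ushers", ["he", "she", "his", "hers"])))
```

The scan visits each text character once regardless of how many patterns are loaded, which is the property that makes the automaton attractive for multipattern matching.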

Jan, 24

GPGPU and Multi-Core Architectures for Computing Clustering Coefficients of Irregular Graphs

Network science makes heavy use of simulation models and calculations based upon graph-oriented data structures that are intrinsically highly irregular in nature. The key to efficient use of data-parallel and multi-core parallelism on graphical processing units (GPUs) and CPUs is often to optimise the data layout and to exploit distributed memory locality with processing elements. […]

CUDA
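
To make the quantity in the entry above concrete, the following Python sketch computes the standard local clustering coefficient of each node in an undirected graph: the fraction of pairs of a node's neighbours that are themselves connected. It is a plain CPU reference under assumed inputs (a dict mapping each node to a set of neighbours, with no self-loops and comparable node labels) and does not reflect the data layouts studied in the paper; the function name `local_clustering` is hypothetical.

```python
def local_clustering(adj):
    """Return {node: local clustering coefficient}.

    adj is assumed to map each node to the set of its neighbours in an
    undirected graph without self-loops; node labels must be comparable.
    """
    coeffs = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            # Fewer than two neighbours: no neighbour pairs to connect.
            coeffs[v] = 0.0
            continue
        # Count each edge between two neighbours of v once (u < w).
        links = sum(1 for u in nbrs for w in adj[u] if w in nbrs and u < w)
        coeffs[v] = 2.0 * links / (k * (k - 1))
    return coeffs


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(local_clustering(graph))
```

The per-node work depends on the node's degree, so graphs with very uneven degree distributions give very uneven work per node in this formulation.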

Jan, 24

Efficient GPU implementation of a two waves WAF method for the two-dimensional one layer Shallow Water system on structured meshes

The numerical solutions of Shallow Water Equations are useful for applications related to geophysical flows that usually take place in large computational domains and could require real time calculation. Therefore, parallel versions of accurate and efficient numerical solvers for high performance platforms are needed to be able to deal with these simulation scenarios in reasonable […]

CUDA

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)