high performance computing on graphics processing units: hgpu.org

Posts

Apr, 17

Physically-based interactive schlieren flow visualization

Understanding fluid flow is a difficult problem of increasing importance as computational fluid dynamics produces an abundance of simulation data. Experimental flow analysis has for centuries employed techniques such as shadowgraph and schlieren imaging, which allow empirical observation of inhomogeneous flows. Shadowgraphs provide an intuitive way of looking at small changes in flow dynamics […]

CUDA

Apr, 17

Event-driven gate-level simulation with GP-GPUs

Logic simulation is a critical component of the design tool flow in modern hardware development efforts. It is used widely from high level descriptions down to gate level ones to validate several aspects of the design, particularly functional correctness. Despite development houses investing vast resources in the simulation task, particularly at the gate level, it […]

CUDA

Apr, 17

An Efficient Acceleration of Symmetric Key Cryptography Using General Purpose Graphics Processing Unit

Graphics Processing Units (GPUs) have been an extensive research topic in recent years and have been successfully applied to general-purpose applications outside computer graphics. The NVIDIA CUDA programming model provides a straightforward means of describing inherently parallel computations. In this paper, we present a study of the efficiency of emerging technology in […]

CUDA

Apr, 17

Efficiently Using a CUDA-enabled GPU as Shared Resource

GPGPU is becoming increasingly important, but when using CUDA-enabled GPUs the special characteristics of NVIDIA's SIMT architecture have to be considered. In particular, it is not possible to run functions concurrently, although NVIDIA's GPUs consist of many processing units. Therefore, the processing power of GPUs cannot be shared among processes, and for an […]

CUDA

Apr, 17

Performance of Optical Flow Techniques on Graphics Hardware

Since graphics cards have become programmable in recent years, numerous computationally intensive algorithms have been implemented on what are now called general-purpose graphics processing units (GPGPUs). While the results show that GPGPUs regularly outperform CPU-based implementations, the question arose of how optical flow algorithms can be ported to graphics hardware. To answer the question, the […]

Apr, 17

Data handling inefficiencies between CUDA, 3D rendering, and system memory

While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API which can transfer data to and from system code, and which can also access data used by 3D rendering APIs. In […]

CUDA

Apr, 17

A CUDA-Based Implementation of Stable Fluids in 3D with Internal and Moving Boundaries

Fluid simulation has been an active research field in computer graphics for the last 30 years. Stam’s stable fluids method, among others, is used for solving the equations that govern fluids (i.e. the Navier-Stokes equations). An implementation of stable fluids in 3D using NVIDIA's Compute Unified Device Architecture (CUDA) is provided in this paper. This CUDA-based […]

CUDA

Apr, 17

Efficient JPEG2000 EBCOT Context Modeling for Massively Parallel Architectures

Embedded Block Coding with Optimal Truncation (EBCOT) is the fundamental and computationally very demanding part of the compression process of the JPEG2000 image compression standard. In this paper, we present a reformulation of the context modeling of EBCOT that allows full parallelization for massively parallel architectures such as GPUs with their single instruction, multiple threads (SIMT) architecture. […]

CUDA

Apr, 16

Exploiting Computational Resources in Distributed Heterogeneous Platforms

We have been witnessing a continuous growth of both heterogeneous computational platforms (e.g., Cell blades, or the joint use of traditional CPUs and GPUs) and multicore processor architecture; and it is still an open question how applications can fully exploit such computational potential efficiently. In this paper we introduce a run-time environment and programming framework […]

CUDA

Apr, 16

Computation of Voronoi diagrams using a graphics processing unit

A parallel algorithm to compute a discrete approximation to the Voronoi diagram is presented. The algorithm, which executes in single instruction multiple data (SIMD) mode, was implemented on a high-end graphics processing unit (GPU) using NVIDIA's compute unified device architecture (CUDA) development environment. The performance of the resulting code is investigated and presented, and a […]

CUDA

Apr, 16

Statistical testing of random number sequences using CUDA

Previous research in the field of statistical testing of random number sequences using Graphics Processing Units (GPUs) has shown that this approach yields a significant increase in performance for a subset of the statistical tests proposed by the National Institute of Standards and Technology (NIST). The present paper aims at further improvements in the performance of […]

CUDA

Apr, 16

3-SAT on CUDA: Towards a massively parallel SAT solver

This work presents the design and implementation of a massively parallel 3-SAT solver, specifically targeting random problem instances. Our approach is deterministic and features very little communication overhead and basically no load-balancing cost at all. In the context of most current parallel SAT solvers running only on a handful of cores, we implemented our solver […]

CUDA

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)