high performance computing on graphics processing units: hgpu.org

Posts

Mar, 13

Comparing GPU and CPU in OLAP Cubes Creation

GPGPU (General-Purpose computing on Graphics Processing Units) programming has been receiving more attention recently because of the enormous computational speedup this technology offers. GPGPU is applied in many branches of science and industry, including databases, even though that is not the field where the greatest benefits are expected. In this paper a typical time-consuming database algorithm, […]

CUDA

Mar, 13

Obsidian: GPU Programming in Haskell

Obsidian is a language for data-parallel programming embedded in Haskell. As the Obsidian programs are run, C code is generated. This C code can be compiled for an NVIDIA 8800 series GPU (Graphics Processing Unit), or for other high-end NVIDIA GPUs. The idea is that the style of programming used in Lava for structural hardware […]

CUDA

Mar, 13

Obsidian: GPU Kernel Programming in Haskell (thesis)

Graphics Processing Units (GPUs) are evolving into powerful general purpose computing platforms. At first, GPU performance was driven by the requirements of 3D graphics computer games. To fit this workload, a GPU is a many-core processor suitable for the data-parallel programming paradigm. Today, GPUs come with hundreds of processing elements and a theoretical single precision […]

CUDA

Mar, 13

High Quality Elliptical Texture Filtering on GPU

The quality of the available hardware texture filtering, even on state-of-the-art graphics hardware, suffers from several aliasing artifacts, in both the spatial and temporal domains. Those artifacts are most evident in extreme conditions, such as grazing viewing angles, highly warped texture coordinates, or extreme perspective, and become especially annoying when animation is involved. […]

OpenGL

Mar, 13

GPU-based Multilevel Clustering

The processing power of parallel co-processors like the Graphics Processing Unit (GPU) is increasing dramatically. However, up until now only a few approaches have been presented to utilize this kind of hardware for mesh clustering purposes. In this paper we introduce a Multilevel clustering technique designed as a parallel algorithm and solely implemented on the […]

Mar, 13

Real-Time Image Segmentation on a GPU

Efficient segmentation of color images is important for many applications in computer vision. Non-parametric solutions are required in situations where little or no prior knowledge about the data is available. In this paper, we present a novel parallel image segmentation algorithm which segments images in real-time in a non-parametric way. The algorithm finds the equilibrium […]

CUDA

Mar, 13

A Quantitative Performance Analysis Model for GPU Architectures

We develop a microbenchmark-based performance model for NVIDIA GeForce 200-series GPUs. Our model identifies GPU program bottlenecks and quantitatively analyzes performance, and thus allows programmers and architects to predict the benefits of potential program optimizations and architectural improvements. In particular, we use a microbenchmark-based approach to develop a throughput model for three major components of […]

CUDA

Mar, 13

Interactive Volume Rendering Aurora on the GPU

We present a combination of techniques to render the aurora borealis in real time on a modern graphics processing unit (GPU). Unlike the general 3D volume rendering problem, an auroral display is emissive and can be factored into a height-dependent energy deposition function, and a 2D electron flux map. We also present a GPU-friendly atmosphere […]

OpenGL

Mar, 13

GPU Objects

Points, lines, and polygons have been the fundamental primitives in graphics. Graphics hardware is optimized to handle them in a pipeline. Other objects are converted to these primitives before rendering. Programmable GPUs have made it possible to introduce a wide class of computations on each vertex and on each fragment. In this paper, we outline […]

OpenGL

Mar, 13

GPU for CAD

Due to the explosive market growth in computer gaming, the underlying technology of Graphical Processor Units is also exploding in terms of new capabilities and raw processing power. While the primary target of the growth in GPU capabilities is computer games, computer-aided design applications stand to gain substantial benefits as well. This paper outlines the […]

Mar, 12

Improving the Efficiency of GPU Clusters

If you perceive more than a little excitement around the topic of Graphics Processing Units (GPUs) in High-Performance Computing (HPC), it’s for pretty good reason. HPC is all about performance after all, and it’s not every day that a new technology promises an order of magnitude boost in processing power. A variety of new GPU […]

Mar, 12

Efficient Spatial Binning on the GPU

We present a new technique for sorting data into spatial bins or buckets using a graphics processing unit (GPU). Our method takes unsorted point data as input and scatters the points, in sorted order, into a set of bins. This is a key operation in the construction of spatial data structures, which are essential for […]
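The binning operation described in this teaser (unsorted points in, points grouped by bin out) can be illustrated with a small sequential sketch. This is not the paper's GPU method, which the truncated abstract does not spell out; it is a plain counting sort over a uniform 2D grid, and the function name `bin_points` and the parameters `cell_size` and `grid_width` are hypothetical names chosen for illustration.

```python
from itertools import accumulate


def bin_points(points, cell_size, grid_width):
    """Counting-sort 2D points into a square uniform grid of bins.

    Hypothetical helper for illustration only: a sequential CPU sketch,
    not the GPU technique from the paper.
    Assumes every point lies inside the grid.
    """
    def bin_index(p):
        x, y = p
        # Row-major index of the grid cell that contains the point.
        return int(y // cell_size) * grid_width + int(x // cell_size)

    num_bins = grid_width * grid_width

    # Pass 1: count how many points fall into each bin.
    counts = [0] * num_bins
    for p in points:
        counts[bin_index(p)] += 1

    # Exclusive prefix sum gives the first output slot of each bin.
    starts = [0] + list(accumulate(counts))[:-1]

    # Pass 2: scatter each point into the next free slot of its bin.
    cursor = list(starts)
    out = [None] * len(points)
    for p in points:
        b = bin_index(p)
        out[cursor[b]] = p
        cursor[b] += 1
    return out, starts, counts


pts = [(3.5, 0.2), (0.1, 0.1), (1.2, 1.9), (0.9, 0.4)]
binned, starts, counts = bin_points(pts, cell_size=1.0, grid_width=4)
print(binned)
```

The count, prefix-sum, scatter structure is the usual starting point for parallel versions, since each of the three passes can be expressed as a data-parallel step; on a GPU the per-bin cursor increment would need an atomic operation or an equivalent mechanism.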
