high performance computing on graphics processing units: hgpu.org

Posts

May, 22

Secret Key Cryptography Using Graphics Cards

One frequently cited reason for the lack of wide deployment of cryptographic protocols is the (perceived) poor performance of the algorithms they employ and their impact on the rest of the system. Although high-performance dedicated cryptographic accelerator cards have been commercially available for some time, market penetration remains low. We take a different approach, seeking […]

OpenGL

May, 22

Realistic rendering of surface appearance using GPU

Summary form only given. We present techniques for realistic modeling of surface details and efficient rendering of the associated visual effects using programmable GPUs. An important topic in rendering surface appearance is the treatment of mesostructures, which are responsible for fine-scale shadowing, occlusion, inter-reflectance, and silhouettes. One way to model surface mesostructure is by using the […]

May, 22

Scalable packet classification via GPU metaprogramming

Packet classification has been a fundamental processing pattern of modern networking devices. Today’s high-performance routers use specialized hardware for packet classification, but such solutions suffer from prohibitive cost, high power consumption, and poor extensibility. On the other hand, software-based routers offer the best flexibility, but could only deliver limited performance […]

May, 22

GPU accelerated fuzzy connected image segmentation by using CUDA

Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays commodity graphics hardware provides high parallel computing power. In this paper, we present […]

CUDA

May, 22

Improving the Speed of Virtual Rear Projection: A GPU-Centric Architecture

Projection is the only viable way to produce very large displays. Rear projection of large-scale upright displays is often preferred over front projection because of the lack of shadows that occlude the projected image. However, rear projection is not always a feasible option for space and cost reasons. Recent research suggests that many of the […]

May, 22

Improving performance for emergent environments parameter tuning and simulation in games using GPU

Computer games that handle realistic environments are becoming more popular in the game market. Games that make use of natural environments such as the spreading of fire or the flow of water need to be very carefully designed. In order to produce a desired effect of fire or water, a designer needs to try and […]

May, 22

A GPU-based closed frequent itemsets mining algorithm over stream

Closed frequent itemsets are one of several condensed representations of frequent itemsets, which store all the information of frequent itemsets using less space, thus being more suitable for stream mining. This paper considers a problem that to the best of our knowledge has not been addressed, namely, how to use GPU to mine closed frequent […]

May, 22

Optimized GPU Framework for Ultrasound Color Flow Imaging

A GPU framework for ultrasound color flow imaging (CFI) based on auto-correlation is presented. The parallel CFI processing framework implementation is mainly based on CUDA performance features, such as the memory selection strategy, applicable thread structure and high-throughput bandwidth. Parallel convolution algorithm and multi-channel championship algorithm are proposed. This CFI method achieves a frame rate […]

CUDA

May, 21

A trigger system based on Graphics Processing Unit (GPU)

We discuss the possible use of GPUs (Graphics Processing Units) in the all-digital trigger and data acquisition (TDAQ) chain of the NA62 experiment at CERN. The exponentially growing interest in using GPUs for general purpose applications is based on the impressive performance achieved (peak performance already exceeding the Teraflop/s), on the high bandwidth to memory […]

May, 21

An architecture design of GPU-accelerated VoD streaming servers with network coding

The graphics processing unit (GPU) has evolved into a general-purpose computing platform. Inspired by the GPU technology advantage, this paper concerns the design and performance evaluation of a practical GPU-accelerated server architecture for Video-on-Demand (VoD) services with network coding. Following the proposal of an optimized network coding algorithm based on parallel threads on GPU, a GPU-Accelerated Server […]

May, 21

A comprehensive analysis and parallelization of an image retrieval algorithm

The prevalence of the Internet and cloud computing has made multimedia data, such as image data and video data, become major data types in our daily life. For example, many data-intensive applications, such as health care and video recommendation, involve collecting, indexing and retrieving tera-scale multimedia data every day. With such a huge amount of […]

CUDA

May, 21

Analyzing throughput of GPGPUs exploiting within-die core-to-core frequency variation

The state-of-the-art general-purpose graphic processing units (GPGPUs) can offer very high computational throughput for general-purpose, highly-parallel applications using hundreds of available on-chip cores. Meanwhile, as technology is scaled down below 65nm, each core’s maximum frequency varies significantly due to increasing within-die variations. This, in turn, diminishes the throughput improvement of GPGPUs through technology scaling because […]

Recent source codes

Agentic Code Optimization via Compiler-LLM Cooperation

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

True 4-Bit Quantized CNN Training on CPU

cuFuzz: A GPU-oriented coverage-guided fuzzer for userland CUDA application

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

Most viewed papers (last 30 days)