high performance computing on graphics processing units: hgpu.org

Posts

Jan, 16

CURFIL: Random Forests for Image Labeling on GPU

Random forests are popular classifiers for computer vision tasks such as image labeling or object detection. Learning random forests on large datasets, however, is computationally demanding. Slow learning impedes model selection and scientific research on image features. We present an open-source implementation that significantly accelerates both random forest learning and prediction for image labeling of […]

CUDA

Jan, 16

SW#db: GPU-accelerated exact sequence similarity database search

The deluge of next-generation sequencing (NGS) data and expanding databases place higher demands on protein similarity search. State-of-the-art tools such as BLAST are not fast enough to meet these demands. It is therefore necessary to develop new algorithms that are faster while maintaining similar sensitivity. The majority of protein similarity […]

CUDA

Jan, 16

Parallel Algorithms for Counting Problems on Graphs Using Graphics Processing Units

The availability of Graphics Processing Units (GPUs) with a multicore architecture has enabled parallel computations using extensive multi-threading. Recent advancements in computer hardware have led to the use of graphics processors for solving general-purpose problems. Using GPUs for computation is a highly efficient, low-cost alternative to currently available multicore Central Processing Units […]

CUDA

Jan, 16

GPU Processing for UAS-Based LFM-CW Stripmap SAR

Unmanned air systems (UAS) provide an excellent platform for synthetic aperture radar (SAR), enabling surveillance and research over areas too difficult, dangerous, or costly to reach using manned aircraft. However, the nimble nature of small UAS platforms makes them more susceptible to external forces, thus requiring significant motion compensation in order for SAR images to […]

CUDA

Jan, 16

Parallel Implementation of the Finite Element Method on Graphics Processors for the Solution of Incompressible Flows

In recent years, the clock speeds and memory bandwidths of Graphics Processing Units (GPUs) have increased dramatically compared to CPUs. GPU vendors have also developed and freely released new programming tools that make scientific computing on GPUs easier. With these developments, the use of GPUs for general-purpose computing has become a popular research field. Researchers previously demonstrated […]

CUDA

Jan, 15

A Novel Computational Model for GPUs with Applications to Efficient Algorithms

We propose a novel computational model for GPUs. Known parallel computational models such as the PRAM model are not appropriate for evaluating GPU-based algorithms. Our model, called AGPU, abstracts the essence of current GPU architectures, such as global and shared memory, memory coalescing, and bank conflicts. Using our model, we can evaluate the asymptotic behavior of […]

CUDA

Jan, 15

Identification and Elimination of Platform-Specific Code Smells in High Performance Computing Applications

A code smell is a code pattern that might indicate a code or design problem that makes the application code hard to evolve and maintain. Automatic detection of code smells has been studied to help users find which parts of their application code should be refactored. However, code smells have not been defined in a […]

CUDA • OpenCL

Jan, 15

Reducing overheads of dynamic scheduling on heterogeneous chips

In recent processor development, we have witnessed the integration of GPUs and CPUs on a single chip. This integration reduces data communication overheads, which enables efficient collaboration between both devices in the execution of parallel workloads. In this work, we focus on the problem of efficiently scheduling […]

OpenCL

Jan, 15

Batched Matrix Computations on Hardware Accelerators Based on GPUs

Scientific applications require solvers that work on many small, independent problems. At the same time, high-end hardware evolves rapidly and becomes ever more throughput-oriented, so there is an increasing need for an effective approach to developing energy-efficient, high-performance codes for these small matrix problems that we call […]

CUDA

Jan, 15

High Performance GPU-based Fourier Volume Rendering

Fourier volume rendering (FVR) is a significant visualization technique that has been widely used in digital radiography. Because of its O(N^2 log N) time complexity, it provides a faster alternative to spatial-domain volume rendering algorithms, which have O(N^3) complexity. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of […]

CUDA • OpenGL

Jan, 15

International Conference on Signal Processing, ICOSP 2015

Topics: Adaptive Filtering & Signal Processing; Ad-Hoc and Sensor Networks; Analog and Mixed Signal Processing; Biometrics & Authentication; Biosignal Processing & Understanding; Communication and Broadband Networks; Communication Signal Processing; Computer Vision & Virtual Reality; Cryptography and Network Security; Design and Implementation of Signal Processing Systems; Image and Multidimensional Signal Processing; Image Processing & Understanding; Machine […]

Jan, 13

Linear Performance-Breakdown Model: A Framework for GPU kernel programs performance analysis

In this paper, we describe our performance-breakdown model for GPU programs. GPUs are a popular choice of accelerator hardware due to their high performance, wide availability, and relatively low price. However, writing highly efficient programs is a difficult and time-consuming task for programmers because of the complexities of GPU architecture and the […]

OpenCL

Recent source codes

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

Agentic Code Optimization via Compiler-LLM Cooperation

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Most viewed papers (last 30 days)