high performance computing on graphics processing units: hgpu.org

Posts

Dec, 26

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In […]
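To make the storage trade-off concrete, here is a minimal Python sketch of the two layouts for a tiny matrix. ELLPACK-R pads every row to the longest row of the whole matrix and keeps a row-length array so each thread can stop at its row's real end. The pJDS-like variant first sorts rows by decreasing length and then pads each chunk of `block` rows (a warp-sized chunk on a GPU, by assumption) only to that chunk's longest row. All function names are hypothetical, the arrays are kept row-wise for readability rather than in the diagonal-major order a GPU kernel would use for coalesced access, and this is not the authors' implementation.

```python
from typing import List, Tuple

Row = List[Tuple[int, float]]  # (column index, value) pairs of one matrix row


def pad_rows(rows: List[Row], width: int):
    """Pad each row to `width` slots; padding uses value 0.0 and column 0."""
    val = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    col = [[c for c, _ in r] + [0] * (width - len(r)) for r in rows]
    return val, col


def ellpack_r(rows: List[Row]):
    """ELLPACK-R: every row padded to the global maximum row length, plus the
    true row lengths so the multiply can skip the padding."""
    width = max(len(r) for r in rows)
    val, col = pad_rows(rows, width)
    return val, col, [len(r) for r in rows]


def ellpack_r_spmv(val, col, row_len, x):
    # On a GPU one thread would handle row i; here the loop is sequential.
    return [sum(val[i][j] * x[col[i][j]] for j in range(row_len[i]))
            for i in range(len(row_len))]


def pjds(rows: List[Row], block: int):
    """pJDS-like layout: sort rows by decreasing length, then pad each chunk
    of `block` consecutive sorted rows only to that chunk's longest row."""
    perm = sorted(range(len(rows)), key=lambda i: len(rows[i]), reverse=True)
    chunks = []
    for start in range(0, len(perm), block):
        ids = perm[start:start + block]
        width = len(rows[ids[0]])  # first row of a sorted chunk is the longest
        val, col = pad_rows([rows[i] for i in ids], width)
        chunks.append((ids, val, col, [len(rows[i]) for i in ids]))
    return chunks


def pjds_spmv(chunks, n_rows: int, x):
    y = [0.0] * n_rows
    for ids, val, col, row_len in chunks:
        for k, i in enumerate(ids):  # i is the original row index (undoes the sort)
            y[i] = sum(val[k][j] * x[col[k][j]] for j in range(row_len[k]))
    return y


def stored_slots_ellpack(val) -> int:
    return sum(len(r) for r in val)


def stored_slots_pjds(chunks) -> int:
    return sum(len(r) for _, val, _, _ in chunks for r in val)
```

For a 4×4 matrix whose rows hold 1, 4, 1 and 2 nonzeros, ELLPACK-R stores 16 slots while the pJDS-like layout with `block=2` stores 10, and both produce the same result vector.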

Dec, 25

Designing Fast Architecture Sensitive Tree Search on Modern Multi-Core/Many-Core Processors

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join and aggregation. However, unlike other primitives, tree search presents significant challenges due to […]

CUDA

Dec, 25

Turbo Bayesian Compressed Sensing

Compressed sensing (CS) theory specifies a new signal acquisition approach, potentially allowing the acquisition of signals at a much lower data rate than the Nyquist sampling rate. In CS, the signal is not directly acquired but reconstructed from a few measurements. One of the key problems in CS is how to recover the original signal […]

CUDA

Dec, 25

Modeling Parallel Programs for Heterogeneous Computing

With the growing interest in multicore processors and Graphics Processing Units (GPUs), heterogeneous computing is an emerging necessity to fully utilize computing resources. In the traditional approach to parallel programming, programmers are required to manage the sequential part of a program while rewriting the parallel part to a more efficient parallel programming model. Our evaluation […]

CUDA

Dec, 25

Methodology of control and supervision of web connected mobile robots with CUDA technology application

This paper addresses the control and supervision of web-connected mobile robots. The subject is motivated by the need to develop new methods for the control, supervision and integration of existing modules (inspection robots, autonomous robots, mobile base station). The methodology consists of: a multi-robotic system structure, a cognitive model of […]

CUDA

Dec, 25

Accelerating linear system solutions using randomization techniques

We illustrate how linear algebra calculations can be enhanced by statistical techniques in the case of a square linear system Ax = b. We study a random transformation of A that enables us to avoid pivoting and then to reduce the amount of communication. Numerical experiments show that this randomization can be performed at a […]

CUDA
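A minimal NumPy sketch of the idea, assuming a depth-1 random butterfly transformation as the random transformation (a known choice in this line of work; the excerpt does not state the paper's exact transformation or recursion depth). Instead of solving Ax = b with partial pivoting, it solves (U^T A V) y = U^T b by Gaussian elimination with no pivoting and recovers x = V y. The distribution of the random diagonal entries and all function names are assumptions.

```python
import numpy as np


def random_butterfly(n: int, rng: np.random.Generator) -> np.ndarray:
    """Depth-1 butterfly B = (1/sqrt(2)) [[R0, R1], [R0, -R1]] with random
    diagonal R0, R1 (n must be even). Entries exp(r/10), r ~ U(-1/2, 1/2),
    are an assumed choice."""
    h = n // 2
    r0 = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    r1 = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    return np.block([[r0, r1], [r0, -r1]]) / np.sqrt(2)


def genp_solve(m: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Gaussian elimination with NO pivoting, then back substitution."""
    m, c = m.astype(float), c.astype(float)  # astype returns copies
    n = len(c)
    for k in range(n - 1):
        for i in range(k + 1, n):
            f = m[i, k] / m[k, k]
            m[i, k:] -= f * m[k, k:]
            c[i] -= f * c[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - m[i, i + 1:] @ x[i + 1:]) / m[i, i]
    return x


def rbt_solve(a: np.ndarray, b: np.ndarray, seed: int = 0) -> np.ndarray:
    """Solve A x = b via the randomized system (U^T A V) y = U^T b, x = V y."""
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    u, v = random_butterfly(n, rng), random_butterfly(n, rng)
    y = genp_solve(u.T @ a @ v, u.T @ b)
    return v @ y
```

For example, with A = [[0,1,2,3],[1,0,1,2],[2,1,0,1],[3,2,1,0]] the leading entry is zero, so elimination without pivoting fails at the first step on A itself, whereas `rbt_solve` solves the transformed system without any row exchanges. Skipping pivoting removes the row searches and swaps that, on a parallel machine, cost communication.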

Dec, 25

G-CP: Providing Fault Tolerance on the GPU through Software Checkpointing

GPUs have become increasingly popular in recent years, in large part due to their potential to offer a large amount of computational power at low prices. GPU designers have also made GPU pipelines more general purpose and more programmable, which has made GPUs more attractive to a wider audience. Thus, it is increasingly important to […]

CUDA
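The excerpt stops before describing how G-CP works, so the following is only a generic, CPU-side Python sketch of the checkpoint/restart idea: the state is copied every `interval` steps, and when an error is detected (simulated here by a hypothetical set of faulty step numbers) execution rolls back to the last copy instead of restarting from the beginning. What G-CP actually saves, where, and how it detects faults on the GPU are not assumed; all names are hypothetical.

```python
from typing import Callable, List, Set


def run_with_checkpoints(state: List[float],
                         step: Callable[[List[float], int], None],
                         n_steps: int, interval: int,
                         fault_steps: Set[int]) -> List[float]:
    """Run step(state, i) for i = 0..n_steps-1, saving a copy of `state` after
    every `interval` completed steps. A step listed in `fault_steps` is treated
    as a detected error that fires once: its result is discarded, the state is
    restored from the last checkpoint, and execution resumes from there."""
    checkpoint, ckpt_step = list(state), 0
    pending = set(fault_steps)
    i = 0
    while i < n_steps:
        step(state, i)
        if i in pending:              # detected error: roll back
            pending.discard(i)
            state[:] = checkpoint     # copy the saved values into `state`
            i = ckpt_step
            continue
        i += 1
        if i % interval == 0:
            checkpoint, ckpt_step = list(state), i
    return state


def add_index(s: List[float], i: int) -> None:
    """Toy kernel standing in for real work: add the step number to each element."""
    for j in range(len(s)):
        s[j] += i
```

The interval is the usual trade-off: frequent checkpoints cost copy time on every run, while sparse ones cost more recomputation when a fault does occur.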

Dec, 25

Design and Development of Optical Flow Based Obstacle Avoidance Using CUDA

Autonomous vehicles and robots that navigate based on camera input need obstacle avoidance techniques for successful navigation. Optical flow-based obstacle avoidance is one of the most popular techniques. Real-time motion estimation remains a challenge because of its high computational expense. The traditional CPU-based schemes satisfy the power, size and computation requirements. Graphical Processing […]

CUDA
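As a hedged illustration of the arithmetic behind optical flow, the sketch below estimates a single displacement (u, v) over one window with the classic Lucas-Kanade least-squares method, which solves Ix*u + Iy*v + It = 0 over all pixels of the window. The excerpt does not say which optical flow algorithm the paper uses, so this choice is an assumption, and the function name is hypothetical. A dense implementation would solve one such small system per pixel neighbourhood, which is why the problem maps naturally onto one GPU thread or block per window.

```python
import numpy as np


def lucas_kanade(frame0: np.ndarray, frame1: np.ndarray) -> np.ndarray:
    """Estimate one (u, v) for the whole window by least squares on the
    brightness-constancy equation Ix*u + Iy*v + It = 0."""
    f0 = frame0.astype(float)
    iy, ix = np.gradient(f0)               # axis 0 = rows (y), axis 1 = columns (x)
    it = frame1.astype(float) - f0         # temporal derivative between frames
    # np.gradient uses one-sided differences on the border; keep the interior.
    ix = ix[1:-1, 1:-1].ravel()
    iy = iy[1:-1, 1:-1].ravel()
    it = it[1:-1, 1:-1].ravel()
    a = np.stack([ix, iy], axis=1)         # one equation per pixel
    uv, *_ = np.linalg.lstsq(a, -it, rcond=None)
    return uv
```

On a quadratic test image such as I(x, y) = x^2 + 2y^2 + xy on a symmetric grid, shifted by (0.5, -0.3), central differences are exact and the constant second-order term cancels in the least-squares fit, so the estimate recovers the shift to floating-point precision.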

Dec, 25

GPU-Parallel Implementation of Color based Medical Image Retrieval in Compressed Domain

In huge databases, image processing takes more time to execute on a single-core processor because of slow single-threaded algorithms. Graphics Processing Units (GPUs) are increasingly popular nowadays due to their speed, programmability, low cost and large number of built-in execution cores. Most researchers have begun using GPUs as a processing […]

CUDA
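A minimal NumPy sketch of color-based retrieval: each image is reduced to a normalized, quantized RGB histogram, and database images are ranked by histogram intersection with the query (1.0 means identical color distributions). Assumptions: it works on decoded pixels rather than in the compressed domain the paper targets, the 4-levels-per-channel quantization and the intersection measure are illustrative choices, and the function names are hypothetical. Each histogram and each comparison is independent, which is the kind of work that can be spread across many GPU cores.

```python
from typing import List

import numpy as np


def color_histogram(img: np.ndarray, bins: int = 4) -> np.ndarray:
    """Quantize each uint8 RGB channel into `bins` levels and return the
    normalized joint histogram with bins**3 cells."""
    q = (img.astype(np.int64) * bins) // 256            # per-channel bin 0..bins-1
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()


def rank_by_intersection(query: np.ndarray, database: List[np.ndarray]) -> List[int]:
    """Return database indices, best match first, by histogram intersection."""
    hq = color_histogram(query)
    scores = [np.minimum(hq, color_histogram(img)).sum() for img in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])
```

With a database of solid red, solid green, solid blue and half-red/half-green images, a solid red query scores 1.0 against the first, 0.5 against the half-and-half image and 0.0 against the others, so the ranking is [0, 3, 1, 2].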

Dec, 25

Parallel Implementation of Shape based Image Retrieval Approach on CUDA in Compressed Domain

Fast and accurate algorithms are necessary for content-based image retrieval (CBIR) systems that operate on compressed image databases, such as JPEG images or images acquired through compressive sensing. Feature extraction and feature matching are two important steps in any CBIR system. Incorrect matching reduces the accuracy of CBIR systems. The matching of query image […]

CUDA

Dec, 25

A GPU-supported High-Level Programming Language for Image Processing

Real-time image/video processing applications are increasingly in demand as general-purpose computers and mobile devices advance. However, programmers have to handle digital images directly and be aware of resolutions and pixels. This makes image processing programming unintuitive. On the other hand, image/video processing typically exhibits data parallelism, and the performance gains are […]

CUDA

Dec, 24

Toolchain for programming, simulating and studying the XMT many-core architecture

The Explicit Multi-Threading (XMT) is a general-purpose many-core computing platform, with the vision of a 1000-core chip that is easy to program but does not compromise on performance. This paper presents a publicly available tool chain for XMT, complete with a highly configurable cycle-accurate simulator and an optimizing compiler. The XMT tool chain has matured […]

