high performance computing on graphics processing units: hgpu.org

Posts

Aug, 17

High Throughput Variable Size Non-square Gabor Engine with Feature Pooling Based on GPU

The increasing application of Gabor feature space in various computer vision tasks, together with its high computational demand, encourages the use of parallel computing technologies. In this work we have designed a high-throughput GPU-based Gabor kernel that mimics the function of the initial layers of the biological visual cortex, namely ‘Simple’ and ‘Complex’ cells. The kernel is basically a Gabor […]

CUDA • OpenCL

Aug, 17

Robotic approach to multi-beam optical tweezers with Computer Generated Hologram

Multi-beam optical tweezers are an important technique for manipulating multiple small objects. Computer Generated Hologram (CGH) is one such technique, and it can trap more than 200 objects in three dimensions. For dexterous micromanipulation, it is useful to apply robotics to optical tweezers. In this research, we designed the optical system and control system of […]

Aug, 17

Regular Expression Matching and Operational Semantics

Many programming languages and tools, ranging from grep to the Java String library, contain regular expression matchers. Rather than first translating a regular expression into a deterministic finite automaton, such implementations typically match the regular expression on the fly. Thus they can be seen as virtual machines interpreting the regular expression much as if it […]

CUDA

Aug, 16

Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

We generated a 200 GB dataset with 10^9 features to test our recent b-bit minwise hashing algorithms for training very large-scale logistic regression and SVM. The results confirm our prior finding that, compared with the VW hashing algorithm (which has the same variance as random projections), b-bit minwise hashing is substantially more accurate at […]

Aug, 16

Topical perspective on massive threading and parallelism

Unquestionably, computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (general-purpose computing on graphics processing units) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified […]

CUDA • OpenCL

Aug, 16

Tileable BTF

This paper presents a modular framework for efficiently applying bidirectional texture functions (BTFs) to object surfaces. The basic building blocks are the BTF tiles. By constructing one set of BTF tiles, a wide variety of objects can be textured seamlessly without resynthesizing the BTF. The proposed framework nicely decouples the surface appearance from the […]

Aug, 16

Browsing Large Image Datasets through Voronoi Diagrams

Conventional browsing of image collections uses mechanisms such as thumbnails arranged on a regular grid or along a line, often mounted on a scrollable panel. However, this approach does not scale well with the size of the dataset (number of images). In this paper, we propose a new thumbnail-based interface to browse large collections of […]

Aug, 16

An algorithm-architecture co-design framework for gridding reconstruction using FPGAs

Gridding is a method of interpolating irregularly sampled data onto a uniform grid and is a critical image reconstruction step in several applications that operate on non-Cartesian sampled data. In this paper, we present an algorithm-architecture co-design framework for accelerating gridding using FPGAs. We present a parameterized hardware library for accelerating gridding to support […]

Aug, 16

Accelerating the Nonuniform Fast Fourier Transform Using FPGAs

We present an FPGA accelerator for the Non-uniform Fast Fourier Transform (NuFFT), a technique for reconstructing images from arbitrarily sampled data. We accelerate the compute-intensive interpolation step of the NuFFT Gridding algorithm by implementing it on an FPGA. In order to ensure efficient memory performance, we present a novel FPGA implementation for Geometric Tiling […]

Aug, 16

SHARC: A streaming model for FPGA accelerators and its application to Saliency

Reconfigurable hardware such as FPGAs is increasingly being employed to accelerate compute-intensive applications. While recent advances in technology have increased the capacity of FPGAs, the lack of standard models for developing custom accelerators creates scalability and compatibility issues. We present SHARC (Streaming Hardware Accelerator with Run-time Configurability), a model for FPGA-based accelerators. This model is […]

Aug, 16

Automatic Point Target Detection for Interactive Visual Analysis of SAR Images

Point target analysis is an important tool to analyze the quality of SAR images. To permit interactive visual analysis, visualization applications need to automatically detect point targets in a SAR image and estimate associated quality measurements such as the peak sidelobe ratio (PSLR). This task is computationally expensive. In this paper, we propose methods for […]

Aug, 16

Automatic Multi-Camera Setup Optimization for Optical Tracking

We propose a method to determine the optimal camera alignment for a tracking system with multiple cameras by specifying the volume to be tracked and an initial camera setup. We use optimization strategies based on methods usually employed for solving nonlinear systems of equations. All approaches are fully automatic and take advantage of modern graphics […]

