high performance computing on graphics processing units: hgpu.org

Posts

Jun, 15

A low-power integrated x86-64 and graphics processor for mobile computing devices

AMD’s first Fusion Accelerated Processor Unit (APU) codenamed "Zacate" combines a pair of x86 CPU cores codenamed "Bobcat", 1MB L2 Cache, Client Northbridge (CNB), with a DirectX 11 Radeon HD5000 graphics/multimedia controller on a single die. The CNB provides an interface to a single 64b DDR3 memory channel, which can operate at up to […]

Jun, 15

Accelerating Genome-Wide Association Studies Using CUDA Compatible Graphics Processing Units

Recent advances in highly parallel, multithreaded, manycore Graphics Processing Units (GPUs) have been enabling massive parallel implementations of many applications in bioinformatics. In this paper, we describe a parallel implementation of genome-wide association studies (GWAS) using Compute Unified Device Architecture (CUDA). Using a single NVIDIA GTX 280 graphics card, we achieve speedups of about 15 […]

CUDA

Jun, 15

A Reconfigurable Processor for Phylogenetic Inference

A reconfigurable processor tailored for accelerating Phylogenetic Inference is proposed. In this paper, a programmable and scalable architectural platform instantiates an array of coarse-grained, lightweight processing elements, allows arbitrary partitioning and scheduling schemes, and is capable of solving the complete Maximum Likelihood algorithm and dealing with arbitrarily large sequences. The key difference of the […]

Jun, 15

Full-resolution interactive CPU volume rendering with coherent BVH traversal

We present an efficient method for volume rendering by ray casting on the CPU. We employ coherent packet traversal of an implicit bounding volume hierarchy, heuristically pruned using preintegrated transfer functions, to exploit empty or homogeneous space. We also detail SIMD optimizations for volumetric integration, trilinear interpolation, and gradient lighting. The resulting system performs well […]

Jun, 15

Accelerating Viola-Jones Face Detection to FPGA-Level Using GPUs

Face detection is an important aspect for biometrics, video surveillance and human computer interaction. We present a multi-GPU implementation of the Viola-Jones face detection algorithm that meets the performance of the fastest known FPGA implementation. The GPU design offers far lower development costs, but the FPGA implementation consumes less power. We discuss the performance programming […]

CUDA

Jun, 15

Pedestrian detection system based on stereo vision for mobile robot

This paper presents a novel Graphics Processing Unit (GPU)-based system for pedestrian detection with stereo vision in real images on a mobile robot. The process of obtaining a dense disparity map on a GPU for real-time applications and the edge property of the scene to extract a region of interest (ROI) is designed. After extracting the […]

Jun, 15

The accelerating implementation of BLAST with stream processor

Sequence alignment is one of the most fundamental and important operations in bioinformatics. Through sequence alignment, we can find information about a sequence’s function, structure and evolution. BLAST is one of the most popular algorithms in the field of sequence alignment. In this paper, we have designed a GPU-based parallel BLAST algorithm and implemented it […]

Jun, 15

A compiler for high performance computing with many-core accelerators

We introduce a newly developed compiler for high performance computing using many-core accelerators. A high peak performance of such accelerators attracts researchers who are always demanding faster computers. However, it is difficult to create an efficient implementation of an existing serial program for such accelerators even in the case of massively parallel problems. While existing […]

Jun, 15

Robust modified L2 local optical flow estimation and feature tracking

This paper describes a robust method for the local optical flow estimation and the KLT feature tracking performed on the GPU. Therefore we present an estimator based on the L^2 norm with robust characteristics. In order to increase the robustness at discontinuities we propose a strategy to adapt the used region size. The GPU implementation […]

Jun, 15

Xbox 360 System Architecture

This article covers the Xbox 360’s high-level technical requirements, a short system overview, and details of the CPU and the GPU. The Xbox 360 contains an aggressive hardware architecture and implementation targeted at game console workloads. The core silicon implements the product designers’ goal of providing game developers a hardware platform to implement their next-generation […]

Jun, 15

Keynote address: Immersive exploration of large datasets

Scientists, engineers and physicians are now confronted with a fire hose of data. Immersive visualization environments provide these users with a novel way of interacting and reasoning with large datasets. They allow them to utilize the entirety of their visual bandwidth, effectively engulfing the user in the data and enabling collaborative interaction. We present a […]

Jun, 14

Real-time numerical dispersion compensation using graphics processing unit for Fourier-domain optical coherence tomography

Numerical dispersion compensation for both standard and full-range Fourier-domain optical coherence tomography (FD-OCT) on the graphics processing unit (GPU) architecture has been implemented. The data acquisition, processing and image display were performed on a multi-thread, CPU-GPU heterogeneous computing system. The real-time ultra-high-resolution full-range complex-conjugate-free FD-OCT imaging was demonstrated at 68.4 frame/s with a frame size […]
