high performance computing on graphics processing units: hgpu.org

Posts

Mar, 14

GPU Accelerated Face Detection

Recently, many-core graphics processing units (GPUs) have been delivering impressive power for general-purpose computing applications. Thanks to their high memory bandwidth and computing throughput, GPUs can often significantly accelerate many applications. In this paper, we present a CPU-GPU cooperative implementation of a Viola-Jones based face detection system. The experimental results show that our face detector […]

Mar, 14

Option Pricing on the GPU

In recent years, Graphics Processing Units (GPUs) have been opened to general-purpose programming. As a result, researchers and developers have access to the massively parallel GPU architecture for applications beyond graphics rendering and gaming. We first investigate a design and implementation of the trinomial lattice strategy for the pricing of simple European […]

Mar, 14

Incoherent Ray tracing on GPU

Tracing secondary rays, such as reflection, refraction and shadow rays, can often be the most costly step in a modern real-time ray tracer. In this paper, we propose a new approach to ray tracing on the GPU. Our approach is especially efficient for incoherent rays. Combined with common packet ray tracing, we propose a different […]

CUDA

Mar, 14

Hardware Acceleration of EDA Algorithms: Custom ICs, FPGAs and GPUs

This book deals with the acceleration of EDA algorithms using hardware platforms such as FPGAs and GPUs. Widely applied CAD algorithms are evaluated and compared for potential acceleration on FPGAs and GPUs. Coverage includes discussion of conditions under which it is preferable to use one platform over another, e.g., when an EDA problem has a […]

CUDA

Mar, 14

Expanding the boundaries of GPU computing

Supporting up to 16 PCI Express devices in a flexible, highly efficient design, the Dell PowerEdge C410x expansion chassis helps organizations take advantage of the next step in high-performance computing architectures: GPU computing.

CUDA

Mar, 14

GPU Accelerated Cardiac Electrophysiology

Numerical simulations of cellular membranes are useful both for basic science and, increasingly, for clinical diagnostic and therapeutic applications. A common bottleneck in such simulations arises from solving large, highly complex, stiff systems of ordinary differential equations (ODEs) thousands of times for numerous collocation points (representing cells) throughout a three-dimensional volume. For some electrophysiology simulations, […]

CUDA

Mar, 13

Comparing GPU and CPU in OLAP Cubes Creation

GPGPU (General-Purpose Graphics Processing Unit) programming has recently been receiving more attention because of the enormous computational speedup this technology offers. GPGPU is applied in many branches of science and industry, including databases, even though this is not the primary field of expected benefits. In this paper, a typical time-consuming database algorithm, […]

CUDA

Mar, 13

Obsidian: GPU Programming in Haskell

Obsidian is a language for data-parallel programming embedded in Haskell. As the Obsidian programs are run, C code is generated. This C code can be compiled for an NVIDIA 8800 series GPU (Graphics Processing Unit), or for other high-end NVIDIA GPUs. The idea is that the style of programming used in Lava for structural hardware […]

CUDA

Mar, 13

Obsidian: GPU Kernel Programming in Haskell (thesis)

Graphics Processing Units (GPUs) are evolving into powerful general purpose computing platforms. At first, GPU performance was driven by the requirements of 3D graphics computer games. To fit this workload, a GPU is a many-core processor suitable for the data-parallel programming paradigm. Today, GPUs come with hundreds of processing elements and a theoretical single precision […]

CUDA

Mar, 13

High Quality Elliptical Texture Filtering on GPU

The quality of the available hardware texture filtering, even on state-of-the-art graphics hardware, suffers from several aliasing artifacts in both the spatial and temporal domains. Those artifacts are most evident in extreme conditions, such as grazing viewing angles, highly warped texture coordinates, or extreme perspective, and become especially annoying when animation is involved. […]

OpenGL

Mar, 13

GPU-based Multilevel Clustering

The processing power of parallel co-processors like the Graphics Processing Unit (GPU) is dramatically increasing. However, up until now only a few approaches have been presented to utilize this kind of hardware for mesh clustering purposes. In this paper we introduce a multilevel clustering technique designed as a parallel algorithm and solely implemented on the […]

Mar, 13

Real-Time Image Segmentation on a GPU

Efficient segmentation of color images is important for many applications in computer vision. Non-parametric solutions are required in situations where little or no prior knowledge about the data is available. In this paper, we present a novel parallel image segmentation algorithm which segments images in real-time in a non-parametric way. The algorithm finds the equilibrium […]

CUDA


