high performance computing on graphics processing units: hgpu.org

Posts

Mar, 15

Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPUs) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable approach to data representation for computing this kernel, particularly targeting sparse matrices representing power-law graphs. Using real data, we show […]

CUDA

Mar, 14

Fast Human Detection with Cascaded Ensembles

Detecting people in images is a challenging task because of the variability in clothing and illumination conditions, and the wide range of poses that people can adopt. To discriminate the human shape clearly, Dalal and Triggs [1] proposed a gradient based, robust feature set that yielded excellent detection results. This method computes locally normalized gradient […]

CUDA

Mar, 14

Fast Human Detection with Cascaded Ensembles on the GPU

We investigate a fast pedestrian localization framework that integrates the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features on a data parallel architecture. The salient features of humans are captured by HoG blocks of variable sizes and locations which are chosen by the AdaBoost algorithm from a large set of possible blocks. We […]

CUDA

Mar, 14

Efficient Integral Image Computation on the GPU

We present an integral image algorithm that can run in real time on a Graphics Processing Unit (GPU). Our system exploits the parallelism in the computation via the NVIDIA CUDA programming model, which is a software platform for solving non-graphics problems in a massively parallel high-performance fashion. This implementation makes use of the work-efficient scan algorithm that […]

CUDA

Mar, 14

High-dimensional Planning on the GPU

Optimal heuristic searches such as A* search are commonly used for low-dimensional planning such as 2D path finding. These algorithms, however, typically do not scale well to high-dimensional planning problems such as motion planning for robotic arms, computing motion trajectories for non-holonomic robotic vehicles and motion synthesis for humanoid characters. A recently developed randomized version […]

CUDA

Mar, 14

A fast GPU algorithm for graph connectivity

Graphics processing units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. General purpose programming on graphics processing units (GPGPU) is best suited for regular data parallel algorithms. They are not directly amenable to algorithms which have irregular data access patterns such as list ranking, and […]

CUDA

Mar, 14

Option Pricing on the GPU

In recent years, Graphics Processing Units (GPUs) have been opened to general purpose programming. As a result, researchers and developers have access to the massively parallel GPU architecture for applications beyond that of graphics rendering and gaming. We first investigate a design and implementation of the trinomial lattice strategy for the pricing of simple European […]

Mar, 14

GPU Accelerated Face Detection

Recently, many-core graphics processing units (GPUs) have been delivering impressive power for general purpose computing applications. Thanks to their high memory bandwidth and computing throughput, GPUs can often significantly accelerate many applications. In this paper, we present a CPU-GPU cooperative implementation for a Viola-Jones based face detection system. The experimental results show that our face detector […]

Mar, 14

Incoherent Ray tracing on GPU

Tracing secondary rays, such as reflection, refraction and shadow rays, can often be the most costly step in a modern real-time ray tracer. In this paper, we propose a new approach to ray tracing on GPU. Our approach is especially efficient for incoherent rays. Combined with the common packets ray tracing, we propose a different […]

CUDA

Mar, 14

Hardware Acceleration of EDA Algorithms: Custom ICs, FPGAs and GPUs

This book deals with the acceleration of EDA algorithms using hardware platforms such as FPGAs and GPUs. Widely applied CAD algorithms are evaluated and compared for potential acceleration on FPGAs and GPUs. Coverage includes discussion of conditions under which it is preferable to use one platform over another, e.g., when an EDA problem has a […]

CUDA

Mar, 14

Expanding the boundaries of GPU computing

Supporting up to 16 PCI Express devices in a flexible, highly efficient design, the Dell PowerEdge C410x expansion chassis helps organizations take advantage of the next step in high-performance computing architectures: GPU computing.

CUDA

Mar, 14

GPU Accelerated Cardiac Electrophysiology

Numerical simulations of cellular membranes are useful for both basic science and increasingly for clinical diagnostic and therapeutic applications. A common bottleneck in such simulations arises from solving large highly complex stiff systems of ordinary differential equations (ODEs) thousands of times for numerous collocation points (representing cells) throughout a three-dimensional volume. For some electrophysiology simulations, […]

CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
