high performance computing on graphics processing units: hgpu.org

Posts

Jan, 20

Simultaneous and fast 3D tracking of multiple faces in video by GPU-based stream processing

In this work, we implement a real-time visual tracker that targets the position and 3D pose of objects in video sequences, specifically faces. Using stream processors for performing the computations as well as efficient sparse-template-based particle filtering allows us to achieve real-time processing even when tracking multiple objects simultaneously in high-resolution video frames. Stream […]

CUDA

Jan, 20

Contouring for Power Systems Using Graphical Processing Units

To improve situational awareness in power systems, one useful tool used in control centers is bus (or substation) data contouring. Traditionally, the methods developed have used CPU processing, leading to long contour rendering times that reduce interactivity with the visualization. To improve interactivity and increase the data rate which can be handled, contouring methods utilizing […]

OpenGL

Jan, 20

SPRAT: Runtime processor selection for energy-aware computing

A commodity personal computer (PC) can be seen as a hybrid computing system equipped with two different kinds of processors, i.e. a CPU and a graphics processing unit (GPU). Since the superiority of GPUs in performance and power efficiency strongly depends on the system configuration and the data size determined at runtime, a […]

CUDA

Jan, 20

Evolution of image filters on graphics processor units using Cartesian Genetic Programming

Graphics processor units are fast, inexpensive parallel computing devices. Recently there has been great interest in harnessing this power for various types of scientific computation, including genetic programming. In previous work, we have shown that using the graphics processor provides dramatic speed improvements over a standard CPU in the context of fitness evaluation. In this […]

Jan, 20

Real-Time GPU-Based Voxel Carving with Systematic Occlusion Handling

We present an approach to compute the visual hulls of multiple people in real-time in the presence of occlusions. We prove that the resulting visual hulls are correct and minimal under occlusions. Our proposed algorithm runs completely on the GPU with framerates up to 50fps for multiple people using only one computer equipped with off-the-shelf […]

CUDA

Jan, 20

Fast development of dense linear algebra codes on graphics processors

We present an application programming interface (API) for the C programming language that facilitates the development of dense linear algebra algorithms on graphics processors applying the FLAME methodology. The interface, built on top of the NVIDIA CUBLAS library, implements all the computational functionality of the FLAME/C interface. In addition, the API includes data transference routines […]

CUDA

Jan, 20

Direct N-body Kernels for Multicore Platforms

We present an inter-architectural comparison of single- and double-precision direct n-body implementations on modern multicore platforms, including those based on the Intel Nehalem and AMD Barcelona systems, the Sony-Toshiba-IBM PowerXCell/8i processor, and NVIDIA Tesla C870 and C1060 GPU systems. We compare our implementations across platforms on a variety of proxy measures, including performance, coding complexity, and […]

CUDA

Jan, 20

Motion Estimation with Non-Local Total Variation Regularization

State-of-the-art motion estimation algorithms suffer from three major problems: poorly textured regions, occlusions, and small-scale image structures. Based on the Gestalt principles of grouping, we propose to incorporate a low-level image segmentation process in order to tackle these problems. Our new motion estimation algorithm is based on non-local total variation regularization which allows […]

CUDA

Jan, 20

Graph Analysis with High-Performance Computing

Large, complex graphs arise in many settings including the Internet, social networks, and communication networks. To study such data sets, the authors explored the use of high-performance computing (HPC) for graph algorithms. They found that the challenges in these applications are quite different from those arising in traditional HPC applications and that massively multithreaded machines […]

Jan, 20

TEDI: efficient shortest path query answering on graphs

Efficient shortest path query answering in large graphs is enjoying a growing number of applications, such as ranked keyword search in databases, social networks, ontology reasoning and bioinformatics. A shortest path query on a graph finds the shortest path for the given source and target vertices in the graph. Current techniques for efficient evaluation of […]

Jan, 20

Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering

Twenty-five years ago, Crow published the shadow volume approach for determining shadowed regions in a scene. A decade ago, Heidmann described a hardware-accelerated stencil buffer-based shadow volume algorithm. Unfortunately hardware-accelerated stenciled shadow volume techniques have not been widely adopted by 3D games and applications due in large part to the lack of robustness of described […]

OpenGL

Jan, 19

Accelerating Quadrature Methods for Option Valuation

This paper presents an architecture for FPGA acceleration of quadrature methods used for pricing complex options, such as discrete barrier, Bermudan, and American options. The architecture can be optimized for speed and power consumption by exploiting pipelining and parallelism to produce efficient implementations in reconfigurable logic. An optimised implementation using Graphics Processing Units (GPUs) is […]

CUDA
