high performance computing on graphics processing units: hgpu.org

Posts

Feb, 23

Adaptive Hardware-accelerated Terrain Tessellation

In this master's thesis report, a scheme for adaptive hardware tessellation is presented. The scheme uses an offline processing approach in which a height map is analyzed in terms of curvature and the result is stored in a resource called a density map. This density map is then bound as a resource to the hardware tessellation stage […]
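The offline step described in the tessellation entry above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's method: curvature is approximated by the absolute value of the 5-point discrete Laplacian of the height map, and the hypothetical function `density_map` maps it linearly to a tessellation factor between the assumed bounds `min_factor` and `max_factor`.

```python
def density_map(height, min_factor=1.0, max_factor=64.0):
    """Map each cell of a height map to a tessellation factor.

    Assumption: curvature is approximated by |5-point discrete Laplacian|.
    Border cells are treated as zero curvature for simplicity.
    """
    rows, cols = len(height), len(height[0])
    curv = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            lap = (height[y - 1][x] + height[y + 1][x]
                   + height[y][x - 1] + height[y][x + 1]
                   - 4.0 * height[y][x])
            curv[y][x] = abs(lap)
    peak = max(max(row) for row in curv)
    if peak == 0.0:
        # Flat or planar terrain: no extra detail needed anywhere.
        return [[min_factor] * cols for _ in range(rows)]
    # Normalize curvature to [0, 1], then scale into the factor range.
    return [[min_factor + (max_factor - min_factor) * c / peak for c in row]
            for row in curv]
```

A planar ramp has zero Laplacian and therefore gets the minimum factor everywhere, while a single spike receives `max_factor` at its peak; the resulting grid is what would be uploaded as the density map texture.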

Feb, 23

Parallel Computer Vision: Person Data Extraction

Face recognition has become established in many environments. It is used in security systems, on social media platforms, and in digital cameras to assist the user. In addition, the rapidly rising number of CPU cores in modern PCs and handhelds lets us do more complex work on a single machine. The central question of […]

CUDA

Feb, 22

Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs

In this paper, we present an approach to estimate GPU applications’ performance upper bound based on algorithm analysis and assembly code level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization […]

CUDA
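As a rough illustration of what a performance upper bound means for the SGEMM entry above, the sketch below uses a deliberately simplified model (an assumption, much coarser than the assembly-level analysis the paper describes): the theoretical single-precision peak is 2 flops (one fused multiply-add) per core per cycle, scaled by the fraction of issued instructions that are FFMA. The function names and the example core count and clock are hypothetical, not figures for GF110 or GK104.

```python
def peak_sp_gflops(cores: int, clock_ghz: float) -> float:
    """Theoretical single-precision peak: one FMA (2 flops) per core per cycle."""
    return 2.0 * cores * clock_ghz


def sgemm_upper_bound(cores: int, clock_ghz: float, ffma_fraction: float) -> float:
    """Bound SGEMM throughput by the share of issued instructions that are FFMA.

    Simplified model (assumption): every non-FFMA instruction (loads,
    address arithmetic, ...) takes an issue slot that performs no useful flops.
    """
    if not 0.0 <= ffma_fraction <= 1.0:
        raise ValueError("ffma_fraction must be in [0, 1]")
    return peak_sp_gflops(cores, clock_ghz) * ffma_fraction
```

For instance, 512 cores at 1.5 GHz give a 1536 GFLOPS peak; if only 75% of issued instructions are FFMA, the bound drops to 1152 GFLOPS.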

Feb, 22

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems

Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), which has led to the widespread use of heterogeneous computing systems that utilize both the CPU and the GPU together. In heterogeneous computing systems, the efficiency […]

CUDA
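The scheduling entry above relies on estimated execution times for each device. A common baseline that uses such estimates is greedy earliest-finish-time assignment, sketched below; it is a generic technique, not the scheme proposed in the paper, and the task names and timings in the example are hypothetical.

```python
def schedule(tasks, devices=("cpu", "gpu")):
    """Greedy earliest-finish-time assignment of tasks to devices.

    tasks: list of (name, {device: estimated_seconds}), processed in order.
    Returns (assignment dict, makespan in seconds).
    """
    ready_at = {d: 0.0 for d in devices}  # time at which each device is free
    assignment = {}
    for name, est in tasks:
        # Pick the device on which this task would finish earliest.
        best = min(devices, key=lambda d: ready_at[d] + est[d])
        ready_at[best] += est[best]
        assignment[name] = best
    return assignment, max(ready_at.values())
```

With tasks a (CPU 4 s, GPU 1 s), b (CPU 2 s, GPU 1.5 s), and c (CPU 3 s, GPU 1 s) scheduled in that order, a and c go to the GPU, b goes to the CPU, and the makespan is 2 s.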

Feb, 22

A Parallel Active-Set Method for Solving Frictional Contact Problems

Simulating frictional contact is a challenging computational task, and a variety of techniques exist for it. One such technique, the staggered projections algorithm, requires the solution of two convex quadratic program (QP) subproblems at each iteration. We introduce a method, SCHURPA, which employs a primal-dual active-set strategy to efficiently solve these QPs based […]

CUDA

Feb, 22

Investigating performance variations of an optimized GPU-ported granulometry algorithm

In this article, we present an optimized GPU implementation of a granulometry algorithm that is widely used in materials science. The main contribution to this algorithm is the binarization of the input data, which increases throughput while reducing the memory allocated for the data. In addition, the optimized GPU implementation brings an order of […]

CUDA
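One plausible reading of "binarization" in the granulometry entry above is bit-packing: storing each binary pixel as a single bit, so that one bitwise operation processes many pixels at once and memory use shrinks accordingly. The sketch below is an assumption-laden 1-D illustration, not the paper's GPU implementation: it packs a row into a Python integer, applies morphological openings (erosion followed by dilation) with flat segments of growing half-width `r`, and records how many foreground pixels survive each opening, which gives the basic granulometry curve. All function names are hypothetical.

```python
def pack(bits):
    """Pack a list of 0/1 pixels into an int; pixel i maps to bit i."""
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word


def erode(word, n, r):
    """Erosion with a flat segment of half-width r on an n-pixel row.

    Pixels outside the row count as background (0).
    """
    mask = (1 << n) - 1
    out = word
    for k in range(1, r + 1):
        out &= (word << k) & mask  # neighbour at i - k must be set
        out &= word >> k           # neighbour at i + k must be set
    return out


def dilate(word, n, r):
    """Dilation with a flat segment of half-width r, clipped to n pixels."""
    mask = (1 << n) - 1
    out = word
    for k in range(1, r + 1):
        out |= (word << k) & mask
        out |= word >> k
    return out


def granulometry(bits, max_r):
    """Foreground pixel count after opening with half-widths 0..max_r."""
    n = len(bits)
    word = pack(bits)
    return [bin(dilate(erode(word, n, r), n, r)).count("1")
            for r in range(max_r + 1)]
```

For the row 1,1,1,0,1,0,1,1,1,1,1, `granulometry(row, 3)` returns `[9, 8, 5, 0]`: the isolated pixel disappears at r = 1, the run of three at r = 2, and the run of five at r = 3.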

Feb, 22

GPU-based Motion Planning under Uncertainties using POMDP

We present a novel GPU-based parallel algorithm to solve continuous-state POMDP problems. We choose the MCVI (Monte Carlo Value Iteration) method as our base algorithm [1] and parallelize it using a multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to effectively utilize the massive data parallelism of GPUs. To obtain […]

CUDA

Feb, 21

Scheduling a Parallel Sparse Direct Solver to Multiple GPUs

We present a sparse direct solver using multilevel task scheduling on a modern heterogeneous compute node consisting of a multi-core host processor and multiple GPU accelerators. Our direct solver is based on the multifrontal method, which is characterized by exploiting dense subproblems (fronts) related in an assembly tree. Critical to high performance of the solver […]

CUDA
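In the multifrontal method, a front can only be factored after all of its children in the assembly tree, so fronts with no dependency between them can run concurrently. The sketch below groups fronts into such waves by tree level; it is a simple illustration of the dependency structure, assuming the tree is given as a hypothetical `parent` array, and is not the multilevel scheduler described in the paper.

```python
def schedule_levels(parent):
    """Group fronts of an assembly tree into waves of independent tasks.

    parent[i] is the parent front of i, or -1 for a root.
    level(i) = 1 + max(level(child)), with leaves at level 0, so every
    front's children appear in earlier waves.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p >= 0:
            children[p].append(i)
    level = [0] * n
    # Iterative post-order traversal so deep trees avoid the recursion limit.
    stack = [(i, False) for i, p in enumerate(parent) if p < 0]
    while stack:
        node, done = stack.pop()
        if done:
            level[node] = 1 + max((level[c] for c in children[node]), default=-1)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in children[node])
    waves = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        waves[lv].append(i)
    return waves
```

For `parent = [4, 4, 3, 5, 5, -1]`, `schedule_levels(parent)` returns `[[0, 1, 2], [3, 4], [5]]`. Level grouping is conservative: a real scheduler can start a parent as soon as its own children finish rather than waiting for the whole wave.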

Feb, 21

Chrono: a parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics

The last decade witnessed a manifest shift in the microprocessor industry towards chip designs that promote parallel computing. Until recently the privilege of a select group of large research centers, Teraflop computing is becoming a commodity owing to inexpensive GPU cards and multi- to many-core x86 processors. This paradigm shift towards large scale parallel computing […]

CUDA

Feb, 21

Computation of Air-Vortices Based on GPU Technology: Optimizing and Parallelizing a Model for Wake-Vortex Prediction Using OpenCL

This thesis details the refinement and numerical solution of a preexisting model for predicting the strengths and positions of so-called wake-vortices that are generated from the lift of heavy aircraft. The ultimate objective is to implement a numerical scheme for the model that is fast enough to allow for probabilistic methods, such as Monte Carlo simulations, […]

OpenCL

Feb, 21

GamePipe: A Virtualized Cloud Platform Design and Performance Evaluation

Cloud gaming provides game-on-demand (GoD) services over the Internet cloud. The goal is to achieve faster response time and higher QoS. The video game is rendered remotely on the game cloud, and the resulting video is decoded on thin-client devices such as tablet computers or smartphones. We design a game cloud with a virtualized cluster of CPU/GPU servers […]

CUDA

Feb, 21

Ray Tracing on GPUs

The ray tracing method aims to produce realistic, high-quality images of a scene described by geometric primitives such as triangles, spheres, etc. The basic idea is quite simple and allows for straightforward implementations of this technique on the computer. At its core is a set of rays, each of which corresponds to one […]

CUDA
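The core operation of the ray tracing entry above, intersecting a ray with a geometric primitive, can be sketched for a sphere by solving a quadratic in the ray parameter t. The camera setup in the hypothetical `render` function (a pinhole at the origin looking down the negative z axis, one primary ray per pixel, an image plane at z = -1) is an illustrative assumption, and the output is only a hit/miss mask rather than a shaded image.

```python
import math


def intersect_sphere(origin, direction, center, radius):
    """Return the distance t to the nearest hit in front of the origin, or None.

    Solves |o + t*d - c|^2 = r^2, which is a quadratic a*t^2 + b*t + c = 0.
    """
    oc = [o - ce for o, ce in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    sq = math.sqrt(disc)
    # Try the nearer root first; ignore hits behind the ray origin.
    for t in ((-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None


def render(width, height, center=(0.0, 0.0, -3.0), radius=1.0):
    """Cast one primary ray per pixel and return rows of '#' (hit) / '.' (miss)."""
    rows = []
    for j in range(height):
        row = ""
        for i in range(width):
            # Map the pixel centre to [-1, 1] on an image plane at z = -1.
            x = (i + 0.5) / width * 2.0 - 1.0
            y = 1.0 - (j + 0.5) / height * 2.0
            hit = intersect_sphere((0.0, 0.0, 0.0), (x, y, -1.0), center, radius)
            row += "#" if hit is not None else "."
        rows.append(row)
    return rows
```

`render(3, 3)` yields `['...', '.#.', '...']`: only the centre pixel's ray hits the sphere. Because each pixel's ray is independent, the per-pixel loop body is the natural unit of work to map onto one GPU thread.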
