high performance computing on graphics processing units: hgpu.org

Posts

Jan, 17

Accurate multi-view reconstruction using robust binocular stereo and surface meshing

This paper presents a new algorithm for multi-view reconstruction that demonstrates both accuracy and efficiency. Our method is based on robust binocular stereo matching, followed by adaptive point-based filtering of the merged point clouds, and efficient, high-quality mesh generation. All aspects of our method are designed to be highly scalable with the number of views. […]

Jan, 17

GPU-based Collision Detection for Deformable Parameterized Surfaces

Exploiting the potential of current programmable GPUs, several recent approaches use the GPU to compute surface deformations, such as the folding of cloth, or to convert higher-level geometry, such as NURBS or subdivision surfaces, into renderable primitives. These algorithms are realized as per-frame operations and take advantage of the parallel […]

Jan, 17

Visual Simulation of Flow

We have adopted a numerical method from computational fluid dynamics, the Lattice Boltzmann Method (LBM), for real-time simulation and visualization of flow and amorphous phenomena, such as clouds, smoke, fire, haze, dust, radioactive plumes, and air-borne biological or chemical agents. Unlike other approaches, LBM discretizes the micro-physics of local interactions and can handle very complex […]
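The abstract does not spell out the LBM update itself. The following is a minimal sketch that assumes the common D2Q9 lattice, the single-relaxation-time (BGK) collision operator, and periodic boundaries. The authors' lattice, boundary handling, and GPU mapping may differ, and the helper names are hypothetical. Each time step is a purely local collision in every cell, followed by streaming each population to the neighbouring cell along its lattice velocity. The grid size and the relaxation time `tau` are illustrative.

```python
# Minimal D2Q9 lattice Boltzmann (BGK) sketch on a periodic grid.
# Assumptions: D2Q9 lattice, single relaxation time tau, periodic boundaries.

# Discrete velocities c_i and weights w_i of the D2Q9 lattice
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def macroscopic(cell):
    """Density and velocity are the 0th and 1st moments of a cell's populations."""
    rho = sum(cell)
    ux = sum(f * cx for f, (cx, _) in zip(cell, C)) / rho
    uy = sum(f * cy for f, (_, cy) in zip(cell, C)) / rho
    return rho, ux, uy


def init_field(nx, ny, bump=0.1):
    """Fluid at rest with unit density plus a small density bump in the centre."""
    return [[equilibrium(1.0 + (bump if (x, y) == (nx // 2, ny // 2) else 0.0), 0.0, 0.0)
             for y in range(ny)] for x in range(nx)]


def step(f, nx, ny, tau):
    """One collide-then-stream update; f[x][y] holds the 9 populations of a cell."""
    # Collision: relax every cell toward its local equilibrium (no neighbour access)
    post = [[None] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            feq = equilibrium(*macroscopic(f[x][y]))
            post[x][y] = [fi - (fi - fe) / tau for fi, fe in zip(f[x][y], feq)]
    # Streaming: move population i to the neighbour along c_i, wrapping at the edges
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for i, (cx, cy) in enumerate(C):
                new[(x + cx) % nx][(y + cy) % ny][i] = post[x][y][i]
    return new


def total_mass(f):
    """Sum of all populations over the whole grid."""
    return sum(sum(cell) for column in f for cell in column)
```

For example, `f = init_field(16, 16)` followed by repeated calls to `f = step(f, 16, 16, tau=0.8)` lets the central density bump spread out. Because the equilibrium has the same density and momentum as the cell it replaces, collision conserves both exactly, and streaming only permutes values, so `total_mass(f)` stays constant up to rounding. `tau` must exceed 0.5, because the lattice viscosity is (tau - 1/2)/3 in lattice units.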

Jan, 17

Solving dense linear systems on platforms with multiple hardware accelerators

In a previous PPoPP paper we showed how the FLAME methodology, combined with the SuperMatrix runtime system, yields a simple yet powerful solution for programming dense linear algebra operations on multicore platforms. In this paper we provide further evidence that this approach solves the programmability problem for this domain by targeting a more complex architecture, […]

CUDA

Jan, 17

CheCUDA: A Checkpoint/Restart Tool for CUDA Applications

In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. Because existing checkpoint/restart implementations do not support checkpointing the GPU state, CheCUDA hooks a subset of the basic CUDA driver API calls in order to record state changes in main memory. At checkpointing, CheCUDA stores the […]

CUDA

Jan, 17

Evaluating GPUs for network packet signature matching

Modern network devices employ deep packet inspection to enable sophisticated services such as intrusion detection, traffic shaping, and load balancing. At the heart of such services is a signature matching engine that must match packet payloads to multiple signatures at line rates. However, the recent transition to complex regular-expression based signatures coupled with ever-increasing network […]

CUDA
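Regular-expression signatures are commonly compiled into deterministic finite automata (DFAs), so that matching becomes a table walk over the payload. The sketch below is simplified: it assumes literal byte-string signatures rather than full regular expressions. It uses the Aho-Corasick construction, a swapped-in technique and not the paper's engine, to compile all signatures into a single DFA transition table. Scanning then costs one table lookup per payload byte, however many signatures are loaded. The function names are hypothetical.

```python
from collections import deque


def build_dfa(signatures, alphabet_size=256):
    """Compile literal byte-string signatures into a complete DFA table
    (Aho-Corasick trie whose missing edges are filled in via failure links)."""
    goto = [[-1] * alphabet_size]   # goto[state][byte] -> next state
    out = [set()]                   # out[state] -> ids of signatures ending here
    # 1. Build a trie of all signatures
    for sid, sig in enumerate(signatures):
        state = 0
        for b in sig:
            if goto[state][b] == -1:
                goto.append([-1] * alphabet_size)
                out.append(set())
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].add(sid)
    # 2. Breadth-first pass: compute failure links and complete every row
    fail = [0] * len(goto)
    queue = deque()
    for b in range(alphabet_size):
        nxt = goto[0][b]
        if nxt == -1:
            goto[0][b] = 0          # unknown first byte: stay at the root
        else:
            queue.append(nxt)       # depth-1 states fail back to the root
    while queue:
        s = queue.popleft()
        out[s] |= out[fail[s]]      # inherit matches of the longest proper suffix
        for b in range(alphabet_size):
            nxt = goto[s][b]
            if nxt == -1:
                goto[s][b] = goto[fail[s]][b]
            else:
                fail[nxt] = goto[fail[s]][b]
                queue.append(nxt)
    return goto, out


def scan(goto, out, payload):
    """Return (end_offset, signature_id) for every match in the payload."""
    state = 0
    hits = []
    for i, b in enumerate(payload):
        state = goto[state][b]      # exactly one table lookup per byte
        for sid in sorted(out[state]):
            hits.append((i, sid))
    return hits
```

For example, with signatures `b"evil"`, `b"vil"` and `b"attack"`, scanning `b"an evil attack"` reports `(6, 0)`, `(6, 1)` and `(13, 2)`. Overlapping matches that end at the same byte are reported in the same step.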

Jan, 17

Acceleration of Acoustic Emission Signal Processing Algorithms using CUDA Standard

Offline processing of acoustic emission (AE) signal waveforms recorded during a long-term AE monitoring session is a challenging problem in the area of AE testing, because today’s AE systems can work with up to hundreds of channels and are able to process tens of thousands of AE events per second. The […]

CUDA

Jan, 17

Real-Time Non-rigid Registration of Medical Images on a Cooperative Parallel Architecture

The unacceptably long execution time of non-rigid registration (NRR) often presents a major obstacle to its routine clinical use. Parallel computing is an effective way to accelerate NRR; however, developing efficient parallel NRR code is very challenging. One desirable approach is to map the existing sequential algorithm to the parallel architecture to gain speedup […]

CUDA

Jan, 17

Parallelization Strategies for Ant Colony Optimisation on GPUs

Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic for solving a wide variety of problems. As a population-based algorithm, its computation is intrinsically massively parallel, and it is therefore theoretically well-suited for implementation on Graphics Processing Units (GPUs). The ACO algorithm comprises two main stages: tour construction and pheromone update. The […]

CUDA
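The following sketch illustrates the two ACO stages named above. It assumes the classic Ant System on a symmetric travelling-salesman instance; the paper's problem instances and GPU parallelization strategies are not reproduced. In the construction stage, each ant builds a tour by roulette-wheel selection, weighted by pheromone `tau` raised to `alpha` and by inverse distance raised to `beta`. In the update stage, every pheromone entry evaporates by a factor `rho` and is then reinforced along each tour in proportion to 1/length. The function names and parameter values are illustrative.

```python
import random


def construct_tour(dist, tau, alpha, beta, rng):
    """Tour construction: one ant visits every city once, choosing the next
    city with probability proportional to tau^alpha * (1/dist)^beta."""
    n = len(dist)
    start = rng.randrange(n)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        i = tour[-1]
        cand = sorted(unvisited)
        weights = [(tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta) for j in cand]
        j = rng.choices(cand, weights=weights)[0]   # roulette-wheel selection
        tour.append(j)
        unvisited.remove(j)
    return tour


def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def update_pheromone(tau, tours, lengths, rho, q=1.0):
    """Pheromone update: evaporate everywhere, then deposit q/L on each tour edge."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for tour, length in zip(tours, lengths):
        for k in range(len(tour)):
            a, b = tour[k], tour[(k + 1) % len(tour)]
            tau[a][b] += q / length      # symmetric instance: deposit both ways
            tau[b][a] += q / length


def ant_system(dist, n_ants=10, iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Run the Ant System and return the best tour found and its length."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]
    best, best_len = None, float("inf")
    for _ in range(iters):
        # Stage 1: tours are independent of each other (one per ant)
        tours = [construct_tour(dist, tau, alpha, beta, rng) for _ in range(n_ants)]
        lengths = [tour_length(dist, t) for t in tours]
        for t, length in zip(tours, lengths):
            if length < best_len:
                best, best_len = t, length
        # Stage 2: all ants' tours update the shared pheromone matrix
        update_pheromone(tau, tours, lengths, rho)
    return best, best_len
```

Tour construction is independent for each ant, which is where the population-level parallelism lies. In the pheromone update, by contrast, many ants write to shared matrix entries, so a parallel version would need something like atomic additions or a reduction.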

Jan, 16

Interactive visual analysis of contrast-enhanced ultrasound data based on local neighborhood statistics

Contrast-enhanced ultrasound (CEUS) has recently become an important technology for lesion detection and characterization in cancer diagnosis. CEUS is used to investigate the perfusion kinetics in tissue over time, which relates to tissue vascularization. In this paper we present a pipeline that enables interactive visual exploration and semi-automatic segmentation and classification of CEUS data. For […]

OpenCL

Jan, 16

An OpenCL framework for heterogeneous multicores with local memory

In this paper, we present the design and implementation of an Open Computing Language (OpenCL) framework that targets heterogeneous accelerator multicore architectures with local memory. The architecture consists of a general-purpose processor core and multiple accelerator cores that typically do not have any cache. Each accelerator core, instead, has a small internal local memory. Our […]

OpenCL

Jan, 16

Using generalized ensemble simulations and Markov state models to identify conformational states

Part of understanding a molecule’s conformational dynamics is mapping out the dominant metastable, or long-lived, states that it occupies. Once these states are identified, the rates of transition between them can be determined to build a complete model of the system’s conformational dynamics. Here we describe the use of the MSMBuilder package (now […]

CUDA
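The following is a minimal sketch of the central Markov state model estimation step. It assumes the simulation data have already been clustered into a discrete state trajectory. Transitions are counted at a chosen lag time, and each row of the count matrix is normalized to give a transition-probability matrix. The stationary distribution of that matrix gives the equilibrium populations of the states. This is the simplest (non-reversible) estimator. MSMBuilder's own API and estimators are not shown, and the function names are hypothetical.

```python
def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j between frames t and t + lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into transition probabilities T[i][j]."""
    T = []
    for row in counts:
        total = sum(row)
        T.append([c / total if total else 0.0 for c in row])
    return T


def stationary(T, iters=1000):
    """Equilibrium state populations via power iteration p <- p T."""
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p
```

For example, the ten-frame trajectory `[0, 0, 0, 1, 1, 0, 0, 1, 1, 1]` at lag 1 gives counts `[[3, 2], [1, 3]]` and the transition matrix `[[0.6, 0.4], [0.25, 0.75]]`. The resulting stationary populations are 5/13 and 8/13.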
