high performance computing on graphics processing units: hgpu.org

Posts

Nov, 20

GPUs for fast pattern matching in the RICH of the NA62 experiment

In rare-decay experiments, an effective online selection is a fundamental part of the data acquisition system (DAQ), needed to reduce both the quantity of data written to tape and the bandwidth requirements of the DAQ system. A multilevel architecture is commonly used to achieve a higher reduction factor, exploiting dedicated custom hardware and […]

Nov, 20

Real-time adaptive fluid simulation with complex boundaries

In this paper, we present a new adaptive model for real-time fluid simulation with complex boundaries, based on the Smoothed Particle Hydrodynamics (SPH) framework. First, we introduce an adaptive SPH framework based on our character field function, which is composed of four factors: geometrical complexity, boundary condition, physical complexity, and complementary condition in terms of the […]

CUDA

Nov, 20

Patient-Specific Non-Linear Finite Element Modelling for Predicting Soft Organ Deformation in Real-Time; Application to Non-Rigid Neuroimage Registration

Long computation times of non-linear (i.e. accounting for geometric and material non-linearity) biomechanical models have been regarded as one of the key factors preventing application of such models in predicting organ deformation for image-guided surgery. This contribution presents real-time patient-specific computation of the deformation field within the brain for six cases of brain shift induced […]

Nov, 20

Visualization of LIDAR datasets using point-based rendering technique

Remote sensing technologies, such as LIDAR, evolve rapidly and produce large datasets. The computers used to visualize these data have limited resources, which prevents detailed, real-time visualization. An approach to real-time visualization of virtually unlimited LIDAR datasets, at full detail with a hierarchical and out-of-core approach to data management and a modern point-based rendering […]

Nov, 20

Skeleton and Shape Adjustment and Tracking in Multicamera Environments

In this paper we present a method for automatic body model adjustment and motion tracking in multicamera environments. We introduce a set of shape deformation parameters based on linear blend skinning, that allow a deformation related to the scaling of the distinct bones of the body model skeleton, and a deformation in the radial direction […]

Nov, 20

Regular Lattice and Small-World Spin Model Simulations Using CUDA and GPUs

Data-parallel accelerator devices such as Graphical Processing Units (GPUs) are providing dramatic performance improvements over even multi-core CPUs for lattice-oriented applications in computational physics. Models such as the Ising and Potts models continue to play a role in investigating phase transitions on small-world and scale-free graph structures. These models are particularly well-suited to the performance […]

CUDA

Nov, 20

An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing

In order to guarantee both performance and programmability demands in 3D graphics applications, vector and multithreaded SIMD architectures have been employed in recent graphics processing units. This paper introduces a novel instruction-systolic array architecture, which transfers an instruction stream in a pipelined fashion to efficiently share the expensive functional resources of a graphics processor. Specifically, […]

Nov, 20

A middleware for efficient stream processing in CUDA

This paper presents a middleware capable of out-of-order execution of kernels and data transfers for efficient stream processing in the compute unified device architecture (CUDA). Our middleware runs on the CUDA-compatible graphics processing unit (GPU). Using the middleware, application developers can easily overlap kernel computation with data transfer between the main memory and […]

CUDA

Nov, 20

Simulating a P system based efficient solution to SAT by using GPUs

P systems are inherently parallel and non-deterministic theoretical computing devices defined within the field of Membrane Computing. Many P system simulators have been presented in this area, but they are inefficient since they cannot handle the parallelism of these devices. Nowadays, we are witnessing the consolidation of GPUs as a parallel framework to […]

Nov, 20

Simulation of one-layer shallow water systems on multicore and CUDA architectures

The numerical solution of shallow water systems is useful for several applications related to geophysical flows, but the large size of the domains calls for powerful accelerators to obtain numerical results in reasonable times. This paper addresses how to speed up the numerical solution of a first-order well-balanced finite volume scheme for […]

CUDA

Nov, 20

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort

Sort is a fundamental kernel used in many database operations. In-memory sorts are now feasible; sort performance is limited by compute flops and main memory bandwidth rather than I/O. In this paper, we present a competitive analysis of comparison and non-comparison based sorting algorithms on two modern architectures – the latest CPU and GPU architectures. […]

Nov, 20

SBLOCK: A Framework for Efficient Stencil-Based PDE Solvers on Multi-core Platforms

We present a new software framework for the implementation of applications that use stencil computations on block-structured grids to solve partial differential equations. A key feature of the framework is the extensive use of automatic source code generation which is used to achieve high performance on a range of leading multi-core processors. Results are presented […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Most viewed papers (last 30 days)