high performance computing on graphics processing units: hgpu.org

Posts

Nov, 20

SBLOCK: A Framework for Efficient Stencil-Based PDE Solvers on Multi-core Platforms

We present a new software framework for the implementation of applications that use stencil computations on block-structured grids to solve partial differential equations. A key feature of the framework is the extensive use of automatic source code generation, which is used to achieve high performance on a range of leading multi-core processors. Results are presented […]

CUDA

Nov, 20

A GPGPU Transparent Virtualization Component for High Performance Computing Clouds

The GPU Virtualization Service (gVirtuS) presented in this work tries to fill the gap between in-house hosted computing clusters, equipped with GPGPU devices, and pay-for-use high performance virtual clusters deployed via public or private computing clouds. gVirtuS allows an instanced virtual machine to access GPGPUs in a transparent and hypervisor-independent way, with an overhead […]

Nov, 19

Active Structured Learning for High-Speed Object Detection

High-speed, smooth, and accurate visual tracking of objects in arbitrary, unstructured environments is essential for robotics and human motion analysis. However, building a system that can adapt to arbitrary objects and a wide range of lighting conditions is a challenging problem, especially if hard real-time constraints apply, as in robotics scenarios. In this work, we […]

CUDA

Nov, 19

PantaRay: fast ray-traced occlusion caching of massive scenes

We describe the architecture of a novel system for precomputing sparse directional occlusion caches. These caches are used for accelerating a fast cinematic lighting pipeline that works in the spherical harmonics domain. The system was used as a primary lighting technology in the movie Avatar, and is able to efficiently handle massive scenes of unprecedented […]

Nov, 19

Parallel option pricing with Fourier space time-stepping method on graphics processing units

With the evolution of graphics processing units (GPUs) into powerful and cost-efficient computing architectures, their range of application has expanded tremendously, especially in the area of computational finance. Current research in the area, however, is limited both in terms of the type of options priced and the complexity of stock price models. This paper presents […]

Nov, 19

Simulation of P systems with active membranes on CUDA

P systems, or Membrane Systems, provide a high-level computational modelling framework that combines the structure and dynamic aspects of biological systems in a relevant and understandable way. They are inherently parallel and non-deterministic computing devices. In this article, we discuss the motivation, design principles and key aspects of the implementation of a simulator for the class […]

CUDA

Nov, 19

Parallel hybrid metaheuristics for the flexible job shop problem

A parallel approach to the flexible job shop scheduling problem is presented in this paper. We propose two double-level parallel metaheuristic algorithms based on a new method of neighborhood determination. The algorithms proposed here include two major modules: the machine selection module, executed sequentially, and the operation scheduling module, executed in parallel. On each […]

Nov, 19

Real time ultrasound image denoising

Image denoising is the process of removing the noise that perturbs image analysis methods. In some applications like segmentation or registration, denoising is intended to smooth homogeneous areas while preserving the contours. In many applications like video analysis, visual servoing or image-guided surgical interventions, real-time denoising is required. This paper presents a method for real-time […]

CUDA

Nov, 19

View-dependent exploration of massive volumetric models on large-scale light field displays

We report on a light-field display based virtual environment enabling multiple naked-eye users to perceive detailed multi-gigavoxel volumetric models as floating in space, responsive to their actions, and delivering different information in different areas of the workspace. Our contributions include a set of specialized interactive illustrative techniques able to provide different contextual information in different […]

CUDA, OpenGL

Nov, 19

Accelerating numerical solution of stochastic differential equations with CUDA

Numerical integration of stochastic differential equations is commonly used in many branches of science. In this paper we present how to accelerate this kind of numerical calculation with popular NVIDIA Graphics Processing Units using the CUDA programming environment. We address general aspects of numerical programming on stream processors and illustrate them by two examples: the […]

CUDA

Nov, 19

Using Graphics Processors to Facilitate Explicit Digital Electrochemical Simulation: Theory of Elliptical Disc Electrodes

The use of graphics processors under the heading GPGPU (General-Purpose computation on GPUs (Graphics Processing Units)) promises a computational advance which may greatly facilitate the use of explicit digital simulation for non-trivial problems. This paper illustrates the use of GPGPU for the simulation of mass transport processes at elliptically shaped electrodes and for deformed microelectrodes. […]

Nov, 19

An optimised radial basis function algorithm for fast non-rigid registration of medical images

The registration of multi-modal medical image data is important in the fields of image guided surgery and computer aided medical diagnosis. Registration accuracy is of utmost importance in both fields; however, in the former, the speed of registration is equally important. In this paper, we present a point-based “fast” non-rigid registration algorithm which exhibits significant […]

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs


Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
