high performance computing on graphics processing units: hgpu.org

Posts

Oct, 11

Mapping dynamic programming algorithms on graphics processing units

Alignment is the fundamental operation used to compare biological sequences. It also serves to identify regions of similarity that may be consequences of structural, functional, or evolutionary relationships. Today, the processing of sequences from large DNA or protein databases is a big challenge. Graphics Processing Units (GPUs) are based on a highly parallel, many-core streaming […]

CUDA

Oct, 11

Interactive Simulations with Navier-Stokes Equations on many-core Architectures

The Navier-Stokes Equations are a mathematical model that describes the behaviour of fluids. They have proven to represent real fluid flows quite well and are the basis for many fluid simulations. In order to exploit the performance provided by modern many-core systems, fluid simulation algorithms must be able to efficiently solve the Navier-Stokes Equations in parallel. The […]

OpenCL • OpenGL

Oct, 11

Monte Carlo Path Tracing with OpenCL

We introduce an interactive Monte Carlo path tracer that uses the OpenCL framework. A path tracer draws a photo-realistic image of a 3D scene by simulating physical effects of light. Interactivity enables the user to move around the scene in real time, while OpenCL makes it possible to run the same code on either CPU […]

OpenCL

Oct, 11

Performance Improvement of Multichannel Audio by Graphics Processing Units

Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is known as Immersive Audio Schemes. In this phenomenon, several acoustic effects are involved: 3D spatial […]

CUDA • OpenCL

Oct, 11

Leo: A Profile-Driven Dynamic Optimization Framework for GPU Applications

Parallel architectures like GPUs are a tantalizing compute fabric for performance-hungry developers. While GPUs enable order-of-magnitude performance increases in many data-parallel application domains, writing efficient codes that can actually manifest those increases is a non-trivial endeavor, typically requiring developers to exercise specialized architectural features exposed directly in the programming model. Achieving good performance on GPUs […]

CUDA

Oct, 10

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass market embedded platforms is challenging. Meanwhile, trends in low-cost, low-power processing are towards massive parallelism and heterogeneity, making it […]

CUDA • OpenCL

Oct, 10

Code Refinement of Stencil Codes

A straightforward implementation of an algorithm in a general-purpose programming language usually does not deliver peak performance: compilers often fail to automatically tune the code for certain hardware peculiarities like memory hierarchy or vector execution units. Manually tuning the code is, first, error-prone and time-consuming and, second, taints the code by exposing those […]

CUDA

Oct, 10

Parallel implementation of linear repetitive processes identification using subspace algorithms

This paper presents a new parallel approach to the identification of linear repetitive processes based on subspace algorithms. Parallel realizations of these algorithms are tested on various graphics cards that use NVIDIA CUDA technology. The paper describes the implementation of subspace identification algorithms and their parallel speedup, efficiency, throughput, and delay. The parallel approach to the identification […]

CUDA

Oct, 10

Accelerating Protein Coordinate Conversion using GPUs

For modeling proteins in conformational states, two methods of representation are used: internal coordinates and Cartesian coordinates. Each of these representations contains a large amount of structural and simulation information. Different processing steps require one or the other representation. Our goal is to rapidly translate between these coordinate spaces so that a scientist can choose […]

CUDA

Oct, 10

FDTD on Distributed Heterogeneous Multi-GPU Systems

Finite-Difference Time-Domain (FDTD) is a popular technique for modeling computational electrodynamics, and is used within many research areas, such as the development of antennas, ultrasound imaging, and seismic wave propagation. Simulating large domains can, however, be very compute- and memory-demanding, which has motivated the use of cluster computing, and lately also the use of […]

CUDA

Oct, 8

cuDNN: Efficient Primitives for Deep Learning

We present a library that provides optimized implementations for deep learning primitives. Deep learning workloads are computationally intensive, and optimizing the kernels of deep learning workloads is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized for new processors, which makes maintaining codebases difficult over time. Similar issues have long been addressed in […]

CUDA

Oct, 8

Movement Tracking in Terrain Conditions Accelerated with CUDA

The paper presents a solution to the problem of movement tracking in images acquired from video cameras monitoring outside terrain. The solution is resistant to such adverse factors as fluttering leaves, waving grass, smoke or fog, and the movement of clouds. The presented solution is based on well-known image processing methods; nevertheless, the key was […]

CUDA


Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
