high performance computing on graphics processing units: hgpu.org

Posts

May, 9

GPU Sparse Matrix Multiplication with CUDA

Matrix multiplication is a commonly-used mathematical operation that has many practical applications. It is used to solve a number of problems in a wide variety of fields including science, engineering, and computer science. Given two matrices, A and B, and a resultant matrix C. The concept of density is used to describe the number of […]

CUDA

May, 9

OpenCL Implementation of Motion Estimation for Cloud Video Processing

With the rise of cloud computing infrastructures on one side and the increased accessibility of parallel computational devices on the other, such as GPUs and multi-core CPUs, parallel programming has recently gained a renewed interest. This is particularly true in the domain of video coding, where the complexity and time consumption of the algorithms tend […]

OpenCL

May, 9

libWater: Heterogeneous Distributed Computing Made Easy

Clusters of heterogeneous nodes composed of multi-core CPUs and GPUs are increasingly being used for High Performance Computing (HPC) due to the benefits in peak performance and energy efficiency. In order to fully harvest the computational capabilities of such architectures, application developers often employ a combination of different parallel programming paradigms (e.g. OpenCL, CUDA, MPI […]

CUDA • OpenCL

May, 9

Parallel Implementations of a Disparity Estimation Algorithm Based on a Proximal Splitting Method

The Parallel Proximal Algorithm (PPXA+) has been recently introduced as an efficient tool for solving convex optimization problems. It has proved particularly effective in the context of stereo vision, used as the methodological core of a novel disparity estimation technique. In this work, the main methodological issues limiting the efficient parallelization of this technique are […]

OpenCL

May, 8

Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method

In this paper, we mainly report on our experience and strategy in programming graphics processing units (GPUs) as fast parallel floating point coprocessors to accelerate the simulation of travelling shock waves of the 2-D Euler equation by the finite volume method. The GPU code is specialized in CUDA (Compute Unified Device Architecture) for which we […]

CUDA

May, 8

Non-symmetric magnetohydrostatic equilibria: a multigrid approach

AIMS. Neukirch and Rastatter (1999) re-formulate the linear magnetohydrostatic (MHS) model of Low (1991) into a form that only requires the solution of two scalar elliptic partial differential equations. In this paper, we investigate an efficient numerical procedure for calculating MHS equilibria based on this representation. METHODS. The MHS equations are reduced to two scalar […]

CUDA

May, 8

Construction and Implementation of a Simple Agent-Based System on GPU-Architectures

Agent-based modelling and simulation is still an upcoming approach for microsimulation. But a large number of agents with advanced dynamics and interactions requires sophisticated algorithms and lots of computational effort. We try to implement a rather simple but special agent-based model on GPU-architectures (graphics processing unit). This contribution presents the GPU implementations and […]

CUDA • OpenGL

May, 8

Parallel Chen-Han (PCH) Algorithm for Discrete Geodesics

In many graphics applications, the computation of exact geodesic distance is very important. However, the high computational cost of the existing geodesic algorithms means that they are not practical for large-scale models or time-critical applications. To tackle this challenge, we propose the parallel Chen-Han (or PCH) algorithm, which extends the classic Chen-Han (CH) discrete geodesic […]

CUDA

May, 8

Somoclu: An Efficient Distributed Library for Self-Organizing Maps

Somoclu is a C++ tool for training self-organizing maps on large data sets using a high-performance cluster. It builds on MPI for distributing the workload across the nodes of the cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful […]

CUDA

May, 7

Performance impact of dynamic parallelism on different clustering algorithms

In this paper, we aim to quantify the performance gains of dynamic parallelism. The newest version of CUDA, CUDA 5, introduces dynamic parallelism, which allows GPU threads to create new threads without CPU intervention and to adapt to their data. This effectively eliminates the superfluous back and forth communication between the GPU and CPU through nested […]

CUDA

May, 7

Color and motion-based particle filter target tracking in a network of overlapping cameras with multi-threading and GPGPU

This paper describes an efficient implementation of multiple-target multiple-view tracking in video-surveillance sequences. It takes advantage of the capabilities of multiple core Central Processing Units (CPUs) and of graphical processing units under the Compute Unified Device Architecture (CUDA) framework. The principle of our algorithm is 1) in each video sequence, to perform tracking on all […]

CUDA

May, 7

Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples

We demonstrate the acceleration obtained from using GPU/CPU hybrid clusters and supercomputers for N-body simulations of gravity based in part on the author’s new code development. Validation tests are shown for cosmological simulations and for galaxy simulations, along with their respective speedups compared to traditional simulations. Potential new applications for science enabled by this advance […]

CUDA

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
