high performance computing on graphics processing units: hgpu.org

Posts

May, 11

A GPU-based Parallel Fireworks Algorithm for Optimization

Swarm intelligence algorithms have been widely used to solve difficult real world problems in both academic and engineering domains. Thanks to the inherent parallelism, various parallelized swarm intelligence algorithms have been proposed to speed up the optimization process, especially on the massively parallel processing architecture GPUs. However, conventional swarm intelligence algorithms are usually not designed […]

CUDA

May, 11

An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units

Computing highly-accurate approximate solutions to partial differential equations (PDEs) requires both a robust numerical method and a powerful machine. We present a parallel implementation of the discontinuous Galerkin (DG) method on graphics processing units (GPUs). In addition to being flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces entirely […]

CUDA

May, 11

Auto-tuning a LOFAR radio astronomy pipeline in JavaCL

Modern radio telescopes, such as the Low Frequency Array (LOFAR) in the north of the Netherlands, process the signal from the sky in software rather than expensive special-purpose hardware. This gives astronomers unprecedented flexibility to perform a wide variety of scientific experiments. However, designing the actual software that would give optimal […]

OpenCL

May, 9

Three-dimensional LBM simulations of buoyancy-driven flow using Graphics processing units

Three-dimensional simulations of buoyancy-driven flow of two immiscible liquids are performed using the lattice Boltzmann method (LBM) implemented on a graphics processing unit (GPU). The graphics processing unit is a new paradigm for computing fluid flows and has become more popular in recent years. It is powerful and convenient to use. LBM, which is an […]

CUDA

May, 9

GPU Sparse Matrix Multiplication with CUDA

Matrix multiplication is a commonly-used mathematical operation that has many practical applications. It is used to solve a number of problems in a wide variety of fields including science, engineering, and computer science. Given two matrices, A and B, and a resultant matrix C. The concept of density is used to describe the number of […]

CUDA

May, 9

OpenCL Implementation of Motion Estimation for Cloud Video Processing

With the rise of cloud computing infrastructures on one side and the increased accessibility of parallel computational devices on the other, such as GPUs and multi-core CPUs, parallel programming has recently gained renewed interest. This is particularly true in the domain of video coding, where the complexity and time consumption of the algorithms tend […]

OpenCL

May, 9

libWater: Heterogeneous Distributed Computing Made Easy

Clusters of heterogeneous nodes composed of multi-core CPUs and GPUs are increasingly being used for High Performance Computing (HPC) due to the benefits in peak performance and energy efficiency. In order to fully harvest the computational capabilities of such architectures, application developers often employ a combination of different parallel programming paradigms (e.g. OpenCL, CUDA, MPI […]

CUDA • OpenCL

May, 9

Parallel Implementations of a Disparity Estimation Algorithm Based on a Proximal Splitting Method

The Parallel Proximal Algorithm (PPXA+) has been recently introduced as an efficient tool for solving convex optimization problems. It has proved particularly effective in the context of stereo vision, used as the methodological core of a novel disparity estimation technique. In this work, the main methodological issues limiting the efficient parallelization of this technique are […]

OpenCL

May, 8

Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method

In this paper, we mainly report on our experience and strategy in programming graphics processing units (GPUs) as fast parallel floating point coprocessors to accelerate the simulation of travelling shock waves of the 2-D Euler equation by the finite volume method. The GPU code is specialized in CUDA (Compute Unified Device Architecture) for which we […]

CUDA

May, 8

Non-symmetric magnetohydrostatic equilibria: a multigrid approach

AIMS. Neukirch and Rastatter (1999) re-formulate the linear magnetohydrostatic (MHS) model of Low (1991) into a form that only requires the solution of two scalar elliptic partial differential equations. In this paper, we investigate an efficient numerical procedure for calculating MHS equilibria based on this representation. METHODS. The MHS equations are reduced to two scalar […]

CUDA

May, 8

Construction and Implementation of a Simple Agent-Based System on GPU-Architectures

Agent-based modelling and simulation is still an upcoming approach for microsimulation. But a large number of agents with advanced dynamics and interactions requires sophisticated algorithms and a lot of computational effort. We try to implement a rather simple but special agent-based model on GPU architectures (graphics processing units). This contribution presents the GPU implementations and […]

CUDA • OpenGL

May, 8

Parallel Chen-Han (PCH) Algorithm for Discrete Geodesics

In many graphics applications, the computation of exact geodesic distance is very important. However, the high computational cost of the existing geodesic algorithms means that they are not practical for large-scale models or time-critical applications. To tackle this challenge, we propose the parallel Chen-Han (or PCH) algorithm, which extends the classic Chen-Han (CH) discrete geodesic […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
