high performance computing on graphics processing units: hgpu.org

Posts

Dec, 23

Toward GPU-accelerated Traffic Simulation and Its Real-Time Challenge

Traffic simulation is a growing domain of computational physics. Many everyday and industrial applications would benefit from traffic simulation to establish reliable transportation systems. A core challenge of this research, however, is its unbounded scale of computation. This paper explores the advantages of using the graphics processing unit (GPU) for this computational challenge. We […]

CUDA

Dec, 23

Coulomb, Landau and Maximally Abelian Gauge Fixing in Lattice QCD with Multi-GPUs

A lattice gauge theory framework for simulations on graphics processing units (GPUs) using NVIDIA’s CUDA is presented. The code comprises template classes that arrange data in an optimal pattern to ensure coalesced reads from device memory and thereby achieve maximum performance. In this work we concentrate on applications for lattice gauge fixing in 3+1 dimensional […]

CUDA

Dec, 23

Implementation of Motion Estimation Based on Heterogeneous Parallel Computing System with OpenCL

Heterogeneous computing systems that combine CPUs, GPUs and other accelerators increase the performance of parallel computing in many domains of general-purpose computing. Open Computing Language (OpenCL) is the first open, royalty-free standard for heterogeneous computing on multiple hardware platforms. In this paper, we propose a parallel Motion Estimation (ME) algorithm implemented using OpenCL and present […]

OpenCL

Dec, 21

Multicore and GPU Programming Models, Languages and Compilers Workshop, PLC 2013

Co-located with the 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2013). This workshop aims to bring the programming community together to explore and discuss various options to make programming heterogeneous systems less challenging and more interesting. The workshop seeks to explore programming methodologies in the form of directive-based approaches, language extensions, novel tools and […]

Dec, 20

KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors

Concurrency has recently come to the forefront of computing as multi-core processors become more and more common. General-purpose graphics processing unit computing brings with it new language support, such as OpenCL and CUDA, for dealing with co-processor environments. Programming language support for multi-core architectures introduces a fundamentally new mechanism for modularity – a kernel. […]

OpenCL

Dec, 20

A Parallelized Algorithm for Hyperspectral Biometrics

The parallelized algorithm for hyperspectral biometrics uses the processing power of a GPU (Graphics Processing Unit) to compare hyperspectral images of people’s faces. The feature extraction algorithm first retrieves uniquely identifiable features from raw hyperspectral data from 64 bands and creates both a database and individual target files. Using these files, the comparison algorithm written […]

CUDA

Dec, 20

Track finding in ATLAS using GPUs

The reconstruction and simulation of collision events is a major task in modern HEP experiments, involving tens of thousands of standard CPUs. On the other hand, graphics processors (GPUs) have become much more powerful and far outperform standard CPUs in floating-point throughput owing to their massively parallel design. […]

CUDA

Dec, 20

GPU Environmental Delegation of Agent Perceptions for MABS

Considering the digital simulation of complex systems, General-Purpose Computing on Graphics Processing Units (GPGPU) is a relevant approach for addressing scalability issues. However, GPU programming is a very specific approach that strongly limits both the accessibility and the re-usability of the frameworks developed using GPGPU. This paper presents our approach for the integration of GPU […]

CUDA

Dec, 20

GPUs: An Oasis in the Supercomputing Desert

A novel metric is introduced to compare the supercomputing resources available to academic researchers on a national basis. Data from the supercomputing Top 500 and the top 500 universities in the Academic Ranking of World Universities (ARWU) are combined to form the proposed "500/500" score for a given country. Australia scores poorly in the 500/500 […]

Dec, 20

An Automatic Host and Device Memory Allocation Method for OpenMPC

The CUDA programming model provides better abstraction for GPU programming. However, it is still hard to write programs with CUDA because both specialized techniques and knowledge of GPU architecture are required. Hence, many programming frameworks for CUDA have been developed. OpenMPC, which is based on OpenMP, is one of them. OpenMPC is an easy-to-write framework for […]

CUDA

Dec, 20

A Parallel Preconditioned Bi-Conjugate Gradient Stabilized Solver for the Poisson Problem

We present a parallel Preconditioned Bi-Conjugate Gradient Stabilized (BiCGStab) solver for the Poisson problem. Given a real, nonsymmetric and positive definite coefficient matrix, the parallelized Preconditioned BiCGStab solver is able to find a solution for that system by exploiting the massive compute power of today’s GPUs. Comparing sequential CPU implementations with this algorithm, we achieve a […]

CUDA

Dec, 20

IceCube’s GPGPU cluster for extensive MC production

GPGPU computing offers extraordinary increases in pure processing power for parallelizable applications. In IceCube we use GPUs for ray-tracing of Cherenkov photons in the Antarctic ice as part of detector simulation. We report on how we implemented the mixed simulation production chain to include the processing on the GPGPU cluster for the IceCube Monte Carlo production. […]

CUDA
