high performance computing on graphics processing units: hgpu.org

Posts

May, 25

GEMTC: GPU Enabled Many-Task Computing

Current software and hardware limitations prevent Many-Task Computing (MTC) workloads from leveraging hardware accelerators (NVIDIA GPUs, Intel Xeon Phi) boasting Many-Core Computing architectures. Some broad application classes that fit the MTC paradigm are workflows, MapReduce, high-throughput computing, and a subset of high-performance computing. MTC emphasizes using many computing resources over short periods of time to […]

CUDA

May, 25

JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems

JPEG is a commonly used method of lossy compression for digital photographs (images). This work targets accelerating JPEG’s compressor and decompressor on a GPU. Though the final results are not promising, I would like to share the lessons I have learned in accelerating a general system with GPGPU.

CUDA

May, 23

Graphics Processing Unit (GPU) Implementation Methodology of AERMOD Model

Air pollution is one of the major problems the world is facing today. It is caused by the release of dangerous chemical substances such as carbon monoxide, CFCs (chlorofluorocarbons), carbon dioxide, hydrocarbons, sulfur dioxide, etc. into the atmosphere. These substances are produced by various anthropogenic activities such as usage of vehicles, factory […]

CUDA

May, 23

Composing multiple StarPU applications over heterogeneous machines: a supervised approach

Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a single runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes or memory bus contention. This […]

CUDA

May, 23

Sequential Consistency for Heterogeneous-Race-Free: Programmer-centric Memory Models for Heterogeneous Platforms

Hardware vendors now provide heterogeneous platforms in commodity markets (e.g., integrated CPUs and GPUs), and are promising an integrated, shared memory address space for such platforms in future iterations. Because not all threads in a heterogeneous platform can communicate with the same latency, vendors are proposing synchronization mechanisms that allow threads to communicate with a […]

CUDA • OpenCL

May, 23

Surface Reconstruction from Scattered Point via RBF Interpolation on GPU

In this paper we describe a parallel implicit method based on radial basis functions (RBF) for surface reconstruction. The applicability of RBF methods is hindered by their computational demand, which requires the solution of linear systems of size equal to the number of data points. Our reconstruction implementation relies on parallel scientific libraries and is […]

CUDA

May, 23

GPU Enhancement of the Trigger to Extend Physics Reach at the LHC

Significant new challenges are continuously confronting the High Energy Physics (HEP) experiments, in particular the two detectors at the Large Hadron Collider (LHC) at CERN, where nominal conditions deliver proton-proton collisions to the detectors at a rate of 40 MHz. This rate must be significantly reduced to comply with both the performance limitations of the […]

CUDA

May, 21

Evaluating the Performance of Legacy Applications on Emerging Parallel Architectures

The gap between a supercomputer’s theoretical maximum ("peak") floating-point performance and that actually achieved by applications has grown wider over time. Today, a typical scientific application achieves only 5-20% of any given machine’s peak processing capability, and this gap leaves room for significant improvements in execution times. This problem is most pronounced for modern "accelerator" […]

CUDA • OpenCL

May, 21

Implementing Continuous Integration Software in an Established Computational Chemistry Software Package

Continuous integration is the software engineering principle of rapid and automated development and testing. We identify several key points of continuous integration and demonstrate how they relate to the needs of computational science projects by discussing the implementation and relevance of these principles to AMBER, a large and widely used molecular dynamics software package. The […]

CUDA

May, 21

An Investigation of the Performance Portability of OpenCL

This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The […]

CUDA • OpenCL

May, 21

Super Earths and Dynamical Stability of Planetary Systems: First Parallel GPU Simulations Using GENGA

We report on the stability of hypothetical Super-Earths in the habitable zone of known multi-planetary systems. Most of them have not yet been studied in detail concerning the existence of additional low-mass planets. The new N-body code GENGA developed at the UZH allows us to perform numerous N-body simulations in parallel on GPUs. With this […]

CUDA

May, 21

3DES ECB Optimized for Massively Parallel CUDA GPU Architecture

Modern computers have graphics cards with much higher theoretical efficiency than conventional CPUs. The paper presents possibilities for applying GPU CUDA acceleration to the encryption of data, using a new architecture tailored to the 3DES algorithm, which is characterized by increased security compared to standard DES. The algorithm used in ECB mode (Electronic Codebook), in which 64-bit data […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
