high performance computing on graphics processing units: hgpu.org

Posts

May, 7

Cross-Platform OpenCL Code and Performance Portability for CPU and GPU Architectures Investigated with a Climate and Weather Physics Model

Current multi- and many-core computing typically involves multi-core Central Processing Units (CPUs) and many-core Graphics Processing Units (GPUs), whose architectures are distinctly different. To preserve the longevity of application codes, it is highly desirable to have a programming paradigm that supports these current and future architectures. Open Computing Language (OpenCL) was created to address this problem. […]

OpenCL

May, 7

Parallelization of calculations using GPU in optimization approach for macromodels construction

Constructing mathematical models of nonlinear dynamical systems by optimization requires significant computational effort to solve the optimization task. Most of the CPU time is spent by the optimization procedure on goal-function calculations, which are repeated many times for different model parameters. This makes it possible to parallelize the calculations on processors with a SIMD architecture. The effectiveness of […]

CUDA

May, 7

Implementation of digital down converter in GPU

The Giant Metrewave Radio Telescope (GMRT) is undergoing an upgrade. GMRT is mainly used for pulsar, continuum, and spectral line observations. Spectral line observations require higher resolution, which can be achieved in narrowband mode. Thus, to use the GMRT correlator resources efficiently and to speed up further signal processing, a Digital Down Converter is of great use. […]

CUDA

May, 7

Implementation and Optimization of Image Processing Algorithms on Embedded GPU

In this paper, we analyze the key factors underlying the implementation, evaluation, and optimization of image processing and computer vision algorithms on embedded GPUs using the OpenGL ES 2.0 shader model. First, we present the characteristics of the embedded GPU and its inherent advantages compared to the embedded CPU. Additionally, we propose techniques to achieve increased […]

OpenGL

May, 6

Comparison of OpenMP and OpenCL Parallel Processing Technologies

This paper presents a comparison of OpenMP and OpenCL based on parallel implementations of algorithms from various fields of computer applications. Our study focuses on benchmark performance comparing OpenMP and OpenCL. We observed that the OpenCL programming model is a good option for mapping threads onto different processing cores. Balancing […]

OpenCL

May, 6

Transparent Accelerator Migration in a Virtualized GPU Environment

This paper presents a framework to support transparent, live migration of virtual GPU accelerators in a virtualized execution environment. Migration is a critical capability in such environments because it provides support for fault tolerance, on-demand system maintenance, resource management, and load balancing in the mapping of virtual to physical GPUs. Techniques to increase responsiveness and […]

OpenCL

May, 6

Effects of Compiler Optimizations in OpenMP to CUDA Translation

One thrust of the OpenMP standard development focuses on support for accelerators. An important question is whether or not OpenMP extensions are needed, and how much performance difference they would make. The same question is relevant for related efforts in support of accelerators, such as OpenACC. The present paper pursues this question. We analyze the […]

CUDA

May, 6

Design of a Hybrid Memory System for General-Purpose Graphics Processing Units

Addressing a limited power budget is a prerequisite for maintaining the growth of computer system performance into and beyond the exascale. Two technologies with the potential to help solve this problem include general-purpose programming on graphics processors and fast non-volatile memories. Combining these technologies could yield devices capable of extreme-scale computation at lower power. The […]

CUDA

May, 6

One Stone Two Birds: Synchronization Relaxation and Redundancy Removal in GPU-CPU Translation

As an approach to promoting whole-system synergy on a heterogeneous computing system, compilation of fine-grained SPMD-threaded code (e.g., GPU CUDA code) for multicore CPUs has drawn some recent attention. This paper concentrates on two important sources of inefficiency that limit existing translators. The first is overly strong synchronization; the second is thread-level partially redundant computation. […]

CUDA

May, 4

Efficient Intranode Communication in GPU-Accelerated Systems

Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication where it can result in several extra copy operations. In this work, we integrate GPU-awareness into a popular MPI runtime system and develop techniques […]

CUDA

May, 4

Heterogeneous Task Scheduling for Accelerated OpenMP

Heterogeneous systems with CPUs and computational accelerators such as GPUs, FPGAs or the upcoming Intel MIC are becoming mainstream. In these systems, peak performance includes the performance of not just the CPUs but also all available accelerators. In spite of this fact, the majority of programming models for heterogeneous computing focus on only one of […]

CUDA

May, 4

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

Graphics processing units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging […]

OpenCL

