high performance computing on graphics processing units: hgpu.org

Posts

Sep, 13

FuzzyGPU: a fuzzy arithmetic library for GPU

Data are traditionally represented using native formats such as integers or floating-point numbers of various flavors. However, some applications rely on more complex representation formats. This is the case when uncertainty needs to be accounted for. Fuzzy arithmetic is one of the major tools to address this problem, but the execution time of basic operations such […]

CUDA

Sep, 13

Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling

The number of active threads required to achieve peak application throughput on graphics processing units (GPUs) depends largely on the ratio of time spent on computation to the time spent accessing data from memory. While compute-intensive applications can achieve peak throughput with a low number of threads, memory-intensive applications might not achieve good throughput even […]

CUDA

Sep, 13

An Interface for Halo Exchange Pattern

Halo exchange patterns are very common in scientific computing, since the solution of PDEs often requires communication between neighbor points. Although this is a common pattern, implementations are often made by programmers from scratch, with an accompanying feeling of "reinventing the wheel". In this paper we describe GCL, a C++ generic library that implements a […]

CUDA

Sep, 13

Exploring Multiple Dimensions of Parallelism in Junction Tree Message Passing

Belief propagation over junction trees is known to be computationally challenging in the general case. One way of addressing this computational challenge is to use node-level parallel computing, and parallelize the computation associated with each separator potential table cell. However, this approach is not efficient for junction trees that mainly contain small separators. In this […]

CUDA

Sep, 13

Recent progress and challenges in exploiting graphics processors in computational fluid dynamics

The progress made in accelerating simulations of fluid flow using GPUs, and the challenges that remain, are surveyed. The review first provides an introduction to GPU computing and programming, and discusses various considerations for improved performance. Case studies comparing the performance of CPU- and GPU-based solvers for the Laplace and incompressible Navier-Stokes equations are […]

CUDA

Sep, 13

Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

The chemical kinetics ODEs arising from operator-split reactive-flow simulations were solved on GPUs using explicit integration algorithms. Nonstiff chemical kinetics of a hydrogen oxidation mechanism (9 species and 38 irreversible reactions) were computed using the explicit fifth-order Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster than single- and six-core CPU versions by factors of 126 […]

CUDA

Sep, 13

A massively parallel program to solve the phase field formulation for crack propagation

Phase field models for fracture employ a continuous field variable to model cracks. Therefore, in contrast to discrete descriptions of fracture, numerical tracking of discontinuities in the displacement field is not required. This greatly reduces implementation complexity. In this paper, we discuss the use of a single graphical processing unit (GPU) to accelerate the solution […]

CUDA

Sep, 13

Simulation and modeling of physical broadcasts

The environment around us exhibits many phenomena whose behavior depends on different parameters: biological, chemical, physical, etc. To represent a simple, abstract view of this environment, we use a concept called environmental modeling. Environmental modeling deals with many environmental problems such as air pollution, diffusion of disease, animal behavior and so […]

CUDA

Sep, 13

Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP […]

OpenCL

Sep, 13

Fast computation of computer-generated hologram using Xeon Phi coprocessor

We report fast computation of computer-generated holograms (CGHs) using Xeon Phi coprocessors, recently released by Intel, which integrate a large number of x86-based processors on one chip. CGHs can generate arbitrary light wavefronts and are therefore a promising technology for many applications: for example, three-dimensional displays, diffractive optical elements, and the generation of arbitrary beams. CGHs incur enormous computational […]

CUDA

Sep, 11

Histogram Computations on GPUs Kernel using Global and Shared Memory Atomics

In this paper we implement histogram computations on a Graphics Processing Unit (GPU). Our histogram computation is implemented using the Compute Unified Device Architecture (CUDA), which is a minimal extension to C/C++. In this work, histograms are computed in the GPU’s global memory as well as in shared memory. We also perform histogram computations on the CPU and […]

CUDA

Sep, 11

Evaluating the Viability of Application-Driven Cooperative CPU/GPU Fault Detection

Trends in high performance computing are bringing increased heterogeneity among the computational resources within a single machine. The heterogeneous CPU/GPU platforms, however, exacerbate resilience problems faced by current large-scale systems. How to design efficient resilience strategies is critical for the wider adoption of heterogeneous platforms for future exascale systems. The conventional resilience strategy for GPU […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Most viewed papers (last 30 days)