high performance computing on graphics processing units: hgpu.org

Posts

Nov, 27

A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms

New multi-core CPU and GPU architectures promise high computational power at a low cost if suitable computational algorithms can be developed. However, parallel programming for such architectures is usually non-portable, low-level and error-prone. To make the computational power of new multi-core architectures more easily available to Modelica modelers, we have developed the ParModelica algorithmic language […]

OpenCL

Nov, 27

Hardware-Accelerated Raycasting: Towards an Effective Brain MRI Visualization

The rapid development in information technology has immensely contributed to the use of modern approaches for visualizing volumetric data. Consequently, medical volume visualization is increasingly attracting attention towards achieving an effective visualization algorithm for medical diagnosis and pre-treatment planning. Previously, research has addressed the implementation of algorithms that can turn 2-D images into 3-D visualizations. Meanwhile, […]

CUDA

Nov, 26

Energy efficiency vs. performance of the numerical solution of PDEs: an application study on a low-power ARM-based cluster

Power consumption and energy efficiency are becoming critical aspects in the design and operation of large scale HPC facilities, and it is unanimously recognised that future exascale supercomputers will be strongly constrained by their power requirements. At current electricity costs, operating an HPC system over its lifetime can already be on par with the initial […]

Nov, 26

Stochastic Analysis of a Queue Length Model Using a Graphics Processing Unit

Mathematical modeling is an inevitable part of system analysis and design in science and engineering. When a parametric mathematical description is used, the issue of the parameter estimation accuracy arises. Models with uncertain parameter values can be evaluated using various methods and computer simulation is among the most popular in the engineering community. Nevertheless, an […]

CUDA

Nov, 26

A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems

This paper presents a compiler toolkit that addresses two important emerging challenges: (1) effectively compiling dynamic array-based languages such as MATLAB, Python and R; and (2) effectively utilizing a wide range of rapidly evolving hybrid CPU/GPU architectures. The toolkit provides: a high-level IR specifically designed to express a wide range of array-based computations and indexing […]

OpenCL

Nov, 26

High Performance Radiation Transport Simulations: Preparing for TITAN

In this paper we describe the Denovo code system. Denovo solves the six-dimensional, steady-state, linear Boltzmann transport equation, of central importance to nuclear technology applications such as reactor core analysis (neutronics), radiation shielding, nuclear forensics and radiation detection. The code features multiple spatial differencing schemes, state-of-the-art linear solvers, the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm for […]

CUDA

Nov, 26

A Customized 3D GPU Poisson Solver for Free BCs

A 3-dimensional GPU Poisson solver is developed for all possible combinations of free and periodic boundary conditions along the three directions. It is benchmarked for various grid sizes and different BCs and a significant performance gain is observed for problems including one or more free BCs. The GPU Poisson solver is also benchmarked against two […]

CUDA

Nov, 25

PyFAI, a versatile library for azimuthal regrouping

2D area detectors like CCD or pixel detectors have become popular in the last 15 years for diffraction experiments (e.g. for WAXS, SAXS, single crystal and powder diffraction (XRPD)). These detectors have a large sensitive area of millions of pixels with high spatial resolution. The software package pyFAI has been designed to reduce SAXS, WAXS […]

OpenCL

Nov, 25

Electromagnetic transient simulation of large-scale electrical power networks using graphics processing units

In this paper, electromagnetic transient (EMT) simulation of large scale power systems using graphics processing unit (GPU) based computing is demonstrated. As the size of power system networks increases, the simulation time using conventional central processing unit (CPU) based simulation increases drastically. This paper proposes a hybrid CPU-GPU environment for fast large scale power systems […]

CUDA

Nov, 25

Acceleration of Hardware Testing and Validation Algorithms using Graphics Processing Units

With the advances of very large scale integration (VLSI) technology, the feature size has been shrinking steadily together with the increase in the design complexity of logic circuits. As a result, the efforts taken for designing, testing, and debugging digital systems have increased tremendously. Although the electronic design automation (EDA) algorithms have been studied extensively […]

CUDA

Nov, 25

Characterization and Performance Analysis for 3D Benchmarks

Changes in processor architectures and 3D benchmarks make performance characterization important for every processor and 3D application generation. Recent 3D applications require large amounts of data to be processed by the GPU and the CPU. This makes it important to analyze processor performance for different architectures and benchmarks so that benchmarks and processors […]

Nov, 25

Location-based Matching in Publish/Subscribe Revisited

Event processing is attracting growing interest in industry and academia. The common application pattern is that event processing agents publish events while other agents subscribe to events of interest. Extensive research has been devoted to developing efficient and scalable algorithms to match events with subscribers’ interests. The predominant abstraction used in this context is […]

CUDA

