high performance computing on graphics processing units: hgpu.org

Posts

Aug, 31

Bitcoin and The Age of Bespoke Silicon

Recently, the Bitcoin cryptocurrency has been an international sensation. This paper tells the story of Bitcoin hardware: how a group of early-adopters self-organized and financed the creation of an entire new industry, leading to the development of machines, including ASICs, that had orders of magnitude better performance than what Dell, Intel, NVidia, AMD or Xilinx […]

OpenCL

Aug, 31

Particle Swarm Optimization of Model Parameters: Simulation of Deep Reactive Ion Etching by the Continuous Cellular Automaton

As a widespread form of Deep Reactive Ion Etching (DRIE), the Bosch process alternates etching and passivation cycles, typically leading to characteristic scalloping patterns on the sidewalls. Measurements of the etch depth per cycle l_d and undercut length per cycle l_u show a strong dependence of the undercut ratio l_u / l_d on the trench […]

CUDA

Aug, 31

Computing High Resolution Explicit Corridor Maps using Parallel Technologies

This work investigates the approximated construction of Explicit Corridor Maps (ECMs). An ECM is a type of Navigation Mesh: a geometrical structure describing the walkable space of an environment that is used to speed up the path-finding and crowd simulation operations occurring in the environment. Additional geometrical routines that take advantage of the GPGPU model are […]

CUDA • OpenCL

Aug, 31

Accelerating Text Mining Workloads in a MapReduce-based Distributed GPU Environment

Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data intensive, and the ease of deployment of algorithms is an important factor in developing […]

CUDA

Aug, 30

A Feedback Approach to Task Partitioning in Heterogeneous Architectures

Today's personal computers are based on complex architectures, often with multiple high-performance computational units for various dedicated purposes. The General Purpose GPU is one such example, where Graphics Processing Units are used for more general-purpose computing. In this paper, we target such architectures and focus on Load Balancing and Task Partitioning […]

OpenCL

Aug, 30

Real-Time GPU Path Tracing

In this paper, we present a simple, yet efficient implementation of the path tracing algorithm for GPUs. A reformulation of Russian Roulette is used to achieve high SIMT utilization, which leads to real-time performance in Kajiya’s classic scene, using a single GPU. We apply our scheme to larger scenes in the Brigade system, an experimental […]

CUDA

Aug, 30

Evolutionary Algorithm for Optimizing Parameters of GPGPU-based Image Segmentation

The use of digital microscopy allows diagnosis through automated quantitative and qualitative analysis of the digital images. Often, the first step in evaluating the samples is determining the number and location of cell nuclei. For this purpose, we have developed a GPGPU-based data-parallel region growing algorithm that is as accurate as the already […]

CUDA

Aug, 30

Performance Portability Strategies for Computational Fluid Dynamics (CFD) Applications on HPC Systems

Achieving high computational performance on large-scale high performance computing (HPC) systems demands optimizations to exploit hardware characteristics. Various optimizations and research strategies are implemented to improve performance with emphasis on single or multiple hardware characteristics. Among these approaches, the domain-specific approach involving domain expertise shows high potential in achieving high performance and maintaining performance […]

CUDA

Aug, 30

Swendsen-Wang Multi-Cluster Algorithm for the 2D/3D Ising Model on Xeon Phi and GPU

Simulations of the critical Ising model by means of local update algorithms suffer from critical slowing down. One way to partially compensate for the influence of this phenomenon on the runtime of simulations is using increasingly faster and parallel computer hardware. Another approach is using algorithms that do not suffer from critical slowing down, such […]

CUDA

Aug, 30

8th International Symposium on Intelligent Distributed Computing, IDC’2014

The emergent field of Intelligent Distributed Computing focuses on the development of a new generation of intelligent distributed systems. It faces the challenges of adapting and combining research in the fields of Intelligent Computing and Distributed Computing. Intelligent Computing develops methods and technology ranging from classical artificial intelligence, computational intelligence and multi-agent systems to game […]

Aug, 30

Fifth International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, HEART 2014

The HEART symposium is an international forum on state-of-the-art research in high-performance and power-efficient computing using accelerator technologies such as FPGAs, GPGPUs, and/or specialized accelerators. The scope of the meeting includes, but is not limited to: Architectures and systems: Novel systems/platforms for efficient acceleration based on FPGA, GPU, and other devices Heterogeneous processor architectures and […]

Aug, 30

24th International Conference on Field Programmable Logic and Applications, FPL 2014

The International Conference on Field Programmable Logic and Applications (FPL) is the first and largest conference covering the rapidly growing area of field-programmable logic. During the past 23 years, many of the advances achieved in reconfigurable system architectures, applications, embedded processors, design automation methods (EDA) and tools have been first published in the proceedings of […]

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)