high performance computing on graphics processing units: hgpu.org

Posts

Nov, 27

A GPU-based Simulation for Stochastic Computing

Stochastic computing performs operations using streams of bits that represent probability values instead of deterministic values. An important benefit of stochastic computing is that it can tolerate a large number of failures in a noisy system. Additionally, for the VLSI implementation of a sophisticated algorithm, a stochastic implementation can consume much less hardware with lower […]

CUDA
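The bit-stream idea in the stochastic computing abstract above can be illustrated with a small sketch. It is not taken from the paper: it assumes the common unipolar encoding (each bit is 1 with probability p) and uses the well-known result that a bitwise AND of two independent streams approximates the product of their values. The helper names `to_stream`, `stream_value`, and `stochastic_multiply` are hypothetical.

```python
import random


def to_stream(p, n, rng):
    """Encode probability p as n bits, each set to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def stream_value(bits):
    """Decode a stream: the fraction of 1 bits estimates the probability."""
    return sum(bits) / len(bits)


def stochastic_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


# Assumed example values; the seed is fixed only to make the run repeatable.
rng = random.Random(0)
n = 100_000
a = to_stream(0.5, n, rng)
b = to_stream(0.4, n, rng)
estimate = stream_value(stochastic_multiply(a, b))
print(f"estimated product: {estimate:.4f} (exact value is 0.2)")
```

Longer streams reduce the variance of the estimate, which is the usual trade-off between precision and latency in this style of computation.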

Nov, 27

Scalable Multi-Cache Simulation Using GPUs

Software simulation is the primary tool used for evaluating processor designs. Simulation offers better accuracy than analytical models and is an important evaluation step before actually fabricating a chip. Unfortunately, simulators are slow: a conventional cycle-accurate simulator will be unable to keep up with increasing core counts in modern processor designs. Parallel […]

CUDA

Nov, 27

Towards paradisEO-MO-GPU: a framework for GPU-based local search metaheuristics

This paper is a major step towards a pioneering software framework for the reusable design and implementation of parallel metaheuristics on Graphics Processing Units (GPU). The objective is to revisit the ParadisEO framework to allow its utilization on GPU accelerators. The focus is on local search metaheuristics and the parallel exploration of their neighborhood. The […]

CUDA

Nov, 27

Overlapping Computation and Communication for Advection on Hybrid Parallel Computers

We describe computational experiments exploring the performance improvements from overlapping computation and communication on hybrid parallel computers. Our test case is explicit time integration of linear advection with constant uniform velocity in a three-dimensional periodic domain. The test systems include a Cray XT5, a Cray XE6, and two multicore Infiniband clusters with different generations of […]

CUDA
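The test case named in the advection abstract above (explicit time integration of linear advection with constant velocity in a periodic domain) can be sketched in a few lines. This is a simplified illustration only: it is one-dimensional rather than three-dimensional, it uses a first-order upwind scheme chosen here for brevity rather than the scheme from the paper, and it does not show any overlap of computation and communication. The function name `upwind_step` and the grid values are assumptions.

```python
def upwind_step(u, cfl):
    """Advance u by one explicit step of du/dt + v * du/dx = 0 with v > 0.

    cfl is v * dt / dx and must lie in [0, 1] for stability.
    At i = 0 the index u[i - 1] wraps to the last cell, which makes
    the domain periodic.
    """
    return [u[i] - cfl * (u[i] - u[i - 1]) for i in range(len(u))]


# Assumed setup: a square pulse on a 64-cell periodic grid.
n = 64
u0 = [1.0 if 16 <= i < 32 else 0.0 for i in range(n)]

u = u0
for _ in range(40):
    u = upwind_step(u, 0.5)

print("total before:", sum(u0))
print("total after: ", sum(u))
```

Because every cell only needs its upwind neighbour, a distributed version would exchange one layer of boundary cells per step, which is the communication such experiments try to hide behind interior computation.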

Nov, 27

Numerical Precision and Benchmarking Very-High-Order Integration of Particle Dynamics on GPU Accelerators

GPUs offer a powerful acceleration platform for many scientific applications. Numerical integration of classical Newtonian dynamical particles often requires very high-order numerical accuracy. We assess the floating-point precision and performance of various GPUs for applications involving high-order time-step integration methods for particle model simulations using N-squared interactions. We demonstrate how high-order algorithms can be expressed […]

CUDA

Nov, 27

GPU Implementation of Spiking Neural Networks for Color Image Segmentation

Spiking neural networks (SNNs) are a powerful computational model, inspired by the human neural system, that engineers and neuroscientists use to simulate the intelligent computation of the brain. Inspired by the visual system, various spiking neural network models have been used to process visual images. However, it is time-consuming to simulate a large number of spiking neurons in […]

CUDA

Nov, 27

A low-cost 3D human interface device using GPU-based optical flow algorithms

With few exceptions, it is now very common to find a camera embedded in a consumer-grade laptop, notebook, mobile internet device (MID), mobile phone or handheld game console. Some of them also have a Graphics Processing Unit (GPU) to handle 3D graphics and other related tasks. This trend will probably continue in […]

Nov, 27

A GPU based Parallel Hierarchical Fuzzy ART Clustering

Hierarchical clustering is an important and powerful but computationally intensive operation. Its complexity motivates the exploration of highly parallel approaches such as Adaptive Resonance Theory (ART). Although ART has been implemented on GPU processors, this paper presents the first hierarchical ART GPU implementation we are aware of. Each ART layer is distributed in the GPU’s […]

CUDA

Nov, 26

A New Compilation Path: From Python/NumPy to OpenCL

Jit4OpenCL is a new compiler that converts scientific applications written in Python/NumPy into OpenCL code. This compiler is based on unPython, an ahead-of-time compiler from Python/NumPy to an intermediate form and OpenMP code, and on jit4GPU, a just-in-time compiler that converts that intermediate code into AMD CAL code that is specific to AMD GPUs. The […]

OpenCL

Nov, 26

Symbolic Differentiation in GPU Shaders

Derivatives arise frequently in graphics and scientific computation applications. As GPUs become more widely used for scientific computation, the need for derivatives can be expected to increase. To meet this need, we have added symbolic differentiation as a built-in language feature in the HLSL shading language. The symbolic derivative is computed at compile time […]

Nov, 26

Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA

In this paper, we propose the use of graphics processing units as a low-cost and efficient means of solving electromagnetic (and other) numerical problems. Based on the software platform CUDA (Compute Unified Device Architecture), a solver for unstructured sparse matrices with double-precision complex data has been implemented and tested for several practical cases. Benchmark results […]

CUDA

Nov, 26

A framework for network traffic analysis using GPUs

In recent years, computer networks have become an important part of our society. Networks have kept growing in size and complexity, which makes their management, traffic monitoring and analysis more complex because of the huge amount of data and calculations involved. In the last decade, several researchers have found it effective to use graphics […]

CUDA
