high performance computing on graphics processing units: hgpu.org

Posts

Mar, 23

Dense linear algebra solvers for multicore with GPU accelerators

Solving dense linear systems of equations is a fundamental problem in scientific computing. Numerical simulations involving complex systems represented in terms of unknown variables and relations between them often lead to linear systems of equations that must be solved as fast as possible. We describe current efforts toward the development of these critical solvers […]

Mar, 23

GPU Accelerators for Evolvable Cellular Automata

In order to design cellular automata rules by means of evolutionary algorithms, high computational demands need to be met. This problem may be partially solved by parallelization. Since parallel supercomputers and server clusters are expensive and often overburdened, this paper proposes the evolution of cellular automata rules on small and inexpensive graphics processing units. The […]

CUDA

Mar, 23

Data Visualization and Mining using the GPU

An exciting development in the computing industry has been the emergence of the graphics processing unit (GPU) as a fast general-purpose co-processor. Initially designed for gaming applications, today's GPUs demonstrate impressive computing power and high levels of parallelism and are now being used for a variety of applications far removed from traditional graphics rendering […]

Mar, 23

On Using GPU to Compute Options and Derivatives

Algorithmic trading has created an increasing demand for high performance computing solutions within financial organizations. The actors of portfolio management and risk assessment are obliged to increase their computing resources in order to provide competitive models for financial management and for pricing financial instruments. GPU stands for “Graphics Processing Unit”. GPU processing (or Stream Processing) […]

CUDA

Mar, 23

Efficient stream reduction on the GPU

Stream reduction is the process of removing unwanted elements from a stream of outputs. It is a key component of many GPGPU algorithms, especially in multi-pass algorithms: the stream reduction is used to remove unwanted elements from the output of a previous pass before sending it as input for the next pass. In this paper, […]

OpenGL
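
To make the definition of stream reduction above concrete, here is a minimal sketch of the usual scan-and-scatter formulation, written as sequential Python for readability. It is not the algorithm from the paper, whose abstract is truncated here; the names `exclusive_scan` and `stream_reduce` are hypothetical, and on a GPU the flagging, scan, and scatter steps would each run in parallel.

```python
def exclusive_scan(flags):
    """Return (offsets, total): offsets[i] is the sum of flags[0..i-1]."""
    offsets = []
    running = 0
    for f in flags:
        offsets.append(running)
        running += f
    return offsets, running


def stream_reduce(stream, keep):
    """Remove unwanted elements from `stream`, preserving order.

    Step 1: flag each element (1 = keep, 0 = discard).
    Step 2: an exclusive prefix sum over the flags gives each kept
            element its position in the compacted output.
    Step 3: scatter the kept elements to those positions.
    """
    flags = [1 if keep(x) else 0 for x in stream]
    offsets, count = exclusive_scan(flags)
    result = [None] * count
    for i, x in enumerate(stream):
        if flags[i]:
            result[offsets[i]] = x
    return result


if __name__ == "__main__":
    # Output of a hypothetical previous pass; negative values are unwanted.
    previous_pass = [3, -1, 4, -1, 5, -9, 2]
    print(stream_reduce(previous_pass, lambda x: x >= 0))
```

The scan is what makes the scatter step independent per element, which is why this formulation is a common starting point for parallel implementations.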

Mar, 22

A GPGPU solution of the FMM near interactions for acoustic scattering problems

The Fast Multipole Method (FMM) is especially suitable for applications in which it is necessary to predict acoustic scattering, e.g., aircraft noise control. This accelerated iterative method has two main parts, far interactions and near interactions. Near interactions are computationally intensive and fit well into the Single Instruction Multiple Threads paradigm. In this […]

Mar, 22

Fast and accurate PIV computation using highly parallel iterative correlation maximization

Our contribution deals with fast computation of dense two-component (2C) PIV vector fields using Graphics Processing Units (GPUs). We show that iterative gradient-based cross-correlation optimization is an accurate and efficient alternative to multi-pass processing with FFT-based cross-correlation. Density is meant here from the sampling point of view (we obtain one vector per pixel), since the […]

CUDA

Mar, 22

Bridging the GPGPU-FPGA efficiency gap

This paper compares an implementation of a Bayesian inference algorithm across several FPGAs and GPGPUs, while embracing both the execution model and high-level architecture of a GPGPU. Our study is motivated by recent work in template-based programming and architectural models for FPGA computing. The comparison we present is meant to demonstrate the FPGA’s potential, while […]

OpenCL

Mar, 22

Improving accuracy for matrix multiplications on GPUs

Reproducibility of an experiment is a commonly used metric to determine its validity. Within scientific computing, this can become difficult due to the accumulation of floating-point rounding errors in the numerical computation, greatly reducing the accuracy of the computation. Matrix multiplication is particularly susceptible to these rounding errors, which is why there exist so […]

CUDA
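
As an illustration of the rounding-error accumulation described above, the following sketch compares a plain matrix product with one that uses Kahan (compensated) summation for each inner product. Kahan summation is swapped in here as a well-known generic technique; it is an assumption for illustration and not necessarily the method studied in the paper, whose abstract is truncated. The function names are hypothetical, and pure Python double precision stands in for GPU arithmetic.

```python
import math


def matmul_naive(a, b):
    """Plain triple-loop product; each inner product accumulates rounding error."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


def matmul_kahan(a, b):
    """Same product, but each inner product uses Kahan compensated summation."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            comp = 0.0  # running estimate of the low-order bits lost so far
            for p in range(k):
                term = a[i][p] * b[p][j] - comp
                total = acc + term
                comp = (total - acc) - term  # what was lost in acc + term
                acc = total
            out[i][j] = acc
    return out


if __name__ == "__main__":
    k = 1_000_000
    a = [[0.1] * k]               # 1 x k row of 0.1
    b = [[1.0] for _ in range(k)]  # k x 1 column of ones
    reference = math.fsum(a[0])    # correctly rounded sum, used as ground truth
    print("naive error:", abs(matmul_naive(a, b)[0][0] - reference))
    print("kahan error:", abs(matmul_kahan(a, b)[0][0] - reference))
```

The compensated version costs a few extra floating-point operations per term, which is the usual trade-off between accuracy and throughput in this kind of kernel.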

Mar, 22

Evaluating force field accuracy with long-time simulations of a beta-hairpin tryptophan zipper peptide

We have combined graphics processing unit-accelerated all-atom molecular dynamics with parallel tempering to explore the folding properties of small peptides in implicit solvent on the time scale of microseconds. We applied this methodology to the synthetic beta-hairpin, trpzip2, and one of its sequence variants, W2W9. Each simulation consisted of over 8 ms of aggregated virtual […]

CUDA

Mar, 22

Constructing Two-Dimensional Voronoi Diagrams via Divide-and-Conquer of Envelopes in Space (thesis)

We present a general framework for computing two-dimensional Voronoi diagrams of different classes of sites under various distance functions. The framework is sufficiently general to support diagrams embedded on a family of two-dimensional parametric surfaces in $R^3$. The computation of the diagrams is carried out through the construction of envelopes of surfaces in 3-space provided […]
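
The link between envelopes and Voronoi diagrams mentioned above can be illustrated with a brute-force, grid-sampled sketch: each site contributes a distance surface over the plane, and the cell a point belongs to is the site whose surface is lowest there. This is only a discretized illustration under assumed point sites and squared Euclidean distance; it is not the divide-and-conquer algorithm of the thesis, and `voronoi_labels` is a hypothetical name.

```python
def voronoi_labels(sites, width, height):
    """Label each integer grid point with the index of its nearest site.

    Each site s defines a surface d_s(x, y) = (x - sx)^2 + (y - sy)^2.
    Taking the minimum over all surfaces at every (x, y) samples their
    lower envelope; recording which surface attains the minimum gives
    the (discretized) Voronoi diagram.
    """
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(
                range(len(sites)),
                key=lambda s: (sites[s][0] - x) ** 2 + (sites[s][1] - y) ** 2,
            )
            row.append(nearest)
        labels.append(row)
    return labels


if __name__ == "__main__":
    sites = [(1, 1), (6, 2), (3, 5)]  # hypothetical point sites
    for row in voronoi_labels(sites, 8, 7):
        print("".join(str(label) for label in row))
```

Swapping the `key` function for another distance changes the class of diagram, which mirrors the idea of one framework covering several distance functions.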

Mar, 22

Constructing Two-Dimensional Voronoi Diagrams via Divide-and-Conquer of Envelopes in Space

We present a general framework for computing Voronoi diagrams of different classes of sites under various distance functions in $R^3$. Most diagrams mentioned in the paper are in the plane. However, the framework is sufficiently general to support diagrams embedded on a family of two-dimensional parametric surfaces in three dimensions. The computation of the diagrams is […]

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

Recent source codes

QArray

Celerity: High-level C++ for Accelerator Clusters

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Optical flow algorithms for SYCL

OpenMP5-Offload-OpenMC-Intel-PVC
