high performance computing on graphics processing units: hgpu.org

Posts

May, 15

Fast Adaptive Sampling Technique for Multi-Dimensional Integral Estimation Using GPUs

Evaluating multi-dimensional integrals is a common problem in many areas of science, including physics and volume estimation of convex bodies. One of the most widely used techniques for evaluating integrals in high dimensions is the Monte Carlo method. Vanilla Monte Carlo methods for integral estimation use uniform sampling. Variance of such uniform sampling reduces […]

CUDA
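
As an illustration of the vanilla Monte Carlo approach named in the abstract above, here is a minimal CPU-only sketch of uniform-sampling integral estimation over the unit hypercube. It is not the paper's adaptive method or its GPU code. The function names (`mc_integrate`, `sum_of_squares`), the test integrand, and the sample count are assumptions chosen for illustration.

```python
import math
import random


def mc_integrate(f, dim, n_samples, seed=0):
    """Estimate the integral of f over [0, 1]^dim by uniform sampling.

    Returns (estimate, standard_error). The hypercube has volume 1,
    so the integral equals the mean of f over uniform samples.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]  # one uniform point
        y = f(x)
        total += y
        total_sq += y * y
    mean = total / n_samples
    # Sample variance of f; clamp tiny negative values from rounding.
    var = max(total_sq / n_samples - mean * mean, 0.0)
    # The standard error of the mean shrinks like 1 / sqrt(n_samples).
    std_err = math.sqrt(var / n_samples)
    return mean, std_err


def sum_of_squares(x):
    """Illustrative integrand; its exact integral over [0, 1]^d is d / 3."""
    return sum(xi * xi for xi in x)


estimate, std_err = mc_integrate(sum_of_squares, dim=5, n_samples=20000)
print(f"estimate = {estimate:.4f} +/- {std_err:.4f} (exact = {5 / 3:.4f})")
```

Each sample is independent of the others, which is why this kind of loop is commonly moved to a GPU with one sample (or batch of samples) per thread.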

May, 15

Parallel implementation of a ray tracer for underwater sound waves using the CUDA libraries: description and application to the simulation of underwater networks

One of the most time-consuming parts of the simulation of underwater networks is the realistic simulation of underwater sound propagation. Some well-known software tools used for network simulations to date employ ray tracing to simulate sound propagation. This gives rise to high computational complexity, and may require a very long time to complete a simulation. In […]

CUDA

May, 15

A Monte Carlo Neutron Transport Code for Eigenvalue Calculations on a Dual-GPU System and CUDA Environment

The Monte Carlo (MC) method can accurately calculate eigenvalues in reactor analysis. Its lengthy computation time can be reduced by general-purpose computing on graphics processing units (GPUs), one of the latest parallel computing techniques under development. The method of porting a regular transport code to GPU is usually very straightforward due to the "embarrassingly […]

CUDA

May, 15

An Automatic Speech Recognition Application Framework for Highly Parallel Implementations on the GPU

Data layout, data placement, and synchronization processes are not usually part of a speech application expert’s daily concerns. Yet failure to carefully take these concerns into account in a highly parallel implementation on graphics processing units (GPUs) could mean an order-of-magnitude loss in application performance. In this paper we present an […]

CUDA

May, 15

Real-time Traffic Sign Recognition with Map Fusion on Multicore/Many-core Architectures

This paper presents a parallel implementation and performance analysis of a system for traffic sign recognition with digital map fusion on emerging multicore processors and graphics processing units (GPUs). The system employs particle-filter-based localization and map matching, and template-based matching for sign recognition. In the proposed system, a GPS, odometer and camera […]

CUDA

May, 14

Parallel Approach for Time Series Analysis with General Regression Neural Networks

The accuracy of time delay estimation given pairs of irregularly sampled time series is of great relevance in astrophysics. However, the computational time is also important because the study of large data sets is needed. Besides introducing a new approach for time delay estimation, this paper presents a parallel approach to obtain a fast algorithm […]

CUDA

May, 14

Mapping a Data-Flow Programming Model onto Heterogeneous Platforms

In this paper we explore mapping of a high-level macro data-flow programming model called Concurrent Collections (CnC) onto heterogeneous platforms in order to achieve high performance and low energy consumption while preserving the ease of use of data-flow programming. Modern computing platforms are becoming increasingly heterogeneous in order to improve energy efficiency. This trend is […]

CUDA

May, 14

Heterogeneous Computing in Economics: a Simplified Approach

This paper shows the potential of heterogeneous computing in solving dynamic equilibrium models in economics. We illustrate the power and simplicity of C++ Accelerated Massive Parallelism (C++ AMP), recently introduced by Microsoft. Starting from the same exercise as Aldrich et al. (2011), we document a speed gain together with a simplified programming style that naturally enables […]

CUDA

May, 12

Scalable Distributed Fast Multipole Methods

The Fast Multipole Method (FMM) allows O(N) evaluation, to arbitrary precision, of the N-body interactions that arise in many scientific contexts. These methods have been parallelized, with a recent set of papers attempting to parallelize them on heterogeneous CPU/GPU architectures [1]. While impressive performance was reported, the algorithms did not demonstrate complete weak or strong […]

CUDA
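
For context on the O(N) claim in the abstract above, the sketch below shows the direct pairwise summation that the FMM is designed to avoid. It is the plain O(N^2) baseline, not the FMM itself and not the paper's code. The function name `direct_potentials`, the 1/r kernel with unit constants, and the three-point input are assumptions for illustration.

```python
import math


def direct_potentials(positions, charges):
    """Compute phi[i] = sum over j != i of charges[j] / |r_i - r_j|.

    Every pair is visited once, so the cost grows as N * (N - 1) / 2,
    i.e. O(N^2) in the number of bodies.
    """
    n = len(positions)
    phi = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])  # Euclidean distance
            phi[i] += charges[j] / r  # contribution of body j at body i
            phi[j] += charges[i] / r  # symmetric contribution of i at j
    return phi


# Hypothetical three-body input used only to exercise the function.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
charges = [1.0, 2.0, 4.0]
phi = direct_potentials(points, charges)
print(phi)
```

Doubling the number of bodies roughly quadruples the work in this loop, which is the scaling that hierarchical methods such as the FMM replace with approximate far-field evaluations.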

May, 12

Parallel Cryptanalysis

Most of today’s cryptographic primitives are based on computations that are hard to perform for a potential attacker but easy to perform for somebody who is in possession of some secret information, the key, that opens a back door in these hard computations and allows them to be solved in a small amount of time. […]

CUDA

May, 12

Multi-dimensional characterization of electrostatic surface potential computation on graphics processors

BACKGROUND: Calculating the electrostatic surface potential (ESP) of a biomolecule is critical to understanding biomolecular function. Because of its quadratic computational complexity (as a function of the number of atoms in a molecule), there have been continual efforts to reduce its complexity either by improving the algorithm or the underlying hardware on which the calculations […]

CUDA

May, 12

Characterization and Transformation of Unstructured Control Flow in Bulk Synchronous GPU Applications

In this paper we identify important classes of program control flows in applications targeted to commercially available graphics processing units (GPUs) and characterize their presence in real workloads such as those that occur in CUDA and OpenCL. Broadly, control flow can be characterized as structured or unstructured. It is shown that most existing techniques for […]

CUDA • OpenCL

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)