high performance computing on graphics processing units: hgpu.org

Posts

May, 15

A Monte Carlo Neutron Transport Code for Eigenvalue Calculations on a Dual-GPU System and CUDA Environment

The Monte Carlo (MC) method can accurately calculate eigenvalues in reactor analysis. Its lengthy computation time can be reduced by general-purpose computing on Graphics Processing Units (GPU), one of the latest parallel computing techniques under development. The method of porting a regular transport code to GPU is usually very straightforward due to the "embarrassingly […]

CUDA

May, 15

An Automatic Speech Recognition Application Framework for Highly Parallel Implementations on the GPU

Data layout, data placement, and synchronization processes are not usually part of a speech application expert’s daily concerns. Yet failure to carefully take these concerns into account in a highly parallel implementation on the graphics processing units (GPUs) could mean an order of magnitude of loss in application performance. In this paper we present an […]

CUDA

May, 15

Real-time Traffic Sign Recognition with Map Fusion on Multicore/Many-core Architectures

This paper presents a parallel implementation and performance analysis of a system for traffic sign recognition with digital map fusion on emerging multicore processors and graphics processing units (GPU). The system employs a particle filter based localization and map matching and template-based matching for sign recognition. In the proposed system, a GPS, odometer and camera […]

CUDA

May, 14

Parallel Approach for Time Series Analysis with General Regression Neural Networks

The accuracy of time delay estimation from pairs of irregularly sampled time series is of great relevance in astrophysics. However, the computational time is also important because large data sets must be studied. Besides introducing a new approach for time delay estimation, this paper presents a parallel approach to obtain a fast algorithm […]

CUDA

May, 14

Mapping a Data-Flow Programming Model onto Heterogeneous Platforms

In this paper we explore mapping of a high-level macro data-flow programming model called Concurrent Collections (CnC) onto heterogeneous platforms in order to achieve high performance and low energy consumption while preserving the ease of use of data-flow programming. Modern computing platforms are becoming increasingly heterogeneous in order to improve energy efficiency. This trend is […]

CUDA

May, 14

Heterogeneous Computing in Economics: a Simplified Approach

This paper shows the potential of heterogeneous computing in solving dynamic equilibrium models in economics. We illustrate the power and simplicity of the C++ Accelerated Massive Parallelism recently introduced by Microsoft. Starting from the same exercise as Aldrich et al. (2011) we document a speed gain together with a simplified programming style that naturally enables […]

CUDA

May, 12

Scalable Distributed Fast Multipole Methods

The Fast Multipole Method (FMM) allows O(N) evaluation to any arbitrary precision of N-body interactions that arise in many scientific contexts. These methods have been parallelized, with a recent set of papers attempting to parallelize them on heterogeneous CPU/GPU architectures [1]. While impressive performance was reported, the algorithms did not demonstrate complete weak or strong […]

CUDA

May, 12

Parallel Cryptanalysis

Most of today’s cryptographic primitives are based on computations that are hard to perform for a potential attacker but easy to perform for somebody who is in possession of some secret information, the key, that opens a back door in these hard computations and allows them to be solved in a small amount of time. […]

CUDA

May, 12

Multi-dimensional characterization of electrostatic surface potential computation on graphics processors

BACKGROUND: Calculating the electrostatic surface potential (ESP) of a biomolecule is critical to understanding biomolecular function. Because of its quadratic computational complexity (as a function of the number of atoms in a molecule), there have been continual efforts to reduce its complexity either by improving the algorithm or the underlying hardware on which the calculations […]

CUDA

May, 12

Characterization and Transformation of Unstructured Control Flow in Bulk Synchronous GPU Applications

In this paper we identify important classes of program control flows in applications targeted to commercially available graphics processing units (GPUs) and characterize their presence in real workloads such as those that occur in CUDA and OpenCL. Broadly, control flow can be characterized as structured or unstructured. It is shown that most existing techniques for […]

CUDA, OpenCL

May, 12

Enhancing GPU Parallelism in Nature-Inspired Algorithms

We present GPU implementations of two different nature-inspired optimization methods for well-known optimization problems. Ant Colony Optimization (ACO) is a two-stage population-based method modelled on the foraging behaviour of ants, while P systems provide a high-level computational modelling framework that combines the structure and dynamic aspects of biological systems (in particular, their parallel and non-deterministic […]

CUDA

May, 11

Efficient Parallelization of Natural Language Applications using GPUs

As we enter the era of mobile computing, high-quality and efficient natural language applications, which greatly facilitate intelligent human-computer interaction, become more and more important. Unfortunately, most high-quality natural language applications employ large statistical models, which render them impractical for real-time use. Meanwhile, Graphics Processor Units (GPUs) have become widely available, offering the opportunity to […]

CUDA

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
