
Posts

Nov, 16

The Q Continuum Simulation: Harnessing the Power of GPU Accelerated Supercomputers

Modeling large-scale sky survey observations is a key driver for the continuing development of high resolution, large-volume, cosmological simulations. We report the first results from the ‘Q Continuum’ cosmological N-body simulation run carried out on the GPU-accelerated supercomputer Titan. The simulation encompasses a volume of (1300 Mpc)³ and evolves more than half a trillion particles, […]
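
The abstract does not describe the time integrator. As a minimal, hypothetical illustration of what "evolving" particles means in a gravitational N-body code, the sketch below advances a few particles with a kick-drift-kick leapfrog and a direct O(N²) softened force sum. The function names, the G = 1 units and the softening length `eps` are assumptions; a run with more than half a trillion particles needs far more scalable force solvers and cosmological (comoving) coordinates, which this sketch omits.

```python
def accelerations(pos, mass, G=1.0, eps=0.01):
    """Direct-sum gravitational accelerations with Plummer softening eps (assumed units, G = 1)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + eps * eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc

def leapfrog(pos, vel, mass, dt, steps):
    """Kick-drift-kick leapfrog; updates pos and vel in place and returns them."""
    acc = accelerations(pos, mass)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
                pos[i][k] += dt * vel[i][k]        # full drift
        acc = accelerations(pos, mass)             # forces at the new positions
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # second half kick
    return pos, vel
```

Because pairwise forces are equal and opposite, total momentum is conserved up to rounding, which makes a convenient sanity check for any such integrator.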
Nov, 16

The Implementation of a Real-Time Polyphase Filter

In this article we study the suitability of different computational accelerators for the task of real-time data processing. The algorithm used for comparison is the polyphase filter, a standard tool in signal processing and a well-established algorithm. We measure performance in FLOPs and in execution time, the latter being a critical factor for real-time systems. For […]
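
The abstract names the polyphase filter without spelling it out. As a hedged illustration, the sketch below shows the core polyphase identity for a decimating FIR filter: splitting the taps `h` into `M` branches (`h[p]`, `h[p+M]`, ...) yields exactly the same output as filtering at the full rate and then keeping every `M`-th sample, while computing only the retained samples. The function names and test signal are hypothetical, and the paper's filter (for example, a channelizing polyphase filter bank followed by an FFT) may differ.

```python
def fir_then_downsample(x, h, M):
    """Reference: full-rate FIR y[n] = sum_k h[k] * x[n - k], then keep every M-th output."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):
                acc += hk * x[n - k]
        y.append(acc)
    return y[::M]

def polyphase_decimate(x, h, M):
    """Same result via M polyphase branches, computing only the kept outputs.
    With k = j*M + p, y[m*M] = sum_p sum_j h[j*M + p] * x[(m - j)*M - p]."""
    n_out = (len(x) + M - 1) // M
    out = [0.0] * n_out
    for p in range(M):
        hp = h[p::M]  # taps of branch p
        for m in range(n_out):
            acc = 0.0
            for j, c in enumerate(hp):
                idx = (m - j) * M - p
                if 0 <= idx < len(x):
                    acc += c * x[idx]
            out[m] += acc
    return out
```

The polyphase form does roughly 1/M of the multiply-adds of the reference, which is why it is the usual starting point for real-time implementations.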
Nov, 16

CUDArray: CUDA-based NumPy

This technical report introduces CUDArray – a CUDA-accelerated subset of the NumPy library. The goal of CUDArray is to combine the ease of development of NumPy with the computational power of Nvidia GPUs in a lightweight and extensible framework. Since the motivation behind CUDArray is to facilitate neural network programming, CUDArray extends NumPy with a […]
Nov, 13

Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU

In this paper, we present Whippletree, a novel approach to scheduling dynamic, irregular workloads on the GPU. We introduce a new programming model which offers the simplicity and expressiveness of task-based parallelism while retaining all aspects of the multilevel execution hierarchy essential to unlocking the full potential of a modern GPU. At the same time, […]
Nov, 13

Mobile GPU Computing Based Filter Bank Convolution for Three-dimensional Wavelet Transform

Mobile GPU computing, or System on Chip with embedded GPU (SoC GPU), has recently come into great demand. Since these SoCs are designed for mobile devices running real-time applications such as image and video processing, highly efficient implementations of the wavelet transform are essential for these chips. In this paper, we develop two SoC GPU-based DWTs: […]
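
The excerpt does not say which wavelet filters the paper uses. Assuming the simple Haar pair for illustration, one level of a filter-bank DWT is a low-pass and a high-pass convolution, each followed by downsampling by 2; here the 2-tap convolutions are written out inline. The function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (orthonormal scaling)

def haar_analysis(x):
    """One DWT level: low-pass (S, S) and high-pass (S, -S) filtering, then keep every 2nd sample."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length signal"
    approx = [S * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    detail = [S * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse step: upsample and recombine the two subbands (perfect reconstruction)."""
    x = []
    for a, d in zip(approx, detail):
        x.append(S * (a + d))
        x.append(S * (a - d))
    return x
```

For a 3-D transform, the same 1-D step is applied along each of the three axes in turn (a separable transform), giving eight subbands per level.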
Nov, 13

High-accuracy Optimization by Parallel Iterative Discrete Approximation and Multi-GPU Computing

A high-accuracy optimizer is an essential part of accuracy-sensitive applications such as computational finance and computational biology, and in our previous research we developed a single-GPU Iterative Discrete Approximation Monte Carlo Search (IDA-MCS). However, single-GPU IDA-MCS performs poorly, or fails outright, on optimization problems with a large number of peaks because of the capability constraints […]
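
The excerpt does not describe how IDA-MCS works, so the sketch below is not the authors' algorithm. It is a generic shrinking-window Monte Carlo search, assumed here only to illustrate the idea of iteratively refining a random search around the best point found so far. The function name, sample counts and shrink factor are hypothetical.

```python
import random

def shrinking_mc_search(f, lo, hi, samples=200, iters=30, shrink=0.5, seed=0):
    """Minimize f by sampling uniformly in a box, re-centering the box on the best
    point found so far, and shrinking it each iteration (no bound clipping)."""
    rng = random.Random(seed)
    center = [(a + b) / 2 for a, b in zip(lo, hi)]
    width = [b - a for a, b in zip(lo, hi)]
    best_x, best_f = center, f(center)
    for _ in range(iters):
        # Independent samples: on a GPU each would be one thread.
        for _ in range(samples):
            x = [c + (rng.random() - 0.5) * w for c, w in zip(center, width)]
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx
        center = best_x
        width = [w * shrink for w in width]
    return best_x, best_f
```

A single shrinking window like this homes in on one basin, which illustrates why problems with many peaks are hard for this kind of search.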
Nov, 13

Semi-Analytic Solutions to the Radiative Transfer Equations via Heterogeneous Computing

High energy density radiative transfer benchmark solutions are presented for a 1-D slab geometry using a three-temperature (electron, ion, and radiation) model and 1-D spherical geometry using a two-temperature (material, radiation) model. A transport model is used for the radiation, a conduction model is used for the electrons, and ion and/or material motion is assumed […]
Nov, 13

Manycore processing of repeated range queries over massive moving objects observations

The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for a growing number of applications. Parallelism enables such applications to face this data-intensive challenge and lets the resulting systems achieve low latency and high scalability. In this paper we focus on a specific data-intensive problem, concerning the repeated processing […]
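
The excerpt stops before describing the paper's indexing scheme. A common way to answer many rectangular range queries over moving points is a uniform grid that is rebuilt at each update tick; the sketch below assumes that approach purely for illustration. The class and method names and the cell size are hypothetical.

```python
from collections import defaultdict

class UniformGrid:
    """Buckets 2-D points into square cells so a range query scans only overlapping cells."""

    def __init__(self, cell):
        self.cell = cell
        self.buckets = defaultdict(list)

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def rebuild(self, points):
        """Re-index every object; called once per tick because positions keep changing."""
        self.buckets.clear()
        for pid, (x, y) in points.items():
            self.buckets[self._key(x, y)].append((pid, x, y))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return the sorted ids of points inside the closed rectangle."""
        cx0, cy0 = self._key(xmin, ymin)
        cx1, cy1 = self._key(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for pid, x, y in self.buckets.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:  # exact filter
                        hits.append(pid)
        return sorted(hits)
```

Each query is independent of the others, so a batch of repeated queries parallelizes naturally across cores.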
Nov, 12

Brute force de-shredding algorithm using the GPU

The graphics processing unit (GPU) has seen a significant increase in performance over the past few years. Hence, interest in using GPUs for more general purposes has grown. The larger number of cores on a GPU allows it to outperform central processing units (CPUs) on highly parallel workloads. However, since in certain aspects instructions executed on the GPU must […]
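
A brute-force de-shredder tries every ordering of the strips and keeps the one whose neighbouring edges match best. The excerpt does not give the paper's scoring function, so the sketch below assumes the sum of absolute differences between the touching pixel columns of adjacent strips; names and the toy image are hypothetical.

```python
from itertools import permutations

def edge_cost(left_strip, right_strip):
    """Dissimilarity between the right edge of one strip and the left edge of the next.
    Each strip is a list of rows; each row is a list of pixel intensities."""
    return sum(abs(row_l[-1] - row_r[0]) for row_l, row_r in zip(left_strip, right_strip))

def deshred(strips):
    """Return the strip indices, left to right, of the lowest-cost ordering."""
    best_order, best_cost = None, float("inf")
    for order in permutations(range(len(strips))):
        cost = sum(edge_cost(strips[a], strips[b]) for a, b in zip(order, order[1:]))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order)
```

The search space grows as n!, already 40,320 orderings for 8 strips, which is why evaluating the permutations in parallel on a GPU is attractive.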
Nov, 12

Locality-Aware Mapping of Nested Parallel Patterns on GPUs

Recent work has explored using higher level languages to improve programmer productivity on GPUs. These languages often utilize high level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are […]
Nov, 12

Accelerated Runtime Verification of LTL Specifications with Counting Semantics

Runtime verification is an effective automated method for specification-based offline testing and analysis as well as online monitoring of complex systems. The specification language is often a variant of regular expressions or a popular temporal logic, such as LTL. This paper presents a novel and efficient parallel algorithm for verifying a more expressive version of […]
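
For background on how an offline monitor evaluates an LTL formula over a recorded trace, the sketch below computes the Until operator `p U q` at every position with a single right-to-left pass. It uses plain Boolean finite-trace semantics, not the counting semantics the paper introduces, and it is sequential rather than parallel; the function name is hypothetical.

```python
def until(p, q):
    """Evaluate p U q at every position of a finite trace (equal-length lists of booleans).
    Recurrence: (p U q)[i] = q[i] or (p[i] and (p U q)[i + 1]), false past the end."""
    n = len(p)
    res = [False] * n
    nxt = False  # value just past the end of the trace
    for i in range(n - 1, -1, -1):
        nxt = q[i] or (p[i] and nxt)
        res[i] = nxt
    return res
```

The backward dependency is the obstacle to naive parallelization, since each position needs the result of the next one.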
Nov, 12

Grace: a Cross-platform Micromagnetic Simulator On Graphics Processing Units

A micromagnetic simulator running on a graphics processing unit (GPU) is presented. It achieves a significant performance boost over previous central processing unit (CPU) simulators, up to two orders of magnitude for large input problems. Unlike GPU implementations by other research groups, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and […]
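
Micromagnetic simulators typically integrate the Landau-Lifshitz-Gilbert (LLG) equation for the magnetization in every cell; the excerpt does not name the integrator. The sketch below assumes a single macrospin in dimensionless units (gyromagnetic ratio 1) and a plain explicit Euler step with renormalization, to show the per-cell update only. The function names are hypothetical, and real simulators add exchange, demagnetization and anisotropy fields.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def llg_step(m, h, alpha, dt, gamma=1.0):
    """One explicit Euler step of LLG in Landau-Lifshitz form:
    dm/dt = -gamma / (1 + alpha^2) * [m x h + alpha * m x (m x h)], then |m| is reset to 1."""
    mxh = cross(m, h)           # precession term
    mxmxh = cross(m, mxh)       # damping term
    pref = -gamma / (1.0 + alpha ** 2)
    m = tuple(mi + dt * pref * (a + alpha * b) for mi, a, b in zip(m, mxh, mxmxh))
    norm = math.sqrt(sum(c * c for c in m))
    return tuple(c / norm for c in m)
```

Started perpendicular to the field, the moment precesses while the damping term pulls it toward the field direction.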

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
