Posts

Nov, 18

Processing Hard Sphere Collisions on a GPU Using OpenCL

Physically accurate hard sphere collisions are inherently sequential as the order in which collisions occur can have a significant impact on the resulting system. This makes processing hard sphere collisions on parallel hardware challenging. We present an approach to solving this problem that can be implemented using OpenCL that runs on current hardware. This approach […]
Nov, 18

Parallel Neutrino Triggers using GPUs for an underwater telescope

Graphics Processing Units are high performance co-processors originally intended to improve the use and the acceleration of computer graphics applications. Because of their performance, researchers have extended their use beyond the computer graphics scope. We have investigated the possibility of implementing and speeding up online neutrino trigger algorithms in the KM3Net-It experiment using a CPU-GPU […]
Nov, 18

Glider: A GPU Library Driver for Improved System Security

Legacy device drivers implement both device resource management and isolation. This results in a large code base with a wide high-level interface making the driver vulnerable to security attacks. This is particularly problematic for increasingly popular accelerators like GPUs that have large, complex drivers. We solve this problem with library drivers, a new driver architecture. […]
Nov, 18

A Survey Of Techniques for Managing and Leveraging Caches in GPUs

Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as the unique architecture of GPUs, the rise of CPU–GPU heterogeneous computing, etc., demand […]
Nov, 16

Mobile GPGPU Acceleration of Embodied Robot Simulation

It is desirable for a robot to be able to run on-board simulations of itself in a model of the world to evaluate action consequences and test new controller solutions, but simulation is computationally expensive. Modern mobile System-on-Chip devices have high performance at low power consumption levels and now incorporate powerful graphics processing units, making […]
Nov, 16

Ray Reordering Techniques for GPU Ray-Cast Ambient Occlusion

Global illumination techniques, such as ambient occlusion, can be performed in a physically accurate way via ray casting. However, ambient occlusion rays are incoherent. This means their computation is divergent, causing a degradation of rendering performance. This problem is particularly acute on GPU stream computing architectures, which have performance issues with thread divergence. We […]
Nov, 16

The Q Continuum Simulation: Harnessing the Power of GPU Accelerated Supercomputers

Modeling large-scale sky survey observations is a key driver for the continuing development of high resolution, large-volume, cosmological simulations. We report the first results from the ‘Q Continuum’ cosmological N-body simulation run carried out on the GPU-accelerated supercomputer Titan. The simulation encompasses a volume of (1300 Mpc)^3 and evolves more than half a trillion particles, […]
Nov, 16

The Implementation of a Real-Time Polyphase Filter

In this article we study the suitability of different computational accelerators for the task of real-time data processing. The algorithm used for comparison is the polyphase filter, a standard tool in signal processing and a well established algorithm. We measure performance in FLOPs and execution time, which is a critical factor for real-time systems. For […]
Nov, 16

CUDArray: CUDA-based NumPy

This technical report introduces CUDArray – a CUDA-accelerated subset of the NumPy library. The goal of CUDArray is to combine the ease of development from NumPy with the computational power of Nvidia GPUs in a lightweight and extensible framework. Since the motivation behind CUDArray is to facilitate neural network programming, CUDArray extends NumPy with a […]
Nov, 13

Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU

In this paper, we present Whippletree, a novel approach to scheduling dynamic, irregular workloads on the GPU. We introduce a new programming model which offers the simplicity and expressiveness of task-based parallelism while retaining all aspects of the multilevel execution hierarchy essential to unlocking the full potential of a modern GPU. At the same time, […]
Nov, 13

Mobile GPU Computing Based Filter Bank Convolution for Three-dimensional Wavelet Transform

Mobile GPU computing, or System on Chip with embedded GPU (SoC GPU), has recently come into great demand. Since these SoCs are designed for mobile devices with real-time applications such as image processing and video processing, highly efficient implementations of the wavelet transform are essential for these chips. In this paper, we develop two SoC GPU based DWT: […]
Nov, 13

High-accuracy Optimization by Parallel Iterative Discrete Approximation and Multi-GPU Computing

A high-accuracy optimizer is an essential part of accuracy-sensitive applications such as computational finance and computational biology, and we developed a single-GPU based Iterative Discrete Approximation Monte Carlo Search (IDA-MCS) in our previous research. However, single-GPU IDA-MCS performs poorly or even fails on optimization problems with a large number of peaks because of the capability constraints […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
