Posts

Mar, 12

RadixBoost: A Hardware Acceleration Structure for Scalable Radix Sort on Graphic Processors

In this paper, we propose RadixBoost, a hardware acceleration structure for scalable 32-bit integer radix sort on GPU. The structure is integrated into the GPU microarchitecture as a special functional unit and is invoked through new instructions. Our design enables significantly faster sorting for general-purpose GPU computing. The RadixBoost architecture […]
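
RadixBoost performs 32-bit radix sort in hardware, and the excerpt does not describe its internal pipeline. For reference only, here is a plain software sketch of a least-significant-digit (LSD) radix sort over unsigned 32-bit keys, assuming 8-bit digits (four passes). Each pass does the histogram, prefix-sum, and stable-scatter steps that GPU radix sorts typically parallelize. The name `radix_sort_u32` and the digit width are illustrative choices, not part of the paper.

```python
def radix_sort_u32(keys, digit_bits=8):
    """LSD radix sort for unsigned 32-bit integers (illustrative software sketch)."""
    if digit_bits <= 0 or 32 % digit_bits != 0:
        raise ValueError("digit_bits must be a positive divisor of 32")
    if any(k < 0 or k > 0xFFFFFFFF for k in keys):
        raise ValueError("keys must be unsigned 32-bit integers")
    radix = 1 << digit_bits
    mask = radix - 1
    keys = list(keys)
    for shift in range(0, 32, digit_bits):
        # 1. Histogram: count the keys that fall into each bucket for this digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: starting output offset of each bucket.
        offsets = [0] * radix
        running = 0
        for b in range(radix):
            offsets[b] = running
            running += counts[b]
        # 3. Stable scatter: keys keep their relative order within a bucket,
        #    which is what lets the digit-by-digit passes compose correctly.
        out = [0] * len(keys)
        for k in keys:
            b = (k >> shift) & mask
            out[offsets[b]] = k
            offsets[b] += 1
        keys = out
    return keys


print(radix_sort_u32([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Wider digits mean fewer passes but larger histograms; 8 bits is a common compromise in software.
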
Mar, 12

FastTree: A Hardware KD-Tree Construction Acceleration Engine for Real-Time Ray Tracing

The ray tracing algorithm is well known for its ability to generate photo-realistic rendering effects. Recent years have seen renewed efforts to bring it to real time for a better user experience. Today, building acceleration structures such as kd-trees has become the bottleneck of ray tracing. A dedicated hardware architecture, FastTree, was proposed for kd-tree construction […]
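
The excerpt does not say which split heuristic FastTree uses. Production ray tracers usually build kd-trees over triangles with the surface area heuristic (SAH); the sketch below substitutes the simpler median split over points, cycling through the axes, only to show the recursive build pattern that hardware builders accelerate. `KDNode` and `build_kdtree` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    axis: int                          # splitting dimension at this node
    left: Optional["KDNode"] = None    # coordinates on `axis` <= the split point's
    right: Optional["KDNode"] = None   # coordinates on `axis` >= the split point's


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursive median-split kd-tree build over k-dimensional points."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle x, y, z, ...
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # median element becomes the splitting node
    return KDNode(
        point=pts[mid],
        axis=axis,
        left=build_kdtree(pts[:mid], depth + 1),
        right=build_kdtree(pts[mid + 1:], depth + 1),
    )


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.point, tree.left.point, tree.right.point)  # (7, 2) (5, 4) (9, 6)
```

Sorting at every level makes this build O(n log² n); faster builders presort once per axis or use linear-time median selection.
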
Mar, 8

HOCL: A Family of Embedded Languages

We address the increasingly varied capabilities of specialized computing platforms by introducing a growing family of functionally limited mini-languages, implemented as embedded domain-specific languages (EDSLs) in Haskell, that may be composed to harness the computational features offered by a variety of hardware platforms. This development is based on a novel modular representation of the EDSL […]
Mar, 8

Converting Data-Parallelism to Task-Parallelism by Rewrites: Purely Functional Programs Across Multiple GPUs

High-level domain-specific languages for array processing on the GPU are increasingly common, but they typically only run on a single GPU. As computational power is distributed across more devices, languages must target multiple devices simultaneously. To this end, we present a compositional translation that fissions data-parallel programs in the Accelerate language, allowing subsequent compiler and […]
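
The excerpt gives no detail of the Accelerate translation itself. As a language-neutral illustration of the underlying idea, a data-parallel `map` can be fissioned into independent tasks by splitting its input into contiguous chunks, mapping each chunk separately (here on threads standing in for devices), and concatenating the results in order. `split_chunks` and `fissioned_map` are hypothetical helpers, not Accelerate functions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_chunks(xs, n):
    """Split xs into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    q, r = divmod(len(xs), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + q + (1 if i < r else 0)
        chunks.append(xs[start:end])
        start = end
    return chunks


def fissioned_map(f, xs, n_devices=2):
    """Run map f over xs as n_devices independent tasks, then concatenate in order."""
    chunks = split_chunks(list(xs), n_devices)
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        # Each task is itself a data-parallel map over one chunk;
        # pool.map returns results in the order of the input chunks.
        partials = list(pool.map(lambda chunk: [f(x) for x in chunk], chunks))
    return [y for part in partials for y in part]


print(fissioned_map(lambda x: x * x, range(10), n_devices=3))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Plain concatenation is valid because `map` is elementwise; a fold would additionally need an associative operator to combine the per-chunk results. Python threads only model the task structure here, not real multi-GPU execution.
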
Mar, 8

An Empirical Performance Evaluation of GPU-Enabled Graph-Processing Systems

Graph processing is increasingly used in knowledge economies and in science, for example in advanced marketing, social networking, and bioinformatics. A number of graph-processing systems, including the GPU-enabled Medusa and Totem, have been developed recently. Understanding their performance is key to system selection, tuning, and improvement. Previous performance evaluation studies have been conducted for CPU-based graph-processing systems, […]
Mar, 8

HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms

Heterogeneous computing has emerged as one of the major computing platforms in many domains. Although there have been several proposals to aid programming for heterogeneous computing platforms, optimizing applications for them remains difficult. Identifying which parallel regions (or tasks) should run on GPUs or CPUs is one of the critical […]
Mar, 8

Generating Performance Portable Code using Rewrite Rules: From High-level Functional Expressions to High-Performance OpenCL Code

Computing systems have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort. This results in a tension between performance and code portability. Typically, code is either tuned in a low-level imperative language using hardware-specific optimizations to […]
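
A rewrite rule replaces one expression pattern with a semantically equivalent one. The sketch below implements a single classic rule, map fusion (`map f (map g xs)` becomes `map (f ∘ g) xs`), over a tiny hypothetical expression type, together with a reference interpreter used to check that the rewrite preserves meaning. It is not the rule set or intermediate representation of the cited work; a real system would explore many such rules to find a fast OpenCL variant.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Input:
    name: str


@dataclass(frozen=True)
class Map:
    f: Callable
    arg: "Expr"


Expr = Union[Input, Map]


def fuse_maps(e: Expr) -> Expr:
    """Rewrite map f (map g xs) -> map (f . g) xs, bottom-up, until no redex remains."""
    if isinstance(e, Input):
        return e
    arg = fuse_maps(e.arg)
    if isinstance(arg, Map):
        f, g = e.f, arg.f
        return Map(lambda x: f(g(x)), arg.arg)
    return Map(e.f, arg)


def evaluate(e: Expr, env: dict) -> list:
    """Reference interpreter: run the expression on concrete lists."""
    if isinstance(e, Input):
        return env[e.name]
    return [e.f(x) for x in evaluate(e.arg, env)]


def count_maps(e: Expr) -> int:
    return 0 if isinstance(e, Input) else 1 + count_maps(e.arg)


prog = Map(lambda x: x + 1, Map(lambda x: x * 2, Input("xs")))
fused = fuse_maps(prog)
env = {"xs": [1, 2, 3]}
print(count_maps(prog), count_maps(fused))        # 2 1
print(evaluate(prog, env), evaluate(fused, env))  # [3, 5, 7] [3, 5, 7]
```

On a GPU, fusing two maps removes the intermediate array and therefore one round trip through global memory, which is the kind of hardware-relevant choice such rules expose.
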
Mar, 6

Lyra2: Password Hashing Scheme with improved security against time-memory trade-offs

We present Lyra2, a password hashing scheme (PHS) based on cryptographic sponges. Lyra2 was designed to be strictly sequential (i.e., not easily parallelizable), providing strong security even against attackers that use multiple processing cores (e.g., custom hardware or a powerful GPU). At the same time, it is very simple to implement in software and allows […]
Mar, 6

High-Performance Computation of a Jet in Cross Flow by Lattice Boltzmann Based Parallel Direct Numerical Simulation

Direct numerical simulation (DNS) of a round jet in crossflow based on the lattice-Boltzmann method (LBM) is carried out on a multi-GPU cluster. The data-parallel SIMT (Single-Instruction Multiple-Thread) character of GPUs matches the parallelism of LBM well, which makes the GPU-based LBM solver highly efficient. With the present GPU setup (6 Nvidia Tesla K20M), […]
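
For reference, the textbook lattice-Boltzmann update with a single-relaxation-time (BGK) collision operator is shown below; the excerpt does not say which collision model this DNS uses. Here $f_i$ is the particle distribution along the discrete velocity $\mathbf{c}_i$, $\tau$ is the relaxation time, and $f_i^{\mathrm{eq}}$ is the local equilibrium distribution. The collision term depends only on values at node $\mathbf{x}$, and streaming moves data to fixed neighbouring nodes, which is why one lattice node maps naturally onto one GPU thread.

$$
f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\; t + \Delta t) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right]
$$
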
Mar, 6

PONDER – A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope

This paper describes a new real-time versatile backend, the Pulsar Ooty Radio Telescope New Digital Efficient Receiver (PONDER), which has been designed to operate along with the legacy analog system of the Ooty Radio Telescope (ORT). PONDER makes use of current state-of-the-art computing hardware, a Graphics Processing Unit (GPU) and sufficiently […]
Mar, 6

Multi-GPU implementation of a VMAT treatment plan optimization algorithm

VMAT optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units (GPUs) have been used to speed up the computations. However, the small memory of a single GPU cannot handle cases with a large dose-deposition coefficient (DDC) matrix. This paper reports an implementation […]
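
The excerpt stops before describing how the DDC matrix is distributed. One straightforward way to work with a matrix too large for one GPU is to partition its rows (voxels) across devices: each device holds one block of the DDC matrix, computes the dose for its own voxels from the shared beamlet-intensity vector, and the partial results are concatenated. The sketch below simulates that scheme sequentially with dense Python lists; the partitioning strategy and all names (`row_blocks`, `dose_on_device`, `multi_device_dose`) are assumptions, not the paper's implementation.

```python
def row_blocks(n_rows, n_devices):
    """Split row indices [0, n_rows) into n_devices contiguous (start, end) ranges."""
    if n_devices < 1:
        raise ValueError("need at least one device")
    q, r = divmod(n_rows, n_devices)
    blocks, start = [], 0
    for d in range(n_devices):
        end = start + q + (1 if d < r else 0)
        blocks.append((start, end))
        start = end
    return blocks


def dose_on_device(ddc_block, fluence):
    """Dose for the voxels in one block: a dense matrix-vector product."""
    return [sum(a * w for a, w in zip(row, fluence)) for row in ddc_block]


def multi_device_dose(ddc, fluence, n_devices):
    """Each simulated device owns one row block of the DDC matrix."""
    dose = []
    for start, end in row_blocks(len(ddc), n_devices):
        dose.extend(dose_on_device(ddc[start:end], fluence))
    return dose


ddc = [[1, 0, 2], [0, 3, 0], [4, 0, 0], [1, 1, 1]]   # 4 voxels x 3 beamlets
fluence = [1, 2, 3]
print(multi_device_dose(ddc, fluence, n_devices=3))  # [7, 6, 4, 6]
```

Because each device writes a disjoint slice of the dose vector, no cross-device reduction is needed; real DDC matrices are sparse, so a practical version would store each block in a sparse format.
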
Mar, 6

An OpenCL-based Monte Carlo dose calculation engine (oclMC) for coupled photon-electron transport

The Monte Carlo (MC) method is recognized as the most accurate dose calculation method for radiotherapy. However, its extremely long computation time impedes clinical applications. Recently, considerable effort has gone into fast MC dose calculation on GPUs. Nonetheless, most GPU-based MC dose engines have been developed in the NVIDIA CUDA environment. This […]
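
A core step in any MC photon-transport code is sampling the distance to the next interaction. In a homogeneous medium with linear attenuation coefficient $\mu$ the free path is exponentially distributed, and inverse-transform sampling gives $s = -\ln(1 - U)/\mu$ with $U$ uniform on $[0, 1)$. The toy below checks this against the uncollided transmission $e^{-\mu t}$ through a slab of thickness $t$. It is not part of oclMC and ignores scattering, electron transport, and heterogeneous geometry; the function names and parameter values are illustrative.

```python
import math
import random


def sample_free_path(mu, rng):
    """Inverse-transform sample of an exponential free path (mu in 1/cm, result in cm)."""
    # rng.random() is in [0, 1), so 1 - U is in (0, 1] and the log is finite.
    return -math.log(1.0 - rng.random()) / mu


def uncollided_fraction(mu, thickness, n_photons, seed=0):
    """Fraction of photons that cross a slab without interacting."""
    rng = random.Random(seed)
    passed = sum(1 for _ in range(n_photons) if sample_free_path(mu, rng) > thickness)
    return passed / n_photons


mu, t = 0.2, 5.0   # illustrative values: 0.2 1/cm, 5 cm slab
print(uncollided_fraction(mu, t, 100_000), math.exp(-mu * t))
```

Each photon history is independent, which is what makes MC transport a good fit for GPUs regardless of whether the engine is written in CUDA or OpenCL.
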

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors