high performance computing on graphics processing units: hgpu.org

Posts

Sep, 23

Parallel processing on NVIDIA graphics processing units using CUDA

This paper is an introduction to general-purpose computing on graphics processing units. This involves taking advantage of the parallel processing power of modern graphics cards to do general purpose computation. The CUDA architecture used for general purpose computations on NVIDIA graphics cards is described, and important features affecting the run times of CUDA programs are […]

CUDA

Sep, 23

Functional and dynamic programming in the design of parallel prefix networks

A parallel prefix network of width n takes n inputs, a_1, a_2, … , a_n, and computes each y_i = a_1 o a_2 o … o a_i for 1 <= i <= n, for an associative operator o. This is one of the fundamental problems in computer science, because it gives insight into how parallel […]
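The definition above can be illustrated with a small sketch. It simulates one common prefix-network layout (a Kogge-Stone-style doubling scheme) sequentially in Python; the layout is chosen here only for illustration and is not necessarily one of the networks designed in the paper. The function name `prefix_network` is hypothetical.

```python
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix_network(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Compute y_i = a_1 o a_2 o ... o a_i for every i (inclusive prefix).

    Illustrative assumption: a Kogge-Stone-style network, simulated
    sequentially. `op` must be associative; it need not be commutative.
    """
    result = list(values)
    distance = 1
    while distance < len(result):
        # Every output of this level depends only on the previous level,
        # so all combinations within a level are independent of each other
        # and could be evaluated in parallel.
        previous = result
        result = [
            op(previous[i - distance], previous[i]) if i >= distance else previous[i]
            for i in range(len(previous))
        ]
        distance *= 2
    return result


if __name__ == "__main__":
    print(prefix_network([1, 2, 3, 4, 5], add))
    print(prefix_network(["a", "b", "c", "d", "e"], add))
```

The loop runs about log2(n) levels, which is the depth of this particular network; other prefix networks trade depth against the number of operator nodes and fan-out.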

Sep, 23

Image super-resolution by vectorizing edges

As the resolution of output devices increases, the demand for high-resolution content grows, and image super-resolution algorithms become more important. In digital images, edges strongly influence human perception. Because of this, most recent research tends to enhance image edges to achieve better […]

Sep, 23

Acceleration of Functional Validation Using GPGPU

Logic simulation of a VLSI chip is a computationally intensive process. There exists an urgent need to map functional validation algorithms onto parallel architectures to aid hardware designers in meeting time-to-market constraints. In this paper, we propose three novel methods for logic simulation of combinational circuits on GPGPUs. Initial experiments run on two methods using […]

Sep, 23

Simple optimizations for an applicative array language for graphics processors

Graphics processors (GPUs) are highly parallel devices that promise high performance, and they are now flexible enough to be used for general-purpose computing. A programming language based on implicitly data-parallel collective array operations can permit high-level, effective programming of GPUs. I describe three optimizations for such a language: automatic use of GPU shared memory cache, […]

CUDA

Sep, 23

Mathematical limits of parallel computation for embedded systems

Embedded systems are designed to perform a specific set of tasks, and are frequently found in mobile, power-constrained environments. There is growing interest in the use of parallel computation as a means to increase performance while reducing power consumption. In this paper, we highlight fundamental limits to what can and cannot be improved by parallel […]

Sep, 23

HHT-based time-frequency analysis method for biomedical signal applications

The Fourier transform, the wavelet transform, and the Hilbert-Huang transform (HHT) can be used to analyze, respectively, the frequency characteristics of linear and stationary signals, the time-frequency features of linear and non-stationary signals, and the time-frequency features of non-linear and non-stationary signals [1-6]. HHT is a combination of empirical mode decomposition (EMD) and Hilbert spectral analysis. EMD uses the […]

Sep, 23

The International Exascale Software Project roadmap

Over the last 20 years, the open-source community has provided more and more software on which the world’s high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. However, although the investments in these separate software elements have been tremendously valuable, a […]

Sep, 23

Compact data structure and scalable algorithms for the sparse grid technique

The sparse grid discretization technique enables a compressed representation of higher-dimensional functions. In its original form, it relies heavily on recursion and complex data structures, thus being far from well-suited for GPUs. In this paper, we describe optimizations that enable us to implement compression and decompression, the crucial sparse grid algorithms for our application, on […]

CUDA

Sep, 23

Colored stochastic shadow maps

This paper extends the stochastic transparency algorithm that models partial coverage to also model wavelength-varying transmission. It then applies this to the problem of casting shadows between any combination of opaque, colored transmissive, and partially covered (i.e., α-matted) surfaces in a manner compatible with existing hardware shadow mapping techniques. Colored Stochastic Shadow Maps have a […]

Sep, 23

Unstructured grid applications on GPU: performance analysis and improvement

Performance of applications running on GPUs is mainly affected by hardware occupancy and global memory latency. Scientific applications that rely on analysis using unstructured grids could benefit from the high performance capabilities provided by GPUs; however, their memory access patterns and algorithms limit the potential benefits. In this paper we analyze the algorithm for unstructured […]

CUDA

Sep, 23

Orchestration by approximation: mapping stream programs onto multicore architectures

We present a novel 2-approximation algorithm for deploying stream graphs on multicore computers and a stream graph transformation that eliminates bottlenecks. The key technical insight is a data rate transfer model that enables the computation of a "closed form", i.e., the data rate transfer function of an actor depending on the arrival rate of the […]