high performance computing on graphics processing units: hgpu.org

Posts

Jan, 8

Implementation of Kd-Trees on the GPU to Achieve Real Time Graphics Processing

This paper examines the parallelization of ray tracing algorithms with the goal of running the whole process on the graphics processing unit (GPU) rather than the central processing unit (CPU). The motivation behind this endeavour is to utilize the massively parallel nature of the GPU. This parallelism allows the construction of 3-dimensional images to take […]

Jan, 8

A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization

In this paper, we try to accelerate sparse LU factorization on the GPU. We present a tiled storage format and a parallel algorithm to improve the memory access pattern, and a register blocking method to compress the on-chip working set. The OpenMP implementation of our algorithm gives more stable performance over different matrices, and outperforms SuperLU […]

CUDA

Jan, 8

Cryptanalysis of the Full AES Using GPU-Like Special-Purpose Hardware

The block cipher Rijndael has undergone more than ten years of extensive cryptanalysis since its submission as a candidate for the Advanced Encryption Standard (AES) in April 1998. To date, most of the publicly-known cryptanalytic results are based on reduced-round variants of the AES (respectively Rijndael) algorithm. Among the few exceptions that target the full […]

CUDA

Jan, 7

Report on the Feasibility of Implementing PIC Codes on a GPU

GPUs have become a very attractive supplement to traditional high performance computing. GPUs offer significantly better performance per unit cost and per unit of power consumed. However, GPUs introduce several additional levels of parallelism that must be contended with. New methods must be developed in order to take full advantage of the capabilities of this architecture. This paper explores […]

CUDA

Jan, 7

Fat versus Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

We explore two different threading approaches on a graphics processing unit (GPU) exploiting two different characteristics of the current GPU architecture. The fat thread approach tries to minimize data access time by relying on shared memory and registers potentially sacrificing parallelism. The thin thread approach maximizes parallelism and tries to hide access latencies. We apply […]

CUDA

Jan, 7

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

Graphs are a fundamental data representation that has been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this paper, we present a new method for implementing the parallel […]

CUDA

Jan, 7

Interactive rendering of acquired materials on dynamic geometry using bandwidth prediction

Shading complex materials such as acquired reflectances in multi-light environments is computationally expensive. Estimating the shading integral involves sampling the incident illumination independently at several pixels. The number of samples required for this integration varies across the image, depending on an intricate combination of several factors. Adaptively distributing computational budget across the pixels for shading […]

Jan, 7

On-the-Fly Computing on Many-Core Processors in Nuclear Applications

Many nuclear applications still require more computational power than current computers can provide. Furthermore, some of them require dedicated machines, because they must run continuously or cannot tolerate any delay. To satisfy these requirements, we introduce computer accelerators, which can provide higher computational power at lower prices than current commodity processors. However, the […]

Jan, 7

Graduate Operating Systems: Project Report

Due to the high demand for secure Internet usage, an improvement in SSL performance is needed. This paper describes a technique to improve the performance of SSL by creating a CPU/GPU hybrid proxy that sits in front of a web server and handles only the SSL overhead. This will allow the utilization of high […]

OpenCL

Jan, 7

Multiphase Fluid Simulations on a Multiple GPGPU PC Using Unsplit Time Integration VSIAM3

This talk presents the implementation of multiphase fluid dynamics simulations on multi-GPGPU hardware using robust and efficient numerical methods. An unsplit formulation for the advection computation is proposed to take the place of the original split formulation in the so-called VSIAM3 method. The new formulation improves dimensional symmetry of numerical […]

OpenCL

Jan, 6

GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that are applied to leverage the computing power of SIMD accelerators like GPUs. Although they use a very different approach for this purpose, they share the basic idea of compensating the convergence behaviour of an inferior numerical algorithm by […]

CUDA

Jan, 6

Artifact-Free JPEG Decompression with Total Generalized Variation

We propose a new model for the improved reconstruction of JPEG (Joint Photographic Experts Group) images. Given a JPEG compressed image, our method first determines the set of possible source images and then specifically chooses one of these source images satisfying additional regularity properties. This is realized by employing the recently introduced Total Generalized Variation […]

CUDA