high performance computing on graphics processing units: hgpu.org

Posts

Aug, 2

DRiVE: An Example of Distributed Rendering in Virtual Environments

Most Virtual Reality (VR) applications use rendering methods which implement local illumination models, simulating only direct interaction of light with 3D objects. They do not take into account the energy exchange between the objects themselves, making the resulting images look non-optimal. The main reason for this is the simulation of global illumination having a high […]

CUDA

Aug, 2

Large-Scale Sound Field Rendering in Rectangular Room with Specular Reflection

Sound field rendering is a technique for computing the sound field from three-dimensional numerical models constructed in the computer, and it is the same concept as graphics rendering in computer graphics. In this paper, a GPU (Graphics Processing Unit) cluster system is applied to the sound field rendering for a large […]

CUDA

Aug, 2

Comparing CUDA, OpenCL and OpenGL Implementations of the Cardiac Monodomain Equations

Computer simulations of cardiac electrophysiology are a helpful tool in the study of bioelectric activity of the heart. The cardiac monodomain model comprises a nonlinear system of partial differential equations and its numerical solution represents a very intensive computational task due to the required fine spatial and temporal resolution. Recent studies have shown that the […]

CUDA • OpenCL • OpenGL

Aug, 1

NOVA: A Functional Language for Data Parallelism

Functional languages provide a solid foundation on which complex optimization passes can be designed to exploit available parallelism in the underlying system. Their mathematical foundations enable high-level optimizations that would be impossible in traditional imperative languages. This makes them uniquely suited for generation of efficient target code for parallel systems, such as multiple Central Processing […]

CUDA

Aug, 1

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

In this big data era, the capability of mining and analyzing large scale datasets is imperative. As data are becoming more abundant than ever before, data driven methods are playing a critical role in areas such as decision support and business intelligence. In this paper, we demonstrate how state-of-the-art GPUs and the Dynamic Parallelism feature […]

CUDA

Aug, 1

A note on the GPU acceleration of eigenvalue computations

Eigenvalue computations for large sparse matrices, such as the Lanczos method, are commonly based on Krylov subspace techniques. Among the dominant operations in such algorithms are iterated computations of inner products with the same vector in order to preserve orthogonality of the Krylov basis. These operations can be accelerated by existing BLAS functionality using […]

CUDA • OpenCL
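The excerpt above is cut off before it names the BLAS functionality it uses, so the following is only a minimal sketch of the general idea under an assumption: that many inner products against the same vector can be grouped into a single matrix-vector product (the kind of operation a BLAS level-2 routine provides). The function names are hypothetical and the code is plain Python for illustration, not the paper's method or a GPU implementation.

```python
# Illustrative sketch only. Pure-Python stand-ins for BLAS-style operations;
# all function names here are hypothetical and not taken from the paper.

def dot(x, y):
    """Inner product of two equal-length vectors (level-1 style operation)."""
    return sum(a * b for a, b in zip(x, y))


def inner_products_one_by_one(basis, w):
    """Compute <v_j, w> for each basis vector with a separate dot call."""
    return [dot(v, w) for v in basis]


def inner_products_batched(basis, w):
    """Compute the same values as one matrix-vector product.

    The basis vectors are treated as the rows of a matrix, so the result
    is that matrix times w (level-2 style operation).
    """
    result = [0.0] * len(basis)
    for j, row in enumerate(basis):
        acc = 0.0
        for i in range(len(w)):
            acc += row[i] * w[i]
        result[j] = acc
    return result


def orthogonalize(basis, w):
    """Classical Gram-Schmidt step against an orthonormal basis.

    All coefficients are computed first (the batched inner products),
    then the corresponding components are subtracted from w.
    """
    coeffs = inner_products_batched(basis, w)
    out = [float(x) for x in w]
    for c, v in zip(coeffs, basis):
        for i in range(len(out)):
            out[i] -= c * v[i]
    return out
```

Both inner-product functions return the same coefficients; the batched form simply expresses them as one larger operation, which is the shape of work that optimized BLAS libraries are designed to handle.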

Aug, 1

Matrix Convolution using Parallel Programming

The convolution theorem is used to multiply matrices of two different sizes, i.e., matrices in which the number of rows in the first matrix is not equal to the number of columns in the second matrix. In this study, the multiplication of 3×3 and 4×4 matrices was done using MPI. A 3×3 matrix was taken […]

OpenCL

Aug, 1

GPU peer-to-peer techniques applied to a cluster interconnect

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications […]

CUDA

Jul, 31

Iterative CT Reconstruction on the GPU

The computing power of modern GPUs makes them very suitable for Computed Tomography (CT) image reconstruction. Apart from accelerating the reconstruction, their extra computing performance compared to conventional CPUs can be used to increase image quality in several ways. In this paper we present our upgraded GPU based iterative reconstruction algorithm, including ML-TR (Maximum Likelihood […]

CUDA

Jul, 31

The Promises of Hybrid Hexagonal/Classical Tiling for GPU

Time-tiling is necessary for efficient execution of iterative stencil computations. But the usual hyper-rectangular tiles cannot be used because of positive/negative dependence distances along the stencil’s spatial dimensions. Several prior efforts have addressed this issue. However, known techniques trade enhanced data reuse for other causes of inefficiency, such as unbalanced parallelism, redundant computations, or increased […]

CUDA

Jul, 31

Opportunities for Heterogeneous CPU/GPU Task Scheduling

It is common to exploit the co-processors of modern computer systems to speed up computations which were traditionally done on the CPU. While this is already very common for computer graphics and scientific applications, there is no reason why this cannot be extended to many different kinds of applications. In this paper we study the […]

OpenCL

Jul, 31

GPU-based Streaming Algorithm for High-Resolution Cloth Simulation

We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all the components of the cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating, to GPU-based kernels and data structures. Our algorithm performs intra-object and inter-object collisions, handles contacts and friction, and is able to accurately simulate […]

CUDA