
Posts

Aug, 1

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

In this big data era, the capability of mining and analyzing large-scale datasets is imperative. As data are becoming more abundant than ever before, data-driven methods are playing a critical role in areas such as decision support and business intelligence. In this paper, we demonstrate how state-of-the-art GPUs and the Dynamic Parallelism feature […]
Aug, 1

A note on the GPU acceleration of eigenvalue computations

Eigenvalue computations for large sparse matrices, such as the Lanczos method, are commonly based on Krylov subspace techniques. One of the dominant operations in such algorithms is the iterated computation of inner products with the same vector in order to preserve the orthogonality of the Krylov basis. These operations can be accelerated by existing BLAS functionality using […]
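A minimal sketch of why repeated inner products against the same vector map well onto BLAS, assuming NumPy (which typically hands these products to the BLAS it is linked against). This is not the paper's GPU method: it only shows that one classical Gram-Schmidt step can compute all k inner products as a single matrix-vector product (GEMV) instead of k separate DOT calls. The function name `project_out` and the sizes `n`, `k` are hypothetical.

```python
import numpy as np

def project_out(basis: np.ndarray, w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One classical Gram-Schmidt step against an orthonormal basis.

    basis: k x n array whose rows are orthonormal (Krylov) vectors.
    w:     length-n vector to orthogonalize against them.
    """
    # All k inner products <v_i, w> with the same w in one
    # matrix-vector product (GEMV) rather than k separate DOT calls.
    coeffs = basis @ w
    # Subtract the projections, again as a single GEMV with the transpose.
    w_orth = w - basis.T @ coeffs
    return coeffs, w_orth

rng = np.random.default_rng(0)
n, k = 1000, 8  # hypothetical vector length and basis size
# Build k orthonormal vectors of length n via a reduced QR factorization.
q, _ = np.linalg.qr(rng.standard_normal((n, k)))
basis = q.T
w = rng.standard_normal(n)

coeffs, w_orth = project_out(basis, w)
# The loop-of-dots formulation gives the same coefficients.
loop_coeffs = np.array([np.dot(v, w) for v in basis])
print(np.allclose(coeffs, loop_coeffs))   # True
print(np.allclose(basis @ w_orth, 0.0))   # True: w_orth is orthogonal to the basis
```

Batching matters because the loop streams `w` from memory k times, while the GEMV form reads it once for all k products.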
Aug, 1

Matrix Convolution using Parallel Programming

The convolution theorem is used to multiply matrices of two different sizes, i.e., matrices in which the number of columns in the first matrix is not equal to the number of rows in the second matrix. In this study, the multiplication of 3×3 and 4×4 matrices was done using MPI. A 3×3 matrix was taken […]
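The abstract does not spell out the computation. As a hedged illustration only, assuming that "multiplication" here means 2-D discrete convolution, the sketch below convolves a 3×3 and a 4×4 matrix directly (giving a 6×6 result) and checks it against the convolution theorem: zero-pad both inputs to 6×6, multiply their 2-D DFTs element-wise, and invert. It runs in a single process with NumPy; the paper's MPI distribution is not shown, and the function names and example matrices are hypothetical.

```python
import numpy as np

def conv2d_full(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Direct 'full' 2-D convolution: output is (ra+rb-1) x (ca+cb-1)."""
    ra, ca = a.shape
    rb, cb = b.shape
    out = np.zeros((ra + rb - 1, ca + cb - 1))
    # Each element a[i, j] adds a scaled copy of b, shifted by (i, j).
    for i in range(ra):
        for j in range(ca):
            out[i:i + rb, j:j + cb] += a[i, j] * b
    return out

def conv2d_fft(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Same result via the convolution theorem: zero-pad both inputs to the
    full output size, multiply their 2-D DFTs element-wise, and invert."""
    shape = (a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1)
    prod = np.fft.fft2(a, s=shape) * np.fft.fft2(b, s=shape)
    return np.real(np.fft.ifft2(prod))

a = np.arange(1, 10, dtype=float).reshape(3, 3)   # hypothetical 3x3 input
b = np.arange(1, 17, dtype=float).reshape(4, 4)   # hypothetical 4x4 input

direct = conv2d_full(a, b)
via_fft = conv2d_fft(a, b)
print(direct.shape)                  # (6, 6)
print(np.allclose(direct, via_fft))  # True
```

Padding to the full output size is what makes the DFT's circular convolution equal to the linear one; without it, the result would wrap around.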
Aug, 1

GPU peer-to-peer techniques applied to a cluster interconnect

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, essentially by avoiding staging through host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications […]
Jul, 31

Iterative CT Reconstruction on the GPU

The computing power of modern GPUs makes them very suitable for Computed Tomography (CT) image reconstruction. Apart from accelerating the reconstruction, their extra computing performance compared to conventional CPUs can be used to increase image quality in several ways. In this paper we present our upgraded GPU-based iterative reconstruction algorithm, including ML-TR (Maximum Likelihood […]
Jul, 31

The Promises of Hybrid Hexagonal/Classical Tiling for GPU

Time-tiling is necessary for efficient execution of iterative stencil computations. But the usual hyper-rectangular tiles cannot be used because of positive/negative dependence distances along the stencil’s spatial dimensions. Several prior efforts have addressed this issue. However, known techniques trade enhanced data reuse for other causes of inefficiency, such as unbalanced parallelism, redundant computations, or increased […]
Jul, 31

Opportunities for Heterogeneous CPU-GPU Task Scheduling

It is common to exploit the co-processors of modern computer systems to speed up computations which were traditionally done on the CPU. While this is already very common for computer graphics and scientific applications, there is no reason why this cannot be extended to many different kinds of applications. In this paper we study the […]
Jul, 31

GPU-based Streaming Algorithm for High-Resolution Cloth Simulation

We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all the components of the cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating, to GPU-based kernels and data structures. Our algorithm performs intra-object and inter-object collisions, handles contacts and friction, and is able to accurately simulate […]
Jul, 31

Fast Computation of Computer-generated Hologram Using Xeon Phi Coprocessors

Using parallel computing is an effective way to accelerate computer-generated hologram (CGH) calculation. In this paper, we implemented various CGH algorithms on Intel Xeon Phi coprocessors. In the best case, our CGH calculations ran 12 times faster than on a CPU.
Jul, 31

Graphics Processing Unit acceleration of the Random Phase Approximation in the projector augmented wave method

The Random Phase Approximation (RPA) for correlation energy in the grid-based projector augmented wave (gpaw) code is accelerated by porting to the Graphics Processing Unit (GPU) architecture. The acceleration is achieved by grouping independent vectors/matrices and transforming the implementation from being memory bound to being computation/latency bound. With this approach, both the CPU and GPU […]
Jul, 30

Image Processing with CUDA

This thesis puts to the test the power of parallel computing on the GPU against the massive computations needed to process large images. The GPU has long been used to accelerate 3D applications. With the advent of high-level programmable interfaces, programming the GPU has been simplified, and it is being used to accelerate […]
Jul, 30

Domain Specific Languages for High Performance Computing

High Performance Computing (HPC) relies completely on complex parallel, heterogeneous architectures and distributed systems that are hard and error-prone to exploit, even for HPC specialists. Ever deeper knowledge of runtime systems, dependency tracking, memory transaction optimization, and many other techniques is required to produce high-quality software capable of exploiting every single […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
