
Posts

Aug, 1

Matrix Convolution using Parallel Programming

The convolution theorem is used to multiply matrices of two different sizes, i.e., matrices in which the number of rows in the first matrix is not equal to the number of columns in the second matrix. In this study, the multiplication of 3×3 and 4×4 matrices was done using MPI. A 3×3 matrix was taken […]
Aug, 1

GPU peer-to-peer techniques applied to a cluster interconnect

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications […]
Jul, 31

Iterative CT Reconstruction on the GPU

The computing power of modern GPUs makes them very suitable for Computed Tomography (CT) image reconstruction. Apart from accelerating the reconstruction, their extra computing performance compared to conventional CPUs can be used to increase image quality in several ways. In this paper we present our upgraded GPU based iterative reconstruction algorithm, including ML-TR (Maximum Likelihood […]
Jul, 31

The Promises of Hybrid Hexagonal/Classical Tiling for GPU

Time-tiling is necessary for efficient execution of iterative stencil computations. But the usual hyper-rectangular tiles cannot be used because of positive/negative dependence distances along the stencil’s spatial dimensions. Several prior efforts have addressed this issue. However, known techniques trade enhanced data reuse for other causes of inefficiency, such as unbalanced parallelism, redundant computations, or increased […]
Jul, 31

Opportunities for Heterogeneous CPU/GPU Task Scheduling

It is common to exploit the co-processors of modern computer systems to speed up computations which were traditionally done on the CPU. While this is already very common for computer graphics and scientific applications, there is no reason why this cannot be extended to many different kinds of applications. In this paper we study the […]
Jul, 31

GPU-based Streaming Algorithm for High-Resolution Cloth Simulation

We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all the components of the cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating, to GPU-based kernels and data structures. Our algorithm performs intra-object and inter-object collision detection, handles contacts and friction, and is able to accurately simulate […]
Jul, 31

Fast Computation of Computer-generated Hologram Using Xeon Phi Coprocessors

Parallel computing is an effective way to accelerate computer-generated hologram (CGH) calculation. In this paper, we implemented various CGH algorithms on Intel Xeon Phi coprocessors. In the best case, the CGH calculations ran 12 times faster than on a CPU.
Jul, 31

Graphics Processing Unit acceleration of the Random Phase Approximation in the projector augmented wave method

The Random Phase Approximation (RPA) for correlation energy in the grid-based projector augmented wave (gpaw) code is accelerated by porting to the Graphics Processing Unit (GPU) architecture. The acceleration is achieved by grouping independent vectors/matrices and transforming the implementation from being memory bound to being computation/latency bound. With this approach, both the CPU and GPU […]
Jul, 30

Image Processing with CUDA

This thesis puts to the test the power of parallel computing on the GPU against the massive computations needed in image processing of large images. The GPU has long been used to accelerate 3D applications. With the advent of high-level programmable interfaces, programming the GPU is simplified and is being used to accelerate […]
Jul, 30

Domain Specific Languages for High Performance Computing

High Performance Computing (HPC) relies completely on complex parallel, heterogeneous architectures and distributed systems which are hard and error-prone to exploit, even for HPC specialists. Ever deeper knowledge of runtime systems, dependency tracking, memory transaction optimization and many other techniques is a must-have requirement to produce high quality software capable of exploiting every single […]
Jul, 30

Counting and Occurrence Sort for GPUs using an Embedded Language

This paper investigates two sorting algorithms: counting sort and a variation, occurrence sort, which also removes duplicate elements, and examines their suitability for running on the GPU. The duplicate-removing variation turns out to have a natural functional, data-parallel implementation which makes it particularly interesting for GPUs. The algorithms are implemented in Obsidian, a high-level […]
Jul, 30

Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi

This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi Coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library that incorporates the developments presented, and in general […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
