Posts

Jan, 19

GPU based Implementation of Film Flicker Reduction Algorithms

In this work we propose an algorithm for film restoration aimed at reducing the flicker effect while preserving the original overall illumination of the film. We also present a comparative study of the performance of this algorithm implemented following a sequential approach on a CPU and following a parallel approach on a GPU using OpenCL.
Jan, 14

Adaptation of an acoustic propagation model to the parallel architecture of a graphics processor

High-performance underwater acoustic models are of great importance for enabling real-time acoustic source tracking, geoacoustic inversion, environmental monitoring and high-frequency underwater communications. Given the parallelizable nature of ray tracing, in general, and of the ray superposition algorithm in particular, use of multiple computing units for the development of real-time efficient applications based on ray tracing […]
Jan, 14

High Performance Code Generation for Stencil Computation on Heterogeneous Multi-device Architectures

Heterogeneous architectures have been widely used in the domain of high performance computing. On the one hand, they allow a designer to use multiple types of computing units, each executing the tasks it is best suited for, to increase performance; on the other hand, they bring many challenges in programming for novice […]
Jan, 14

Towards Portable Performance for Explicit Hydrodynamics Codes

Significantly increasing intra-node parallelism is widely recognised as being a key prerequisite for reaching exascale levels of computational performance. In future exascale systems it is likely that this performance improvement will be realised by increasing the parallelism available in traditional CPU devices and using massively-parallel hardware accelerators. The MPI programming model is starting to reach […]
Jan, 14

Parallelization and Optimization of Feature Detection Algorithms on Embedded GPU

In this paper, we parallelize and optimize the popular feature detection algorithms, i.e. SIFT and SURF, on the latest embedded GPU. Using the conventional OpenGL shading language and the recently developed OpenCL as the GPGPU software platforms, we compare their implementation efficiency and speed against each other, as well as between GPU and CPU. Experimental result […]
Jan, 6

PySPH: A Python framework for SPH

We present an open-source, object-oriented framework for Smoothed Particle Hydrodynamics called PySPH. The framework is written in the high-level Python programming language and is designed to be user friendly, flexible and application agnostic. PySPH supports distributed-memory computing using the message passing paradigm and (limited) shared-memory-like parallel processing on hybrid […]
Jan, 5

DEF-G: Declarative Framework for GPU Environment

DEF-G is a declarative language and framework for the efficient generation of OpenCL GPU applications. Using our proof-of-concept DEF-G implementation, run-time and lines-of-code comparisons are provided for three well-known algorithms (Sobel image filtering, breadth-first search and all-pairs shortest path), each evaluated on three different platforms. The DEF-G declarative language and corresponding OpenCL kernels generated complete […]
Jan, 2

Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer’s Disease

Nonrigid image registration is an important but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, e.g. for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: […]
Dec, 29

Multi-GPU numerical simulation of electromagnetic waves

In this paper we present three-dimensional numerical simulations of electromagnetic waves. The Maxwell equations are solved by the Discontinuous Galerkin (DG) method. For achieving high performance, we exploit two levels of parallelism. The coarse grain parallelism is managed through MPI and a classical domain decomposition. The fine grain parallelism is managed with OpenCL in order […]
Dec, 22

Speed-Up Improvement Using Parallel Approach in Image Steganography

This paper presents a parallel approach to address the time complexity problem associated with sequential algorithms. An image steganography algorithm in the transform domain is considered for implementation. Image steganography is a technique for hiding a secret message in an image. With the parallel implementation, a large message can be hidden in a large image since it does not […]
Dec, 20

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other […]
Dec, 20

Pannotia: Understanding Irregular GPGPU Graph Applications

GPUs have become popular recently to accelerate general-purpose data-parallel applications. However, most existing work has focused on GPU-friendly applications with regular data structures and access patterns. While a few prior studies have shown that some irregular workloads can also achieve speedups on GPUs, this domain has not been investigated thoroughly. Graph applications are one such […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
