Posts

Mar, 26

General Purpose Computation on Graphics Processing Units Using OpenCL

Computational science has emerged as a third pillar of science alongside theory and experiment. Parallelization for scientific computing is offered by a range of shared- and distributed-memory architectures, such as supercomputer systems, grid- and cluster-based systems, and multi-core and multiprocessor systems. In recent years, the use of GPUs (Graphics Processing Units) […]
Mar, 26

Improving Performance Portability in OpenCL Programs

We study the performance portability of OpenCL across diverse architectures including NVIDIA GPU, Intel Ivy Bridge CPU, and AMD Fusion APU. We present detailed performance analysis at assembly level on three exemplar OpenCL benchmarks: SGEMM, SpMV, and FFT. We also identify a number of tuning knobs that are critical to performance portability, including threads-data mapping, […]
Mar, 26

Creating Optimal Code for GPU-Accelerated CT Reconstruction Using Ant Colony Optimization

PURPOSE: CT reconstruction algorithms implemented on the GPU are highly sensitive to their implementation details and the hardware they run on. Fine-tuning an implementation for optimal performance can be a time-consuming task and can require many updates when the hardware changes. There are some techniques that do automatic fine-tuning of GPU code. These techniques, however, […]
Mar, 26

A Region Growing Segmentation Algorithm for GPUs

This paper proposes a parallel region growing image segmentation algorithm for Graphics Processing Units (GPUs). It is inspired by a sequential algorithm widely used by the Geographic Object Based Image Analysis (GEOBIA) community. Initially, all image pixels are considered as seeds or primitive segments. Fine-grained parallel threads assigned to individual pixels merge adjacent segments […]
Mar, 26

An application of graphical numerical accelerators in simulations of ion-transport through biological membranes

The modeling of ion-transport through biological membranes is important for understanding many life processes. The transmembrane potential and ion concentrations in the stationary state can be measured in in-vivo experiments. They can also be simulated within membrane models. Here we consider a basic model of ion transport that describes the time evolution of ion concentrations […]
Mar, 26

Experimental Fault-Tolerant Synchronization for Reliable Computation on Graphics Processors

Graphics processors (GPUs) are emerging as a promising platform for highly parallel, compute-intensive, general-purpose computations, which usually need support for inter-process synchronization. Using the traditional lock-based synchronization (e.g. mutual exclusion) makes the computation vulnerable to faults caused by both scientists’ inexperience and hardware transient errors. It is notoriously difficult for scientists to deal with deadlocks […]
Mar, 26

Hyperspectral Unmixing on GPUs and Multi-Core Processors: A Comparison

One of the main problems in the analysis of remotely sensed hyperspectral data cubes is the presence of mixed pixels, which arise when the spatial resolution of the sensor is not able to separate spectrally distinct materials. For this reason, spectral unmixing has become one of the most important tasks for hyperspectral data exploitation. […]
Mar, 24

Parallel Interpretation of L-system Based on CUDA

When L-systems are applied to large and detailed 3D objects, the inherently serial geometric interpretation limits the speed of image generation. To accelerate the interpreting procedure, a Graphics Processing Unit (GPU) based method utilizing Compute Unified Device Architecture (CUDA) is proposed in this paper. The proposed approach involves two phases: the first is a sequential scan […]
Mar, 24

A Dynamic IP Lookup Architecture using Parallel Multiple Hash in GPU-based Software Router

As the fiber propagation velocity grows and the routing scale expands, IP lookup speed becomes the major bottleneck of high-performance networks. Its efficiency directly determines the throughput of the entire routing channel. Recently, Graphics Processing Units (GPUs), which are highly parallel, flexible to program, and inexpensive, have been widely adopted in different areas, including software routers. In […]
Mar, 24

Recurrence quantification analysis in images with CUDA

The first goal of this work is to develop a parallel algorithm considerably faster than its serial counterpart. The algorithm is based on the idea of recurrence plots: a recurrence plot is a square matrix in which many complex behaviours of dynamical systems can be detected (e.g. with the recurrence quantification analysis technique). In the results and conclusion part, we will […]
Mar, 24

Efficiently Mapping the AES Encryption Algorithm on GPUs

Warped AES is a high performance heterogeneous GPU/CPU-SSE parallel method for encryption using GPUs. Considering the performance of encryption in GPU memory alone, our algorithm outperforms current published implementations on comparable hardware. In our ongoing research, we have also devised a speculative method for high throughput encryption on GPUs, while preserving low latency to client […]
Mar, 24

How to Benefit from AMD, Intel and Nvidia Accelerator Technologies in Scilab

This paper presents how to use, in Scilab, accelerator technologies from AMD, Intel and Nvidia in a flexible and portable manner. The proposed approach aims to simplify speeding up Scilab programs in an incremental process, thanks to the directive-based parallel programming provided by CAPS OpenHMPP technology.

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
