Posts

Mar, 17

Implementation of algorithms with a fine-grained parallelism on GPUs

The efficiency of implementations of algorithms with a fine-grained parallelism on GPUs that support the CUDA architecture is studied. Cellular automata and difference schemes are used for testing. Several versions of implementations are proposed and their efficiency is analyzed. An example of GPU application for modeling the process of carbon dioxide oxidation on the catalyst […]
Mar, 17

RDMA-Based Job Migration Framework for MPI over InfiniBand

Coordinated checkpoint and recovery is a common approach to achieve fault tolerance on large-scale systems. The traditional mechanism dumps the process image to a local disk or a central storage area of all the processes involved in the parallel job. When a failure occurs, the processes are restarted and restored to the latest checkpoint image. […]
Mar, 17

Live, Video-Rate Super-Resolution Microscopy Using Structured Illumination and Rapid GPU-Based Parallel Processing

Structured illumination fluorescence microscopy is a powerful super-resolution method that is capable of achieving a resolution below 100 nm. Each super-resolution image is computationally constructed from a set of differentially illuminated images. However, real-time application of structured illumination microscopy (SIM) has generally been limited due to the computational overhead needed to generate super-resolution images. Here, […]
Mar, 17

Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters

The lattice-Boltzmann method is well suited for implementation in single-instruction multiple-data (SIMD) environments provided by general purpose graphics processing units (GPGPUs). This paper discusses the integration of these GPGPU programs with OpenMP to create lattice-Boltzmann applications for multi-GPU clusters. In addition to the standard single-phase single-component lattice-Boltzmann method, the performances of more complex multiphase, multicomponent […]
Mar, 17

Memory-Scalable GPU Spatial Hierarchy Construction

Recent GPU algorithms for constructing spatial hierarchies achieve promising performance for moderately complex models by using the BFS (breadth-first search) construction order. While being able to exploit the massive parallelism on the GPU, the BFS order consumes excessive GPU memory, which becomes a serious issue. In this paper, we propose to use the PBFS (partial […]
Mar, 17

CUDA Compatible GPU as an Efficient Hardware Accelerator for AES Cryptography

This paper presents a study of the efficiency in applying modern Graphics Processing Units in symmetric key cryptographic solutions. It describes both traditional style approaches based on the OpenGL graphics API and new ones based on the recent technology trends of major hardware vendors. It presents an efficient implementation of the Advanced Encryption Standard (AES) […]
Mar, 17

High-Throughput Transaction Executions on Graphics Processors

OLTP (On-Line Transaction Processing) is an important business system sector in various traditional and emerging online services. Due to the increasing number of users, OLTP systems require high throughput for executing tens of thousands of transactions in a short time period. Encouraged by the recent success of GPGPU (General-Purpose computation on Graphics Processors), we propose […]
Mar, 16

Mutual information computation and maximization using GPU

We present a GPU implementation to compute both mutual information and its derivatives. Mutual information computation is a highly demanding process due to the enormous number of exponential computations. It is therefore the bottleneck in many image registration applications. However, we show that these computations are fully parallelizable and can be efficiently ported onto the […]
Mar, 16

Direct evaluation of NURBS curves and surfaces on the GPU

This paper presents a new method to evaluate and display trimmed NURBS surfaces using the Graphics Processing Unit (GPU). Trimmed NURBS surfaces, the de facto standard in commercial 3D CAD modeling packages, are currently tessellated into triangles before being sent to the graphics card for display since there is no native hardware support for NURBS. […]
Mar, 16

Alignment invariant image comparison implemented on the GPU

This paper proposes a GPU-implemented algorithm to determine the differences between two binary images using Distance Transformations. These differences are invariant to slight rotation and offsets, making the technique ideal for comparisons between images that are not perfectly aligned. The parallel processing capabilities of the GPU allow for faster implementation than on traditional desktop […]
Mar, 16

Implementing the Himeno benchmark with CUDA on GPU clusters

This paper describes the use of CUDA to accelerate the Himeno benchmark on clusters with GPUs. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak bandwidth on an NVIDIA Tesla C1060 GPU and performs at over 50 GFlops. A multi-GPU implementation that utilizes MPI alongside CUDA […]
Mar, 16

Rotationally invariant sparse patch matching on GPU and FPGA

Vector and data-flow processors are particularly strong at dense, regular computation. Sparse, irregular data layouts cause problems because their unpredictable data access patterns prevent computational pipelines from filling effectively. A number of algorithms in image processing have been proposed which are not dense, and instead apply local neighborhood operations to a sparse, irregular set of […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
