Posts

Dec, 6

Using Shared Memory as a Cache in Cellular Automata Water Flow Simulations on GPUs

Graphics processing units (GPUs) have recently gained a lot of interest as an efficient platform for general-purpose computation. The Cellular Automata approach, which is inherently parallel, makes it possible to implement high-performance simulations. This paper shows how shared memory on a GPU can be used to improve the performance of Cellular Automata models. In […]
Dec, 6

Applications of Many-Core Technologies to On-line Event Reconstruction in High Energy Physics Experiments

Interest in many-core architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of many-core devices when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. As a benchmark, we use a scaled-up version of the […]
Dec, 6

Fast quantum Monte Carlo on a GPU

We present a scheme for the parallelization of quantum Monte Carlo on graphics processing units, focusing on bosonic systems and variational Monte Carlo. We use asynchronous execution schemes with shared memory persistence, and obtain an excellent acceleration. Compared with single-core execution, GPU-accelerated code runs over 100 times faster. The CUDA code is provided along with […]
Dec, 4

Computing OpenSURF on OpenCL and General Purpose GPU

The Speeded-Up Robust Features (SURF) algorithm is widely used for image feature detection and matching in computer vision. Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. This paper describes how to implement an open-source SURF program, namely OpenSURF, on general purpose […]
Dec, 4

GPU-based Multi-stream Analyzer on Application Layer for Service-oriented Router

Service-oriented router (SoR) is a new router architecture for providing rich services to Internet users by utilizing useful information extracted from network traffic. In SoR, stream reconstruction and selection is a fundamental process for providing the services in the application layer. After real-time reconstruction of stream data, SoR used a software character string analyzer to […]
Dec, 4

Exploiting Heterogeneous Systems: Keccak on OpenCL

Using graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. CUDA and OpenCL are APIs that enable programmers to develop GPGPU applications and software for massively parallel processors. On October 2, 2012, NIST announced the winner of its five-year competition to select a new […]
Dec, 4

Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters

SkePU is a C++ template library with a simple and unified interface for expressing data parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. […]
Dec, 4

A Hybrid Approach to Parallel Connected Component Labeling Using CUDA

Connected component labeling (CCL) is a mandatory step in image segmentation where each object in an image is identified and uniquely labeled. Sequential CCL is a time-consuming operation and thus is often implemented within a parallel processing framework to reduce execution time. Several parallel CCL methods have been proposed in the literature. Among them are NSZ […]
Dec, 4

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Solving systems of linear equations is an important problem that spans almost all fields of science and mathematics. When these systems grow in size, iterative methods are used to solve these problems. This paper looks at optimizing these methods for CUDA Architectures. It discusses a multi-threaded CPU implementation, a GPU implementation, and a data optimized […]
Dec, 4

Comparative Study of High Performance Computing Using Multi-core Parallel Systems

Multi-core-based high-performance computing systems are available at a reasonable price. The parallel programming paradigm needs to be adjusted to each individual system. Parallel computing systems were compared in this paper. Electroencephalography signals were collected in order to measure the performance of parallel computing on CPU- and GPU-based systems. A CPU-based system showed better […]
Dec, 4

HSPA+/LTE-A Turbo Decoder on GPU and Multicore CPU

This paper compares two implementations of reconfigurable and high-throughput turbo decoders. The first implementation is optimized for an NVIDIA Kepler graphics processing unit (GPU), whereas the second implementation is for an Intel Ivy Bridge processor. Both implementations support max-log-MAP and log-MAP turbo decoding algorithms, various code rates, different interleaver types, and all block-lengths, as specified […]
Dec, 4

Divergence Analysis

The growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control flow divergences. These phenomena stem from a condition that we call data […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
