Posts

Feb 6

Comparison of OpenCL performance on different platforms using VexCL and Blaze

This technical report provides performance numbers for several benchmark problems run on a range of hardware platforms. The goal of the report is twofold. First, it helps us better understand how OpenCL performance varies across platforms. Second, it provides an OpenCL-OpenMP comparison for a sparse matrix-vector multiplication operation. The VexCL library will be […]
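The report's headline kernel is sparse matrix-vector multiplication. For readers who want a concrete picture of the operation being benchmarked, here is a minimal CSR-format SpMV kernel in plain CUDA; this is a generic sketch, not the VexCL-generated OpenCL code the report actually measures.

    // Minimal CSR sparse matrix-vector product y = A*x, one thread per row.
    // A reference sketch of the benchmarked operation, not VexCL output.
    __global__ void spmv_csr(int n_rows,
                             const int   *row_ptr,  // n_rows + 1 row offsets
                             const int   *col_idx,  // column index per nonzero
                             const float *values,   // nonzero values
                             const float *x,
                             float       *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n_rows) {
            float sum = 0.0f;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                sum += values[j] * x[col_idx[j]];
            y[row] = sum;
        }
    }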
Feb 6

Register-leaning kernels in CUDA

Kepler cards offer a large amount of register space. One can use this memory to store working data arrays, just as one uses shared memory. This white paper will describe this register-leaning approach in detail.
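The core idea can be sketched in a few lines: give each thread a small, statically indexed local array so the compiler can keep it entirely in registers rather than spilling it to local memory. The kernel below is a hypothetical illustration, not code from the white paper.

    // Sketch of a register-resident working array. Because REG_N is a
    // compile-time constant and all indexing is fully unrolled, the compiler
    // can place acc[] in registers instead of local (off-chip) memory.
    #define REG_N 8

    __global__ void register_leaning(const float *in, float *out, int n)
    {
        float acc[REG_N];   // held in registers, not shared/local memory

        int base = (blockIdx.x * blockDim.x + threadIdx.x) * REG_N;

        #pragma unroll
        for (int i = 0; i < REG_N; ++i)   // load the working set
            acc[i] = (base + i < n) ? in[base + i] : 0.0f;

        #pragma unroll
        for (int i = 0; i < REG_N; ++i)   // compute entirely in registers
            acc[i] = acc[i] * acc[i] + 1.0f;

        #pragma unroll
        for (int i = 0; i < REG_N; ++i)   // write results back
            if (base + i < n) out[base + i] = acc[i];
    }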
Feb 3

A Survey of Power Management Techniques for Phase Change Memory

The demand for larger memory capacity in high-performance computing systems has motivated researchers to explore alternatives to DRAM (dynamic random access memory). Since PCM (phase change memory) provides high density, good scalability, and non-volatile data storage, it has received a significant amount of attention in recent years. A crucial bottleneck in the widespread adoption of PCM, however, […]
Feb 3

Pointer Analysis for Semi-Automatic Code Parallelizers

Code parallelizers are employed these days to reduce the effort needed to manually parallelize sequential code. But they are ineffective when it comes to handling programming constructs like pointers. Code parallelizers like Par4all have limited support for pointers, while approaches like ASET + BONES cannot handle pointers at all. In this thesis we […]
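As a minimal illustration of why pointers defeat such tools, consider a loop whose arrays arrive through possibly aliasing pointers; the functions below are hypothetical. Without alias information, a parallelizer must assume dst and src overlap; a pointer analysis, or a programmer-supplied __restrict__ qualifier, lets it prove the iterations independent.

    // Without alias information, a parallelizer must assume dst and src
    // may overlap, so loop iterations might depend on each other.
    void scale(float *dst, const float *src, int n)
    {
        for (int i = 0; i < n; ++i)
            dst[i] = 2.0f * src[i];
    }

    // With __restrict__ (or a pointer analysis proving non-aliasing),
    // each iteration is independent and the loop can be parallelized.
    void scale_restrict(float *__restrict__ dst,
                        const float *__restrict__ src, int n)
    {
        for (int i = 0; i < n; ++i)
            dst[i] = 2.0f * src[i];
    }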
Feb 3

Exploiting Concurrency Patterns with Heterogeneous Task and Data Parallelism

Parallel programming of an application requires not only domain knowledge of the application, but also programming-environment support and in-depth awareness of the target architecture. Often, not all concurrency features of the architecture are exposed to the programming environment. The challenge lies in efficiently utilizing these unexposed features to write effective parallel programs. In […]
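The abstract names no specific programming model; one common way to combine the two forms of parallelism on a GPU is to issue independent data-parallel kernels on separate CUDA streams. The sketch below is a hypothetical illustration, not code from the thesis.

    #include <cuda_runtime.h>

    // Two independent data-parallel tasks, expressed as CUDA kernels.
    __global__ void scaleKernel(float *a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) a[i] *= 2.0f;
    }
    __global__ void offsetKernel(float *b, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) b[i] += 1.0f;
    }

    void run_tasks(float *d_a, float *d_b, int n)
    {
        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);

        int threads = 256, blocks = (n + threads - 1) / threads;

        // Task parallelism across streams; data parallelism inside each kernel.
        scaleKernel <<<blocks, threads, 0, s0>>>(d_a, n);
        offsetKernel<<<blocks, threads, 0, s1>>>(d_b, n);

        cudaStreamSynchronize(s0);
        cudaStreamSynchronize(s1);
        cudaStreamDestroy(s0);
        cudaStreamDestroy(s1);
    }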
Feb 3

GPGPU and MIC in Accelerated Cluster for Remote Sensed Image Processing Software

Processing of Earth-observation remotely sensed images requires ever more powerful computing facilities. For several years, GPGPU (General Purpose processing on Graphics Processing Units) technology has been used to perform massively parallel calculations. The French Space Agency (CNES) has therefore ported some IAS to assess their performance using this type […]
Feb 3

On the Accelerating of Two-dimensional Smart Laplacian Smoothing on the GPU

This paper presents a GPU-accelerated implementation of two-dimensional Smart Laplacian smoothing. This implementation is developed under the guidelines of our paradigm for accelerating Laplacian-based mesh smoothing [13]. Two commonly used data layouts, Array-of-Structures (AoS) and Structure-of-Arrays (SoA), are used to represent triangular meshes in our implementation. Two iteration forms that have different choices […]
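For readers unfamiliar with the two layouts, here is a minimal sketch of how per-vertex mesh data might be stored each way; field and array names are illustrative, not taken from the paper.

    #define N_VERTS 1024

    // Array-of-Structures: each vertex's coordinates sit together.
    // A warp reading vertices_aos[i].x loads strided data -> poorer coalescing.
    struct VertexAoS { float x, y; };
    VertexAoS vertices_aos[N_VERTS];

    // Structure-of-Arrays: each coordinate component is contiguous.
    // A warp reading mesh.x[i] loads consecutive addresses -> good coalescing.
    struct MeshSoA {
        float x[N_VERTS];
        float y[N_VERTS];
    };
    MeshSoA mesh;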
Feb 3

Scaling Recurrent Neural Network Language Models

This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational costs and memory. Our analysis shows that despite being more costly to train, RNNLMs obtain much […]
Feb 2

Multi-GPU Support on Shared Memory System using Directive-based Programming Model

Existing and emerging studies show that using a single Graphics Processing Unit (GPU) can lead to significant performance gains. These devices have tremendous processing capabilities. We should be able to achieve even greater speedups by using more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential […]
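The paper itself targets a directive-based model; as a plain-CUDA counterpart, distributing one array across all visible devices can be sketched as follows. The kernel and function names are hypothetical.

    #include <cuda_runtime.h>

    __global__ void addOne(float *a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) a[i] += 1.0f;
    }

    // Split one array across all visible GPUs, one chunk per device.
    void add_one_multi_gpu(float *host, int n)
    {
        int ngpu = 0;
        cudaGetDeviceCount(&ngpu);
        if (ngpu == 0) return;
        int chunk = (n + ngpu - 1) / ngpu;

        for (int d = 0; d < ngpu; ++d) {
            cudaSetDevice(d);                 // direct subsequent calls to GPU d
            int lo = d * chunk;
            int m  = (lo + chunk <= n) ? chunk : n - lo;
            if (m <= 0) break;

            float *dev = nullptr;
            cudaMalloc(&dev, m * sizeof(float));
            cudaMemcpy(dev, host + lo, m * sizeof(float), cudaMemcpyHostToDevice);
            addOne<<<(m + 255) / 256, 256>>>(dev, m);
            cudaMemcpy(host + lo, dev, m * sizeof(float), cudaMemcpyDeviceToHost);
            cudaFree(dev);
        }
    }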
Feb 2

Characterizing and Enhancing Global Memory Data Coalescing on GPUs

Effective parallel programming for GPUs requires careful attention to several factors, including ensuring coalesced access of data from global memory. There is a need for tools that can provide feedback to users about statements in a GPU kernel where non-coalesced data access occurs, and assistance in fixing the problem. In this paper, we address both […]
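As a reminder of what coalescing means at the kernel level, here are the two access patterns such a tool distinguishes; the kernels are illustrative, not from the paper.

    // Coalesced: consecutive threads read consecutive addresses, so a warp's
    // loads combine into a few wide memory transactions.
    __global__ void copy_coalesced(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    // Non-coalesced: a stride between consecutive threads scatters a warp's
    // loads across many transactions; this is the pattern such tools flag.
    __global__ void copy_strided(const float *in, float *out, int n, int stride)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i * stride < n) out[i] = in[i * stride];
    }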
Feb 2

Performance Analysis and Optimization of Hermite Methods on NVIDIA GPUs Using CUDA

In this thesis we present the first, to our knowledge, implementation and performance analysis of Hermite methods on GPU-accelerated systems. We give analytic background for Hermite methods; present implementations of the methods on traditional CPU systems as well as on GPUs; give the reader background on basic CUDA programming for GPUs; and discuss performance […]
Feb 2

Reliable Initialization of GPU-enabled Parallel Stochastic Simulations Using Mersenne Twister for Graphics Processors

Parallel stochastic simulations tend to exploit more and more computing power, and they are now also developed for General Purpose Graphics Processing Units (GP-GPUs). Consequently, they need reliable random sources to feed their applications. We propose a survey of the current Pseudo Random Number Generators (PRNGs) available on GPUs. We give particular focus to […]
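One GPU-oriented Mersenne Twister the survey's title points to is MTGP, which NVIDIA's cuRAND library exposes as a generator type. A minimal host-API sketch of drawing uniform numbers from it, assuming cuRAND is available:

    #include <cuda_runtime.h>
    #include <curand.h>

    // Generate n uniform floats on the GPU with the MTGP32 Mersenne Twister.
    int main()
    {
        const size_t n = 1 << 20;
        float *dev = nullptr;
        cudaMalloc(&dev, n * sizeof(float));

        curandGenerator_t gen;
        curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MTGP32); // Mersenne Twister for GPUs
        curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);      // reliable seeding is the crux
        curandGenerateUniform(gen, dev, n);                    // fill dev with U(0,1) samples

        curandDestroyGenerator(gen);
        cudaFree(dev);
        return 0;
    }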
