Posts
Feb, 3
GPGPU and MIC in Accelerated Cluster for Remote Sensed Image Processing Software
Processing Earth-observation remotely sensed images requires increasingly powerful computing facilities. For several years, GPGPU (General Purpose processing on Graphics Processing Units) technology has been used to perform massively parallel calculations. The French Space Agency (CNES) has therefore ported some IAS to assess their performance using this type […]
Feb, 3
On the Accelerating of Two-dimensional Smart Laplacian Smoothing on the GPU
This paper presents a GPU-accelerated implementation of two-dimensional Smart Laplacian smoothing. This implementation is developed following our paradigm for accelerating Laplacian-based mesh smoothing [13]. Two types of commonly used data layouts, Array-of-Structures (AoS) and Structure-of-Arrays (SoA), are used to represent triangular meshes in our implementation. Two iteration forms that have different choices […]
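The AoS-versus-SoA distinction in this abstract can be illustrated with a minimal sketch. The vertex fields and the centroid computation below are hypothetical stand-ins, not the paper's actual mesh data structures:

```python
# AoS: one record per vertex -- the fields of a single vertex sit together.
aos = [{"x": float(i), "y": float(2 * i)} for i in range(4)]

# SoA: one array per field -- the same field of neighbouring vertices is
# contiguous, which is what lets consecutive GPU threads read consecutive
# addresses when each thread handles one vertex.
soa = {"x": [float(i) for i in range(4)],
       "y": [float(2 * i) for i in range(4)]}

def centroid_x_aos(verts):
    # Touches every record, striding over the interleaved y fields.
    return sum(v["x"] for v in verts) / len(verts)

def centroid_x_soa(fields):
    # Streams through one dense array of x values only.
    return sum(fields["x"]) / len(fields["x"])

# Both layouts encode the same data, so both give the same answer.
assert centroid_x_aos(aos) == centroid_x_soa(soa) == 1.5
```

The layouts hold identical data; the choice only changes which elements are adjacent in memory, which is what the paper evaluates on the GPU.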
Feb, 3
Scaling Recurrent Neural Network Language Models
This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational costs and memory. Our analysis shows that despite being more costly to train, RNNLMs obtain much […]
Feb, 2
Multi-GPU Support on Shared Memory System using Directive-based Programming Model
Existing and emerging studies show that using a single Graphics Processing Unit (GPU) can yield significant performance gains. These devices have tremendous processing capabilities. Further orders of speedup should be achievable by using more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential […]
Feb, 2
Characterizing and Enhancing Global Memory Data Coalescing on GPUs
Effective parallel programming for GPUs requires careful attention to several factors, including ensuring coalesced access of data from global memory. There is a need for tools that can provide feedback to users about statements in a GPU kernel where non-coalesced data access occurs, and assistance in fixing the problem. In this paper, we address both […]
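The coalescing behaviour this abstract targets can be sketched by counting how many memory segments one warp's loads touch; the 128-byte segment size and the two access patterns below are illustrative assumptions, not the paper's tool:

```python
def segments_touched(addresses, seg_bytes=128):
    """Count the distinct memory segments a warp's accesses fall into.

    One segment per warp-wide load is the fully coalesced ideal; each extra
    segment is an extra memory transaction.
    """
    return len({addr // seg_bytes for addr in addresses})

WARP = 32   # threads per warp
ELEM = 4    # bytes per float element

# Coalesced: thread t reads element t, so the warp reads 32 consecutive words.
coalesced = [t * ELEM for t in range(WARP)]

# Strided: thread t reads element t * 32 (e.g. a column of a row-major
# matrix), so every thread's access lands in its own segment.
strided = [t * 32 * ELEM for t in range(WARP)]

assert segments_touched(coalesced) == 1    # one 128-byte transaction
assert segments_touched(strided) == 32     # one transaction per thread
```

A feedback tool of the kind the paper describes would flag statements producing the second pattern and suggest a layout or loop transformation that restores the first.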
Feb, 2
Performance Analysis and Optimization of Hermite Methods on NVIDIA GPUs Using CUDA
In this thesis we present the first, to our knowledge, implementation and performance analysis of Hermite methods on GPU accelerated systems. We give analytic background for Hermite methods; give implementations of the Hermite methods on traditional CPU systems as well as on GPUs; give the reader background on basic CUDA programming for GPUs; discuss performance […]
Feb, 2
Reliable Initialization of GPU-enabled Parallel Stochastic Simulations Using Mersenne Twister for Graphics Processors
Parallel stochastic simulations tend to exploit more and more computing power, and they are now also developed for General Purpose Graphics Processing Units (GP-GPUs). Consequently, they need reliable random sources to feed their applications. We propose a survey of the current Pseudo Random Number Generators (PRNGs) available on GPUs. We give a particular focus to […]
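The initialization problem the abstract raises is giving each parallel replicate its own well-separated stream. A minimal sketch of one common parameterization scheme, deriving each stream's seed by hashing a master seed and a stream id, is shown below; it is an illustration of the problem, not the MTGP initialization the paper studies:

```python
import hashlib
import random

def stream_seed(master_seed, stream_id):
    # Hash (master, id) so that nearby stream ids do not get nearby seeds,
    # which naive schemes like seed = master + id would produce.
    digest = hashlib.sha256(f"{master_seed}:{stream_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def make_streams(master_seed, n):
    """One independent generator object per simulation replicate."""
    return [random.Random(stream_seed(master_seed, i)) for i in range(n)]

streams = make_streams(12345, 4)
draws = [s.random() for s in streams]

# The streams start from distinct states, and the whole setup is reproducible.
assert len(set(draws)) == len(draws)
assert [s.random() for s in make_streams(12345, 4)] == draws
```

Hashed per-stream seeding avoids the correlated-stream hazard of incremental seeds, but it offers no statistical guarantee of stream independence; that is exactly the kind of property the surveyed GPU PRNGs are assessed on.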
Feb, 2
Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model
We present a library for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. The library is based on the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. Acting as matrix library developers, using this model we do not have to explicitly deal with distribution of work and data or communication between computational nodes […]
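The block-sparse representation behind such a library can be sketched as a map from block coordinates to small dense blocks; the dict-of-blocks structure and serial loop below are assumptions for illustration only, since the Chunks and Tasks model additionally distributes the blocks and schedules the per-block products across nodes:

```python
def bsp_matmul(A, B, block):
    """Multiply block-sparse matrices stored as {(brow, bcol): dense block}.

    Dense blocks are row-major lists of lists; only nonzero blocks are stored,
    so work is proportional to the number of matching block pairs.
    """
    C = {}
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k != k2:
                continue
            # Dense product of one block pair: prod = a @ b.
            prod = [[sum(a[r][t] * b[t][c] for t in range(block))
                     for c in range(block)] for r in range(block)]
            if (i, j) in C:
                C[(i, j)] = [[C[(i, j)][r][c] + prod[r][c]
                              for c in range(block)] for r in range(block)]
            else:
                C[(i, j)] = prod
    return C

# One stored block times a 2x2 identity block leaves it unchanged.
A = {(0, 0): [[1.0, 2.0], [3.0, 4.0]]}
B = {(0, 0): [[1.0, 0.0], [0.0, 1.0]]}
assert bsp_matmul(A, B, 2) == {(0, 0): [[1.0, 2.0], [3.0, 4.0]]}
```

In the distributed setting the paper describes, each (i, k) × (k, j) block product becomes an independent task, which is what the Chunks and Tasks runtime maps onto cluster nodes without the library author managing communication.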
Feb, 2
Montblanc: GPU accelerated Radio Interferometer Measurement Equations in support of Bayesian Inference for Radio Observations
We present Montblanc, a GPU implementation of the Radio interferometer measurement equation (RIME) in support of the Bayesian inference for radio observations (BIRO) technique. BIRO uses Bayesian inference to select sky models that best match the visibilities observed by a radio interferometer. To accomplish this, BIRO evaluates the RIME multiple times, varying sky model parameters […]
Feb, 1
Optimized Data Transfers Based on the OpenCL Event Management Mechanism
In standard OpenCL programming, hosts such as CPUs are supposed to control their compute devices such as GPUs. Since compute devices are dedicated to kernel computation, only hosts can execute several kinds of data transfers such as inter-node communication and file access. These data transfers require one host to simultaneously play two or more roles […]
Feb, 1
In-Memory Data Analytics on Coupled CPU-GPU Architectures
In the big data era, in-memory data analytics is an effective means of achieving high performance data processing and realizing the value of data in a timely manner. Efforts in this direction have been spent on various aspects, including in-memory algorithmic designs and system optimizations. In this paper, we propose to develop the next-generation in-memory […]
Feb, 1
Mascar: Speeding up GPU Warps by Reducing Memory Pitstops
With the prevalence of GPUs as throughput engines for data-parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high memory intensity and irregular access patterns are frequently targeted for acceleration on GPUs. While GPUs provide large numbers of compute resources, the resources needed for memory-intensive workloads are scarcer. […]