Posts

Feb 6

Unlocking Bandwidth for GPUs in CC-NUMA Systems

Historically, GPU-based HPC applications have had a substantial memory bandwidth advantage over CPU-based workloads due to using GDDR rather than DDR memory. However, past GPUs required a restricted programming model in which the programmer allocated application data up front and explicitly copied it into GPU memory before launching a GPU kernel. Recently, GPUs have eased […]
Feb 6

Nucleation Studies on Graphics Processing Units

A system in a metastable state needs to overcome a certain free energy barrier to form a droplet of the stable phase. Standard treatments assume spherical droplets, but this is not appropriate in the presence of anisotropy, such as for crystals. The anisotropy of the system has a strong effect on the surface free […]
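For reference, the spherical-droplet barrier that standard treatments assume comes from balancing a surface cost against a bulk gain. A minimal sketch of that classical nucleation estimate (function name and the numerical values are illustrative, not from the paper):

```python
import math

def spherical_barrier(gamma, dg):
    """Classical nucleation theory for a spherical droplet.

    gamma: surface free energy per unit area (> 0)
    dg:    bulk free-energy gain per unit volume of the stable phase (> 0)

    Delta G(r) = 4*pi*r^2*gamma - (4/3)*pi*r^3*dg peaks at the
    critical radius r* = 2*gamma/dg, with barrier height
    16*pi*gamma^3 / (3*dg^2).
    """
    r_star = 2.0 * gamma / dg
    barrier = 16.0 * math.pi * gamma**3 / (3.0 * dg**2)
    return r_star, barrier

# Illustrative units: gamma = 1.0, dg = 0.5 gives r* = 4.0
r_star, barrier = spherical_barrier(gamma=1.0, dg=0.5)
```

An anisotropic droplet (e.g. a faceted crystal) replaces the single `gamma` with an orientation-dependent surface free energy, which is exactly where the spherical estimate breaks down.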
Feb 6

Cryptography on Graphics Processing Unit: A Survey

Cryptography is the practice of securing information by transforming it into an unreadable form called cipher text, so that only those who possess a secret key can recover the plain text. Graphics Processing Units (GPUs) have become increasingly popular over the last decade as a cost-effective means of performing various computationally intensive tasks. The feasibility […]
Feb 6

Comparison of OpenCL performance on different platforms using VexCL and Blaze

This technical report provides performance numbers for several benchmark problems running on several different hardware platforms. The goal of this report is twofold. First, it helps us better understand how the performance of OpenCL changes on different platforms. Second, it provides an OpenCL-OpenMP comparison for a sparse matrix-vector multiplication operation. The VexCL library will be […]
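The benchmark at the heart of that comparison, sparse matrix-vector multiplication, is easy to state in CSR form. A plain-Python stand-in for the OpenCL/OpenMP kernels being compared (the function name is illustrative; real kernels parallelize the outer row loop):

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored in CSR format.

    values:  nonzero entries, row by row
    col_idx: column index of each nonzero
    row_ptr: row i's nonzeros occupy values[row_ptr[i]:row_ptr[i+1]]

    One independent dot product per row -- the loop that OpenCL and
    OpenMP implementations distribute across threads.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# 2x2 matrix [[2, 0], [1, 3]] times the vector [1, 1]
y = csr_spmv([2.0, 1.0, 3.0], [0, 0, 1], [0, 1, 3], [1.0, 1.0])
# y == [2.0, 4.0]
```

The irregular, data-dependent inner loop is what makes SpMV a revealing benchmark: its performance is dominated by memory access patterns rather than arithmetic.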
Feb 6

Register-leaning kernels in CUDA

Kepler cards offer a large amount of register space. One can use this memory to store working data arrays, just as one uses shared memory. This white paper describes such a register-leaning approach in detail.
Feb 3

A Survey of Power Management Techniques for Phase Change Memory

The demand for larger memory capacity in high-performance computing systems has motivated researchers to explore alternatives to DRAM (dynamic random access memory). Since PCM (phase change memory) provides high density, good scalability and non-volatile data storage, it has received a significant amount of attention in recent years. A crucial bottleneck to the widespread adoption of PCM, however, […]
Feb 3

Pointer Analysis for Semi-Automatic Code Parallelizers

Code parallelizers are employed these days to reduce the effort needed to manually parallelize sequential code. But they are ineffective when it comes to handling programming constructs like pointers. Code parallelizers like Par4All have limited support for pointers, while approaches like ASET + BONES cannot handle pointers at all. In this thesis we […]
Feb 3

Exploiting Concurrency Patterns with Heterogeneous Task and Data Parallelism

Parallel programming of an application requires not only domain knowledge of the application, but also programming-environment support and in-depth awareness of the target architecture. Often, not all concurrency features of the architecture are exposed to the programming environment. The challenge lies in efficiently utilizing these unexposed features to write effective parallel programs. In […]
Feb 3

GPGPU and MIC in Accelerated Cluster for Remote Sensed Image Processing Software

Processing Earth-observation remotely sensed images requires ever more powerful computing facilities. For several years, GPGPU (General-Purpose processing on Graphics Processing Units) technology has been used to perform massively parallel calculations. The French Space Agency (CNES) has therefore ported some IAS to assess their performance using this type […]
Feb 3

On the Accelerating of Two-dimensional Smart Laplacian Smoothing on the GPU

This paper presents a GPU-accelerated implementation of two-dimensional Smart Laplacian smoothing. This implementation is developed under the guidelines of our paradigm for accelerating Laplacian-based mesh smoothing [13]. Two types of commonly used data layouts, Array-of-Structures (AoS) and Structure-of-Arrays (SoA), are used to represent triangular meshes in our implementation. Two iteration forms that have different choices […]
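The two layouts compared in the abstract, and the smoothing update itself, can be sketched in a few lines (plain-Python stand-in with illustrative names; plain Laplacian smoothing only, since the "smart" quality test is the paper's contribution):

```python
# Array-of-Structures: each vertex is one interleaved (x, y) record.
aos = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]

# Structure-of-Arrays: each coordinate in its own contiguous array,
# which gives GPU threads coalesced loads when they all read, say, x.
soa_x = [0.0, 1.0, 0.5]
soa_y = [0.0, 0.0, 1.0]

def laplacian_step(xs, ys, neighbors, v):
    """Move vertex v to the centroid of its neighbors (plain Laplacian
    smoothing; a 'smart' variant would reject moves that degrade
    element quality)."""
    ring = neighbors[v]
    nx = sum(xs[n] for n in ring) / len(ring)
    ny = sum(ys[n] for n in ring) / len(ring)
    return nx, ny

# Vertex 2 with neighbor ring {0, 1} moves to their midpoint (0.5, 0.0).
new_xy = laplacian_step(soa_x, soa_y, {2: [0, 1]}, 2)
```

Because every vertex update reads only neighbor coordinates from the previous iteration, all vertices can be smoothed in parallel, which is what makes the algorithm a good GPU candidate.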
Feb 3

Scaling Recurrent Neural Network Language Models

This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational costs and memory. Our analysis shows that despite being more costly to train, RNNLMs obtain much […]
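The per-step cost whose scaling the paper studies comes from the recurrent matrix-vector products. A toy sketch of one simple (Elman-style) RNN step in pure Python (names and sizes are illustrative; production RNNLMs batch these products on the GPU):

```python
import math

def rnn_step(W_xh, W_hh, h, x):
    """One simple RNN step: h' = tanh(W_xh @ x + W_hh @ h).

    The W_hh @ h term costs O(hidden_size^2) per time step, which is
    why training cost grows quickly with model size.
    """
    size = len(h)
    return [
        math.tanh(
            sum(W_xh[i][j] * x[j] for j in range(len(x)))
            + sum(W_hh[i][j] * h[j] for j in range(size))
        )
        for i in range(size)
    ]

# Hidden size 2, input size 1; all-zero weights leave the state at zero.
h = rnn_step([[0.0], [0.0]], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0])
```

In a language model this step is applied once per token, with `x` an embedded word and a softmax over the vocabulary read off `h`, so both the hidden size and the vocabulary size enter the cost.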
Feb 2

Multi-GPU Support on Shared Memory System using Directive-based Programming Model

Existing and emerging studies show that using a single Graphics Processing Unit (GPU) can yield significant performance gains. These devices have tremendous processing capabilities. We should be able to achieve further speedup if we use more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
