
Posts

Feb, 9

Sparse Matrix-Vector Multiplication on GPU

Sparse Matrix-Vector multiplication (SpMV) is one of the key operations in linear algebra. The main challenges in optimizing SpMV on GPUs are thread divergence, load imbalance, and uncoalesced, indirect memory access, all of which stem from sparsity and irregularity. This dissertation develops solutions that address these challenges effectively. The first part of this dissertation focuses on a new […]
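
To make the memory-access problem named in this abstract concrete, here is a minimal sequential sketch of SpMV over the common CSR (compressed sparse row) layout. It is not the dissertation's method, which the truncated abstract does not describe. The function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr, col_idx, values are hypothetical names for the usual CSR arrays:
    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Rows hold different numbers of nonzeros; with one GPU thread per
        # row, this variation is what causes load imbalance and divergence.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx: an indirect, irregular access that
            # neighbouring threads cannot easily coalesce.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [7.0, 4.0, 18.0]
```

A GPU version would typically map the outer loop to threads or warps, which is where the inner loop's varying length and the lookup `x[col_idx[k]]` become performance problems.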
Feb, 9

Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices

Common techniques for fine-tuning the performance of automatically vectorized loops in applications for Intel Xeon Phi coprocessors are discussed. These techniques include strength reduction, regularizing the vectorization pattern, data alignment and aligned data hint, and pointer disambiguation. In addition, the loop tiling technique of memory traffic tuning is shown. The optimization methods are illustrated on […]
Feb, 8

Power Management Techniques for Data Centers: A Survey

With the growing use of the internet and exponential growth in the amount of data to be stored and processed (known as ‘big data’), the size of data centers has greatly increased. This, however, has resulted in a significant increase in the power consumption of data centers. For this reason, managing power consumption of data centers has become […]
Feb, 6

Extending the Gotran framework: LATEX and GPU acceleration

Gotran provides a framework for working with systems of ordinary differential equations (ODEs): Its primary goal is to increase the workflow efficiency of computational modelling in biomedical research. The ODEs, given by the time derivative of state variables, are described in a Gotran form file and can be automatically translated into different outputs depending on […]
Feb, 6

Unlocking Bandwidth for GPUs in CC-NUMA Systems

Historically, GPU-based HPC applications have had a substantial memory bandwidth advantage over CPU-based workloads due to using GDDR rather than DDR memory. However, past GPUs required a restricted programming model where the programmer allocated application data up front and explicitly copied it into GPU memory before launching a GPU kernel. Recently, GPUs have eased […]
Feb, 6

Nucleation Studies on Graphics Processing Units

A system in a metastable state needs to overcome a certain free energy barrier to form a droplet of the stable phase. Standard treatments assume spherical droplets, but this is not appropriate in the presence of an anisotropy, such as for crystals. The anisotropy of the system has a strong effect on their surface free […]
Feb, 6

Cryptography on Graphics Processing Unit: A Survey

Cryptography is the practice of protecting information by transforming it into an unreadable format called cipher text; only those who possess a secret key can decipher the message into plain text. Graphics Processing Units (GPUs) have become increasingly popular over the last decade as a cost-effective means of solving various computationally intensive tasks. The practicability […]
Feb, 6

Comparison of OpenCL performance on different platforms using VexCL and Blaze

This technical report provides performance numbers for several benchmark problems running on several different hardware platforms. The goal of this report is twofold. First, it helps us better understand how the performance of OpenCL changes on different platforms. Second, it provides an OpenCL-OpenMP comparison for a sparse matrix-vector multiplication operation. The VexCL library will be […]
Feb, 6

Register-leaning kernels in CUDA

Kepler cards offer a very large amount of register space. One can use this memory to store working data arrays, just as one uses shared memory. This white paper describes such a register-leaning approach in detail.
Feb, 3

A Survey of Power Management Techniques for Phase Change Memory

Demands for larger memory capacity in high-performance computing systems have motivated researchers to explore alternatives to DRAM (dynamic random access memory). Since PCM (phase change memory) provides high density, good scalability and non-volatile data storage, it has received a significant amount of attention in recent years. A crucial bottleneck in widespread adoption of PCM, however, […]
Feb, 3

Pointer Analysis for Semi-Automatic Code Parallelizers

Code parallelizers are employed these days to reduce the effort needed to manually parallelize sequential code. But they are ineffective when it comes to handling programming constructs like pointers. Code parallelizers like Par4all have limited support for pointers, while approaches like ASET + BONES cannot handle pointers at all. In this thesis we […]
Feb, 3

Exploiting Concurrency Patterns with Heterogeneous Task and Data Parallelism

Parallel programming of an application requires not only domain knowledge of the application, but also programming environment support and in-depth awareness of the target architecture. Often, not all concurrency features of the architecture are exposed to the programming environment. The challenge lies in efficient utilization of these unexposed features to write effective parallel programs. In […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org