Posts

Sep, 9

Experimentation Procedure for Offloaded Mini-Apps Executed on Cluster Architectures with Xeon Phi Accelerators

A heterogeneous cluster architecture is complex. It contains hundreds or thousands of devices connected by a tiered communication system in order to solve a problem. Because the system is heterogeneous, these devices will have varying performance capabilities. To better understand the interactions which occur between the various devices during execution, an experimentation procedure has been devised […]
Sep, 9

Parallel waveform extraction algorithms for the Cherenkov Telescope Array Real-Time Analysis

The Cherenkov Telescope Array (CTA) is the next generation observatory for the study of very high-energy gamma rays from about 20 GeV up to 300 TeV. Thanks to the large effective area and field of view, the CTA observatory will be characterized by an unprecedented sensitivity to transient flaring gamma-ray phenomena compared to both current […]
Sep, 9

A Performance Comparison of Algebraic Multigrid Preconditioners on CPUs, GPUs, and Xeon Phis

Algebraic multigrid preconditioners for accelerating iterative solvers are a popular choice for a broad range of applications, because they are able to obtain asymptotic optimality, yet can be applied in a black-box manner. However, only a few variants of algebraic multigrid preconditioners can fully benefit from fine-grained parallelization available on multi- and many-core architectures. Previous […]
Sep, 9

Dissecting GPU Memory Hierarchy through Microbenchmarking

Memory access efficiency is a key factor in fully utilizing the computational power of graphics processing units (GPUs). However, many details of the GPU memory hierarchy are not released by GPU vendors. In this paper, we propose a novel fine-grained microbenchmarking approach and apply it to three generations of NVIDIA GPUs, namely Fermi, Kepler and […]
Sep, 8

Accelerating Multiple Compound Comparison Using LINGO-based Load-Balancing Strategies on Multi-GPUs

Compound comparison is an important task in computational chemistry. From the comparison results, potential inhibitors can be found and then used for pharmacy experiments. The time complexity of a pairwise compound comparison is O(n^2), where n is the maximal length of compounds. In general, the length of a compound ranges from tens to hundreds, and […]
Sep, 8

Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs

Lattice-based cryptography has been a hot topic in the past decade, since it is believed that lattice-based cryptosystems are immune to attacks mounted by quantum computers. The security of this type of cryptography is based on the hardness of algorithms that solve lattice-based problems, namely the Shortest Vector Problem (SVP). Therefore, it is important to […]
Sep, 8

Contributions to the Efficient Use of General Purpose Coprocessors: Kernel Density Estimation as Case Study

The high performance computing landscape is shifting from assemblies of homogeneous nodes towards heterogeneous systems, in which nodes consist of a combination of traditional out-of-order execution cores and accelerator devices. Accelerators, built around GPUs, many-core chips, or FPGAs, are used to offload compute-intensive tasks. These devices provide superior theoretical performance compared to traditional multi-core CPUs, […]
Sep, 8

Accelerating Web Search using GPUs

The amount of content on the Internet is growing rapidly, as is the number of online Internet users. As a consequence, web search engines need to increase their computing capabilities and data continually while maintaining low search latency and without a significant rise in the cost per query. To serve this larger numbers […]
Sep, 7

A Survey Of Architectural Techniques for Near-Threshold Computing

Energy efficiency has now become the primary obstacle in scaling the performance of all classes of computing systems. Low-voltage computing, and specifically near-threshold voltage computing (NTC), which involves operating the transistor very close to, yet above, its threshold voltage, holds the promise of providing a many-fold improvement in energy efficiency. However, use of NTC also […]
Sep, 5

Waste Not, Want Not! Managing relational data in asymmetric memories

In this thesis, we study the management of relational data in modern, i.e., asymmetric, computer systems. We explore different strategies to identify asymmetries in persistent data, map them to asymmetries in the memory landscape and, eventually, exploit them to increase query processing performance. To this end, we study memory-conscious decomposition and storage of data […]
Sep, 5

Virtualizing Data Parallel Systems for Portability, Productivity, and Performance

Computer systems equipped with graphics processing units (GPUs) have become increasingly common over the last decade. In order to utilize the highly data parallel architecture of GPUs for general purpose applications, new programming models such as OpenCL and CUDA were introduced, showing that data parallel kernels on GPUs can achieve speedups by several orders of […]
Sep, 5

Parallel Execution of the ASP Computation – an Investigation on GPUs

This paper illustrates the design and implementation of a conflict-driven ASP solver that is capable of exploiting the Single-Instruction Multiple-Thread parallelism offered by General Purpose Graphical Processing Units (GPUs). Modern GPUs are multi-core platforms, providing access to a large number of cores at a very low cost, but at the price of a complex architecture with […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
