Posts

Nov, 22

Experiences with Achieving Portability across Heterogeneous Architectures

The increasing computational needs of parallel applications inevitably require portability across popular parallel architectures, which are becoming heterogeneous. The lack of a common parallel framework results in divergent code bases, difficulty in porting, higher maintenance cost, and, thus, difficulty achieving optimal performance on target architectures. Our paper examines two representative parallel applications and describes code […]
Nov, 22

Superconducting proximity effect in graphene under inhomogeneous strain

The interplay between quantum Hall states and Cooper pairs is usually hindered by the suppression of the superconducting state due to the strong magnetic fields needed to observe the quantum Hall effect. From this point of view graphene is special since it allows the creation of strong pseudo-magnetic fields due to strain. We show that […]
Nov, 21

Online Adaptive Code Generation and Tuning

In this paper, we present a runtime compilation and tuning framework for parallel programs. We extend our prior work on our auto-tuner, Active Harmony, for tunable parameters that require code generation (for example, different unroll factors). For such parameters, our auto-tuner generates and compiles new code on-the-fly. Effectively, we merge traditional feedback directed optimization and […]
Nov, 21

Issues in Heterogeneous GPU Clusters

In this paper, we discuss networking issues arising in the design, analysis and use for scientific computing of clusters equipped with graphics processing units. The adoption of graphics accelerators in clusters used for high-performance scientific computing is a fairly recent phenomenon and promises to be an important trend now and into the foreseeable future. After […]
Nov, 21

A new approach for sparse matrix vector product on NVIDIA GPUs

The sparse matrix vector product (SpMV) is a key operation in engineering and scientific computing and, hence, it has been subjected to intense research for a long time. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the […]
Nov, 21

GPU-Based Image Processing Use Cases: A High-Level Approach

This paper addresses the gap between envisioned hardware-virtualized techniques for GPU programming and a conventional approach from the point of view of an application engineer taking software engineering aspects like maintainability, understandability and productivity, and resulting achieved gain in performance and scalability into account. This gap is discussed on the basis of use cases from […]
Nov, 21

CT image reconstruction with half precision floating-point values

PURPOSE: Analytic CT image reconstruction is a computationally demanding task. Currently, the even more demanding iterative reconstruction algorithms find their way into clinical routine because their image quality is superior to analytic image reconstruction. The authors thoroughly analyze a so far unconsidered but valuable tool of tomorrow’s reconstruction hardware (CPU and GPU) that allows implementing […]
Nov, 21

Efficient GPGPU-based parallel packet classification

With the rapid growth of network technologies, many new web services have been developed to provide various applications and computing functions. These services rely deeply on the internet. Therefore, packet classification is an important issue of network security that typically adopts a flexible packet filtering system to classify each processed packet. Traditional packet classification requires […]
Nov, 21

Conflux: Embedding Massively Parallel Semantics in a High-Level Programming Language

As of late, massively parallel devices have become mainstream and are widely used in research and industry. But despite recent advances of the API, programming these devices has proven to be a difficult and error-prone task. We have designed Conflux, an embedded domain-specific language that integrates massively parallel semantics into a high-level programming language. […]
Nov, 21

Graph-based Parallel Analysis of Large Analog Circuits Based on GPU Platforms

In this paper, we propose a new parallel analysis method for large analog circuits using a determinant decision diagram (DDD) based graph technique. The DDD-based symbolic analysis technique enables exact symbolic analysis of very large analog circuits. Once the circuit small-signal characteristics are presented by DDDs, evaluation of the DDDs will give exact numerical values. In this paper, […]
Nov, 21

Challenge benchmarks that must be conquered to sustain the GPU revolution

The shift from GPUs to GPGPUs has brought with it many changes to the GPU architecture (e.g. more caches, more concurrent kernels, better synchronization). As GPUs press further into the general-purpose domain, architects must continue to address the performance of challenging workloads. This paper presents a set of challenge benchmarks and their key performance limitations […]
Nov, 21

PATUS: A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures

Stencil calculations comprise an important class of kernels in many scientific computing applications ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such types of solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
