
Posts

Aug, 25

Fine-grain Parallelism using Multi-core, Cell/BE, and GPU Systems

Currently, we are facing a situation where applications exhibit increasing computational demands and where a large variety of parallel processor systems are available. In this paper we focus on exploiting fine-grain parallelism for three applications with distinct characteristics: a Bioinformatics application (MrBayes), a Molecular Dynamics application (NAMD), and a Database application (TPC-H). We assess, side-by-side, […]
Aug, 25

An efficient GPU-based time domain solver for the acoustic wave equation

An efficient algorithm for time-domain solution of the acoustic wave equation for the purpose of room acoustics is presented. It is based on adaptive rectangular decomposition of the scene and uses analytical solutions within the partitions that rely on spatially invariant speed of sound. This technique is suitable for auralizations and sound field visualizations, even […]
Aug, 25

GPU-acceleration for Moving Particle Semi-implicit Method

The MPS (Moving Particle Semi-implicit) method has proven useful for computing free-surface hydrodynamic flows. Despite its applicability, one of its drawbacks in practical application is the high computational load. On the other hand, the Graphics Processing Unit (GPU), which was originally developed for acceleration of computer graphics, now provides unprecedented capability for scientific computations. The […]
Aug, 24

Parallel computation of spherical parameterizations for mesh analysis

Mesh parameterization is central to a broad spectrum of applications. In this paper, we present a novel approach to spherical mesh parameterization based on an iterative quadratic solver that is efficiently parallelizable on modern massively parallel architectures. We present an extensive analysis of performance results on both GPU and multicore architectures. We introduce a number […]
Aug, 24

A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

Programmers for GPGPU face a rapidly changing substrate of programming abstractions, execution models, and hardware implementations. It has been established, through numerous demonstrations for particular conjunctions of application kernel, programming language, and GPU hardware instance, that it is possible to achieve significant improvements in price/performance and energy/performance over general purpose processors. But these demonstrations are […]
Aug, 24

CnC-CUDA: declarative programming for GPUs

The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. Instead, future computer systems are expected to be built using homogeneous and heterogeneous many-core processors with tens to hundreds of cores per chip, and complex hardware designs to address […]
Aug, 24

WAYPOINT: scaling coherence to thousand-core architectures

In this paper, we evaluate a set of coherence architectures in the context of a 1024-core chip multiprocessor (CMP) tailored to throughput-oriented parallel workloads. Based on our analysis, we develop and evaluate two techniques for scaling coherence to thousand-core CMPs. We find that a broadcast-based probe filtering scheme provides reasonable performance up to 128 cores […]
Aug, 24

Implementation of a programming environment with a multithread model for reconfigurable systems

Reconfigurable systems are known to achieve higher performance than traditional microprocessor architectures in many application fields. However, to extract the full potential of reconfigurable systems, programmers often have to design and describe the best-suited code for their target architecture using specialized knowledge. The aim of this paper is […]
Aug, 24

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system

Heterogeneous architecture is becoming an important way to build massively parallel computer systems, e.g., the CPU-GPU heterogeneous systems ranked in the Top500 list. However, it is a challenge to efficiently utilize the massive parallelism of both applications and architectures on such heterogeneous systems. In this paper we present a practice on how to exploit and orchestrate […]
Aug, 24

An open framework for rapid prototyping of signal processing applications

Embedded real-time applications in communication systems have significant timing constraints, thus requiring multiple computation units. Manually exploring the potential parallelism of an application deployed on multicore architectures is very time-consuming. This paper presents an open-source Eclipse-based framework which aims to facilitate the exploration and development processes in this context. The framework includes a generic graph […]
Aug, 24

SCF: a device- and language-independent task coordination framework for reconfigurable, heterogeneous systems

Heterogeneous computing systems comprised of accelerators such as FPGAs, GPUs, and Cell processors coupled with standard microprocessors are becoming an increasingly popular solution to building future computing systems. Although programming languages and tools have evolved to simplify device-level design, programming such systems is still difficult and time-consuming due to system-level challenges involving synchronization and communication […]
Aug, 24

Precise dynamic analysis for slack elasticity: adding buffering without adding bugs

Increasing the amount of buffering for MPI sends is an effective way to improve the performance of MPI programs. However, for programs containing non-deterministic operations, this can result in new deadlocks or other safety assertion violations. Previous work did not provide any characterization of the space of slack elastic programs: those for which buffering can […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
