Posts
Aug, 25
GPU-acceleration for Moving Particle Semi-implicit Method
The MPS (Moving Particle Semi-implicit) method has proven useful in computing free-surface hydrodynamic flows. Despite its applicability, one of its drawbacks in practice is its high computational load. On the other hand, the Graphics Processing Unit (GPU), which was originally developed to accelerate computer graphics, now provides unprecedented capability for scientific computation. The […]
Aug, 24
Parallel computation of spherical parameterizations for mesh analysis
Mesh parameterization is central to a broad spectrum of applications. In this paper, we present a novel approach to spherical mesh parameterization based on an iterative quadratic solver that is efficiently parallelizable on modern massively parallel architectures. We present an extensive analysis of performance results on both GPU and multicore architectures. We introduce a number […]
Aug, 24
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction
GPGPU programmers face a rapidly changing substrate of programming abstractions, execution models, and hardware implementations. It has been established, through numerous demonstrations for particular combinations of application kernel, programming language, and GPU hardware, that significant improvements in price/performance and energy/performance over general-purpose processors are achievable. But these demonstrations are […]
Aug, 24
CnC-CUDA: declarative programming for GPUs
The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. Instead, future computer systems are expected to be built using homogeneous and heterogeneous many-core processors with tens to hundreds of cores per chip, and complex hardware designs to address […]
Aug, 24
WAYPOINT: scaling coherence to thousand-core architectures
In this paper, we evaluate a set of coherence architectures in the context of a 1024-core chip multiprocessor (CMP) tailored to throughput-oriented parallel workloads. Based on our analysis, we develop and evaluate two techniques for scaling coherence to thousand-core CMPs. We find that a broadcast-based probe filtering scheme provides reasonable performance up to 128 cores […]
Aug, 24
Implementation of a programming environment with a multithread model for reconfigurable systems
Reconfigurable systems are known to achieve higher performance than traditional microprocessor architectures in many application fields. However, to extract the full potential of a reconfigurable system, programmers often have to design and describe code best suited to their target architecture, which requires specialized knowledge. The aim of this paper is […]
Aug, 24
Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system
Heterogeneous architectures are becoming an important way to build massively parallel computer systems, e.g. the CPU-GPU heterogeneous systems ranked in the Top500 list. However, it is a challenge to efficiently utilize the massive parallelism of both applications and architectures on such heterogeneous systems. In this paper we present a practice on how to exploit and orchestrate […]
Aug, 24
An open framework for rapid prototyping of signal processing applications
Embedded real-time applications in communication systems have significant timing constraints, thus requiring multiple computation units. Manually exploring the potential parallelism of an application deployed on multicore architectures is highly time-consuming. This paper presents an open-source Eclipse-based framework which aims to facilitate the exploration and development processes in this context. The framework includes a generic graph […]
Aug, 24
SCF: a device- and language-independent task coordination framework for reconfigurable, heterogeneous systems
Heterogeneous computing systems, composed of accelerators such as FPGAs, GPUs, and Cell processors coupled with standard microprocessors, are becoming an increasingly popular way to build future computing systems. Although programming languages and tools have evolved to simplify device-level design, programming such systems is still difficult and time-consuming due to system-level challenges involving synchronization and communication […]
Aug, 24
Precise dynamic analysis for slack elasticity: adding buffering without adding bugs
Increasing the amount of buffering for MPI sends is an effective way to improve the performance of MPI programs. However, for programs containing non-deterministic operations, this can result in new deadlocks or other safety assertion violations. Previous work did not provide any characterization of the space of slack elastic programs: those for which buffering can […]
Aug, 24
Elastic pipeline: addressing GPU on-chip shared memory bank conflicts
One of the major problems with GPU on-chip shared memory is bank conflicts. We observed that the throughput of the GPU processor core is often constrained neither by the shared memory bandwidth, nor by the shared memory latency (as long as it stays constant), but rather by the varied latencies caused by […]
Aug, 23
JCudaMP: OpenMP/Java on CUDA
We present an OpenMP framework for Java that can exploit an available graphics card as an application accelerator. Dynamic languages (Java, C#, etc.) pose a challenge here because of their write-once-run-everywhere approach. This renders it impossible to make compile-time assumptions about whether, and which type of, accelerator or graphics card might be available in the […]