Posts

Jul, 11

A fully parallel, high precision, N-body code running on hybrid computing platforms

We present a new implementation of the numerical integration of the classical gravitational N-body problem, based on a high-order Hermite integration scheme with block time steps and direct evaluation of the particle-particle forces. The main innovation of this code (called HiGPUs) is its full parallelization, exploiting both OpenMP and MPI in the use […]
Jul, 10

Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures

In recent years, multi-core CPUs and many-core GPUs have emerged as mainstream and cost-effective means for scaling. Consequently, heterogeneous computing platforms consisting of both CPUs and GPUs are receiving wide attention. Such heterogeneous architectures are pervasive across notebooks, desktops, clusters, supercomputers and cloud environments. While they expose huge potential for […]
Jul, 3

Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators based on a Domain-Specific Language for Medical Imaging

Efficient memory bandwidth utilization on GPU accelerators is crucial for memory-bound applications. In medical imaging, the performance of many kernels is limited by the available memory bandwidth, since only a few operations are performed per pixel. For such kernels, only a fraction of the compute power provided by GPU accelerators can be exploited […]
Jul, 2

API-Compiling for Image Hardware Accelerators

We present an API-based compilation strategy to optimize image applications, developed using a high-level image processing library, for three different image processing hardware accelerators. The library API provides the semantics of the image computations. The three image accelerator targets are quite distinct: the first one uses a vector architecture; the second one presents a […]
Jun, 27

Software Performance Analysis with Parallel Programming Approaches

Software performance engineering (SPE) is a systematic and quantitative approach to constructing software systems that meet performance objectives such as response time, throughput, scalability and resource utilization. Optimization is a major concern in achieving these performance parameters. Optimization is performed at run-time or in the design phase. This paper proposes the coding practices in […]
Jun, 26

Compiling a high-level language for GPUs: (via language support for architectures and compilers)

Languages such as OpenCL and CUDA offer a standard interface for general-purpose programming of GPUs. However, with these languages, programmers must explicitly manage numerous low-level details involving communication and synchronization. This burden makes programming GPUs difficult and error-prone, rendering these powerful devices inaccessible to most programmers. We desire a higher-level programming model that makes GPUs […]
Jun, 26

GPU-based Cloud Computing for Comparing the Structure of Protein Binding Sites

In this paper, we present a novel approach for using a GPU-based Cloud computing infrastructure to efficiently perform a structural comparison of protein binding sites. The original CPU-based Java version of a recent graph-based algorithm called SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 […]
Jun, 26

Evaluation of likelihood functions on CPU and GPU devices

We describe parallel implementations of an algorithm that evaluates the likelihood function used in data analysis. The implementations run on the CPU, on the GPU, and on both devices cooperatively (hybrid). The CPU and GPU implementations are based on OpenMP and OpenCL, respectively. The hybrid implementation also allows the application to run on multi-GPU systems (not necessarily […]
Jun, 23

Hierarchical overlapped tiling

This paper introduces hierarchical overlapped tiling, a transformation that applies loop tiling and fusion to conventional loops. Overlapped tiling is a useful transformation to reduce communication overhead, but it may also generate a significant amount of redundant computation. Hierarchical overlapped tiling performs overlapped tiling hierarchically to balance communication overhead and redundant computation, and thus has […]
Jun, 23

Bacon: A GPU Programming System With Just in Time Specialization

This paper describes Bacon, a data-parallel programming system targeting OpenCL-compatible graphics processors. This system is built upon the existing OpenCL standard in order to make it easier for programmers to write high performance kernels for GPU accelerated applications. The OpenCL C syntax is extended into a new language, Bacon C, intended to make development significantly […]
Jun, 18

OpenACC – First Experiences with Real-World Applications

Today’s trend to use accelerators like GPGPUs in heterogeneous computer systems has given rise to several low-level APIs for accelerator programming. However, programming with these APIs is often tedious and therefore unproductive. To tackle this problem, recent approaches employ directive-based high-level programming for accelerators. In this work, we present our first experiences with OpenACC, an API consisting of […]
Jun, 13

Experiences with High-Level Programming Directives for Porting Applications to GPUs

HPC systems now exploit GPUs within their compute nodes to accelerate program performance. As a result, high-end application development has become extremely complex at the node level. In addition to restructuring the node code to exploit the cores and specialized devices, the programmer may need to choose a programming model such as OpenMP or CPU […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
