
Posts

Jun 6

Relativistic Hydrodynamics on Graphic Cards

We show how to accelerate relativistic hydrodynamics simulations using graphics cards (graphics processing units, GPUs). These improvements are highly relevant, e.g., to the field of high-energy nucleus-nucleus collisions at RHIC and the LHC, where (ideal and dissipative) relativistic hydrodynamics is used to calculate the evolution of hot and dense QCD matter. The results reported here […]
Jun 4

Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications

P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous processor clusters. Each cluster features up to 16 processors with independent instruction streams sharing a multi-banked, one-cycle-access L1 data memory, a multi-channel DMA engine, and specialized hardware for synchronization and aggressive power management. P2012 is 3D stacking ready […]
Jun 1

MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems

Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to the data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data […]
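The staging pattern this limitation forces looks roughly like the following sketch (function and buffer names are illustrative, not the paper's API): device data must first be copied into host memory before standard MPI can send it.

    // Manual staging of GPU data through host memory for an MPI send;
    // this copy is the overhead an integrated approach aims to remove.
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <cstdlib>

    void send_gpu_buffer(const double* d_buf, int n, int dest) {
        double* h_buf = (double*)malloc(n * sizeof(double));
        cudaMemcpy(h_buf, d_buf, n * sizeof(double), cudaMemcpyDeviceToHost);
        MPI_Send(h_buf, n, MPI_DOUBLE, dest, /*tag=*/0, MPI_COMM_WORLD);
        free(h_buf);
    }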
Jun 1

Generating Device-specific GPU code for Local Operators in Medical Imaging

To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language embedded into C++. The description uses decoupled access/execute metadata, which allows the programmer to specify both the execution constraints and the memory access patterns of kernels. A source-to-source compiler […]
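For illustration only (plain C++, not the paper's actual DSL syntax), the decoupled access/execute idea separates what a local operator reads from what it computes:

    // "Access" metadata declares the neighborhood a kernel may read;
    // "Execute" gives the per-pixel computation. A source-to-source
    // compiler can fuse both into a device-specific GPU kernel.
    struct Access {
        int radius = 1;                              // 3x3 window around each pixel
    };
    struct Execute {
        float operator()(const float* win) const {  // win: 9 pixels, row-major
            float sum = 0.0f;
            for (int i = 0; i < 9; ++i) sum += win[i];
            return sum / 9.0f;                       // box blur as a simple local operator
        }
    };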
May 24

Compiler optimizations for directive-based programming for accelerators

Parallel programming is difficult. For regular computation on central processing units, application programming interfaces such as OpenMP, which augment normal sequential programs with preprocessor directives to achieve parallelism, have proven easy for programmers to use and provide good multithreaded performance. OpenACC is a fork of the OpenMP project, which aims to provide a similar […]
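As a minimal sketch of this directive-based style (standard OpenACC clauses, not code from the paper), a single pragma is enough to ask the compiler to offload a loop to an accelerator:

    // SAXPY offloaded with one OpenACC directive; the data clauses
    // tell the compiler which arrays to copy to and from the device.
    void saxpy(int n, float a, const float* x, float* y) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }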
May 24

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels

General purpose GPU (GPGPU) programming frameworks such as OpenCL and CUDA allow running individual computation kernels sequentially on a device. However, in some cases it is possible to utilize device resources more efficiently by running kernels concurrently. This raises questions about load balancing and resource allocation that have not previously warranted investigation. For example, what […]
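In CUDA, such concurrency is expressed with streams; a minimal sketch follows (kernel bodies and launch configurations are placeholders):

    // Two independent kernels placed in separate streams may execute
    // concurrently on the device; kernels in one stream run in order.
    #include <cuda_runtime.h>

    __global__ void kernelA(float* x) { /* ... */ }
    __global__ void kernelB(float* y) { /* ... */ }

    void launch_concurrently(float* d_x, float* d_y) {
        cudaStream_t s1, s2;
        cudaStreamCreate(&s1);
        cudaStreamCreate(&s2);
        kernelA<<<64, 256, 0, s1>>>(d_x);   // enqueued in stream s1
        kernelB<<<64, 256, 0, s2>>>(d_y);   // stream s2: may overlap with s1
        cudaStreamSynchronize(s1);
        cudaStreamSynchronize(s2);
        cudaStreamDestroy(s1);
        cudaStreamDestroy(s2);
    }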
May 19

C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators

We describe the problem of parallelizing finite difference method (FDM) and finite element method (FEM) computations for a certain class of partial differential equations (PDEs) on a High Performance Computing (HPC) GPU cluster. For FDM, structured grids are employed and optimal data rearrangement operations are performed in the GPU computations. For FEM, unstructured triangular and […]
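A structured-grid FDM kernel typically maps one GPU thread to one grid point; a generic sketch (not the paper's code) for a 2D five-point Laplacian:

    // Five-point finite-difference stencil on a structured 2D grid;
    // interior points only, one thread per point, grid spacing h.
    __global__ void laplacian(const float* u, float* out,
                              int nx, int ny, float h) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i > 0 && i < nx - 1 && j > 0 && j < ny - 1) {
            int k = j * nx + i;   // row-major index into the structured grid
            out[k] = (u[k-1] + u[k+1] + u[k-nx] + u[k+nx] - 4.0f * u[k]) / (h * h);
        }
    }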
May 12

Characterization and Transformation of Unstructured Control Flow in Bulk Synchronous GPU Applications

In this paper, we identify important classes of program control flow in applications targeted at commercially available graphics processing units (GPUs) and characterize their presence in real CUDA and OpenCL workloads. Broadly, control flow can be characterized as structured or unstructured. It is shown that most existing techniques for […]
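A small illustration of the distinction (generic C++, not from the paper): a goto that exits a nested loop creates a branch target that no single loop or conditional encloses, exactly the kind of unstructured flow such transformations must handle.

    // Unstructured control flow: the goto jumps out of both loops at once.
    int find(const int* a, int n, int m, int key) {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < m; ++j)
                if (a[i * m + j] == key)
                    goto found;
        return 0;
    found:
        return 1;   // a structured rewrite would use a flag variable instead
    }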
May 9

Enabling task-level scheduling on heterogeneous platforms

OpenCL is an industry standard for parallel programming on heterogeneous devices. With OpenCL, compute-intensive portions of an application can be offloaded to a variety of processing units within a system. OpenCL is the first standard that focuses on portability, allowing programs to be written once and run seamlessly on multiple heterogeneous devices, regardless of vendor. […]
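The offload path described above starts from a small amount of host code; a minimal sketch using the standard OpenCL C API (error handling omitted, program/kernel setup elided):

    #include <CL/cl.h>

    int main() {
        cl_platform_id platform;
        clGetPlatformIDs(1, &platform, NULL);
        cl_device_id device;                 // any device: CPU, GPU, or accelerator
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);
        // ... build a program and kernel, then enqueue work with
        // clEnqueueNDRangeKernel(q, kernel, ...);
        return 0;
    }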
May 4

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

Graphics processing units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models support only the use of local GPUs. To use non-local GPUs, programmers must explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging […]
May 2

GPU Acceleration for the C++ Standard Template Library

Modern programmers must exploit parallelism for performance gains, possibly through the use of an attached or on-chip GPU. To take advantage of the GPU in C++ programs, the programmer must use either a new language (CUDA or OpenCL) or an external library (Thrust). Rather than requiring that programmers learn new tools, modify existing code, and […]
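The library route mentioned above looks like the following with Thrust (the STL-style interface is real Thrust API; the example itself is ours): familiar iterators, but the sort runs on the GPU.

    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>
    #include <vector>

    int main() {
        std::vector<int> h = {5, 2, 8, 1, 9};
        thrust::device_vector<int> d(h.begin(), h.end());  // host -> device copy
        thrust::sort(d.begin(), d.end());                  // sorted on the GPU
        thrust::copy(d.begin(), d.end(), h.begin());       // device -> host copy
        return 0;
    }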
Apr 25

Radio Astronomy Beam Forming on Many-Core Architectures

Traditional radio telescopes use large steel dishes to observe radio sources. The largest radio telescope in the world, LOFAR, uses tens of thousands of fixed, omnidirectional antennas instead, a novel design that promises ground-breaking research in astronomy. Where traditional telescopes use custom-built hardware, LOFAR uses software to do signal processing in real time. This leads […]
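At its core, beam forming is a phase-weighted sum over antenna signals; a generic sketch of that computation (illustrative only, not LOFAR's pipeline):

    #include <complex>

    std::complex<float> form_beam(const std::complex<float>* samples,
                                  const std::complex<float>* weights,
                                  int n_antennas) {
        std::complex<float> beam(0.0f, 0.0f);
        for (int a = 0; a < n_antennas; ++a)
            beam += weights[a] * samples[a];  // weight encodes per-antenna phase delay
        return beam;
    }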

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
