high performance computing on graphics processing units: hgpu.org

Posts

Nov, 22

CoreTSAR: Task Scheduling for Accelerator-aware Runtimes

Heterogeneous supercomputers that incorporate computational accelerators such as GPUs are increasingly popular due to their high peak performance, energy efficiency and comparatively low cost. Unfortunately, the programming models and frameworks designed to extract performance from all computational units still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP improves this situation by supporting natural migration […]

CUDA

Nov, 22

Automatic generation of software pipelines for heterogeneous parallel systems

Pipelining is a well-known approach to increasing parallelism and performance. We address the problem of software pipelining for heterogeneous parallel platforms that consist of different multi-core and many-core processing units. In this context, pipelining involves two key steps: partitioning an application into stages, and mapping and scheduling the stages onto the processing units of the heterogeneous […]

CUDA

Nov, 22

Tera-scale Astronomical Data Analysis and Visualization

We present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of (nearly) terabyte (TB)-sized 3-dimensional images. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image: (1) volume rendering using an arbitrary transfer function at 7–10 frames per second; (2) computation of basic global image statistics such […]

CUDA

Nov, 22

List Mode PET reconstruction

PET technology has an important role in modern medical diagnostics. With this process we can view a snapshot of the metabolism of a given part of the body, which provides more information than examining the organ’s anatomy. Processing list mode measurement data is a demanding task; to solve this problem we use a GPU, which provides the necessary parallel […]

CUDA

Nov, 21

Early evaluation of directive-based GPU programming models for productive exascale computing

Graphics Processing Unit (GPU)-based parallel computer architectures have become increasingly popular as a building block for high performance computing, and possibly for future Exascale computing. However, their programming complexity remains a major hurdle to their widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming […]

CUDA

Nov, 21

High-Performance General Solver for Extremely Large-Scale Semidefinite Programming Problems

Semidefinite programming (SDP) is currently one of the most important classes of optimization problems. It is relevant to a wide range of fields such as combinatorial optimization, structural optimization, control theory, economics, quantum chemistry, sensor network location and data mining. The capability to solve extremely large-scale SDP problems will have a significant effect on […]

CUDA

Nov, 21

ValuePack: value-based scheduling framework for CPU-GPU clusters

Heterogeneous computing nodes are becoming commonplace today, and recent trends strongly indicate that clusters, supercomputers, and cloud environments will increasingly host heterogeneous resources, with some being massively parallel (e.g., GPUs). With such heterogeneous environments becoming common, it is important to revisit scheduling problems for clusters and cloud environments. In this paper, we formulate and […]

CUDA

Nov, 21

An Adaptive Multiresolution Mesh Representation for CPU-GPU Coupled Computation

In this paper, we present an adaptive multiresolution mesh representation that exploits the computational differences between the CPU and the GPU. We build our representation from a dense-polygon mesh simplified to a base mesh, which stores the original geometry by means of an atlas structure. For both simplification and refinement processes, we present a hierarchical method […]

OpenGL

Nov, 21

Pattern Recognition with OpenCL Heterogeneous Platform

The OpenCL platform provides a unified development environment for various multicore processors. In this paper, we evaluate the OpenCL framework for application in pattern recognition. We have selected the most common algorithm for Artificial Neural Network (ANN) training, the backpropagation algorithm, for parallelization with OpenCL because of its high demand for processing resources. We will show […]

OpenCL

Nov, 20

Dataflow-driven GPU performance projection for multi-kernel transformations

Applications often have a sequence of parallel operations to be offloaded to graphics processors; each operation can become an individual GPU kernel. Developers typically explore a variety of transformations for each kernel. Furthermore, it is well known that efficient data management is critical in achieving high GPU performance and that "fusing" multiple kernels into one […]

CUDA

Nov, 20

Accelerating MapReduce on a coupled CPU-GPU architecture

The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging, and hold much promise for supporting power-efficient and scalable high performance computing. Second, MapReduce has emerged as a suitable framework for simplified parallel application development for many classes of applications, […]

OpenCL

Nov, 20

A scalable, numerically stable, high-performance tridiagonal solver using GPUs

In this paper, we present a scalable, numerically stable, high-performance tridiagonal solver. The solver is based on the SPIKE algorithm for partitioning a large matrix into small independent matrices, which can be solved in parallel. For each small matrix, our solver applies a general 1-by-1 or 2-by-2 diagonal pivoting algorithm, which is also known to […]

CUDA