Posts
Nov, 22
CoreTSAR: Task Scheduling for Accelerator-aware Runtimes
Heterogeneous supercomputers that incorporate computational accelerators such as GPUs are increasingly popular due to their high peak performance, energy efficiency and comparatively low cost. Unfortunately, the programming models and frameworks designed to extract performance from all computational units still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP improves this situation by supporting natural migration […]
Nov, 22
Automatic generation of software pipelines for heterogeneous parallel systems
Pipelining is a well-known approach to increasing parallelism and performance. We address the problem of software pipelining for heterogeneous parallel platforms that consist of different multi-core and many-core processing units. In this context, pipelining involves two key steps—partitioning an application into stages and mapping and scheduling the stages onto the processing units of the heterogeneous […]
Nov, 22
Tera-scale Astronomical Data Analysis and Visualization
We present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of (nearly) terabyte (TB)-sized 3-dimensional images. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image: (1) volume rendering using an arbitrary transfer function at 7–10 frames per second; (2) computation of basic global image statistics such […]
Nov, 22
List Mode PET reconstruction
PET technology plays an important role in modern medical diagnostics. With this process we can view a snapshot of the metabolism of a given part of the body, which provides more information than examining the organ’s anatomy. Processing list mode measurement data is a demanding task; to solve this problem we use a GPU, which provides the necessary parallel […]
Nov, 21
Early evaluation of directive-based GPU programming models for productive exascale computing
Graphics Processing Unit (GPU)-based parallel computer architectures have grown in popularity as a building block for high performance computing, and possibly for future Exascale computing. However, their programming complexity remains a major hurdle to their widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming […]
Nov, 21
High-Performance General Solver for Extremely Large-Scale Semidefinite Programming Problems
Semidefinite programming (SDP) is currently one of the most important classes of optimization problems. It is relevant to a wide range of fields such as combinatorial optimization, structural optimization, control theory, economics, quantum chemistry, sensor network location and data mining. The capability to solve extremely large-scale SDP problems will have a significant effect on […]
Nov, 21
ValuePack: value-based scheduling framework for CPU-GPU clusters
Heterogeneous computing nodes are becoming commonplace today, and recent trends strongly indicate that clusters, supercomputers, and cloud environments will increasingly host more heterogeneous resources, with some being massively parallel (e.g., GPU). With such heterogeneous environments becoming common, it is important to revisit scheduling problems for clusters and cloud environments. In this paper, we formulate and […]
Nov, 21
An Adaptive Multiresolution Mesh Representation for CPU-GPU Coupled Computation
In this paper, we present an adaptive multiresolution mesh representation exploring the computational differences of the CPU and the GPU. We build our representation considering a dense-polygon mesh simplified to a base mesh which stores the original geometry by means of an atlas structure. For both simplification and refinement processes, we present a hierarchical method […]
Nov, 21
Pattern Recognition with OpenCL Heterogeneous Platform
The OpenCL platform provides a unified development environment for various multicore processors. In this paper, we evaluate the OpenCL framework for application in pattern recognition. We have selected the most common algorithm for Artificial Neural Network (ANN) training, the backpropagation algorithm, for parallelization with OpenCL because of its high demand for processing resources. We will show […]
Nov, 20
Dataflow-driven GPU performance projection for multi-kernel transformations
Applications often have a sequence of parallel operations to be offloaded to graphics processors; each operation can become an individual GPU kernel. Developers typically explore a variety of transformations for each kernel. Furthermore, it is well known that efficient data management is critical in achieving high GPU performance and that "fusing" multiple kernels into one […]
Nov, 20
Accelerating MapReduce on a coupled CPU-GPU architecture
The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging, and hold much promise for supporting power-efficient and scalable high performance computing. Second, MapReduce has emerged as a suitable framework for simplified parallel application development for many classes of applications, […]
Nov, 20
A scalable, numerically stable, high-performance tridiagonal solver using GPUs
In this paper, we present a scalable, numerically stable, high-performance tridiagonal solver. The solver is based on the SPIKE algorithm for partitioning a large matrix into small independent matrices, which can be solved in parallel. For each small matrix, our solver applies a general 1-by-1 or 2-by-2 diagonal pivoting algorithm, which is also known to […]