Posts
Nov, 23
Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
For scalable 3-D FFT computation using multiple GPUs, efficient all-to-all communication between GPUs is the most important factor in good performance. Implementations with point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads, especially for small message sizes, in all-to-all communications among many nodes. We propose several schemes to minimize the […]
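To make the all-to-all pattern concrete, here is a minimal sketch that only simulates the data movement in plain Python (no MPI or CUDA). It assumes a 1-D (slab) decomposition of an n³ grid split evenly over p ranks. The helper names `all_to_all`, `point_to_point_messages` and `block_elements` are hypothetical and are not taken from the paper. Rank i sends block j to rank j, which amounts to a global block transpose.

```python
def all_to_all(send_buffers):
    """Simulate an all-to-all exchange among p ranks (hypothetical helper).

    send_buffers[i][j] is the block rank i sends to rank j; in the result,
    recv_buffers[j][i] is the block rank j receives from rank i.
    """
    p = len(send_buffers)
    if any(len(row) != p for row in send_buffers):
        raise ValueError("every rank must send exactly one block to every rank")
    # Block (i, j) moves to position (j, i): a global block transpose
    return [[send_buffers[i][j] for i in range(p)] for j in range(p)]


def point_to_point_messages(p):
    """Messages a naive point-to-point all-to-all issues (self-sends excluded)."""
    return p * (p - 1)


def block_elements(n, p):
    """Elements per block for an n^3 grid split evenly over p ranks.

    Each rank holds n^3 / p elements and splits them into p blocks,
    one per destination rank, so each block holds n^3 / p^2 elements.
    """
    return n ** 3 // (p * p)
```

For example, `point_to_point_messages(64)` gives 4032 messages, while `block_elements(256, 64)` gives only 4096 values per message (64 KiB with double-precision complex numbers). For a fixed grid, the message count grows quadratically with p and the message size shrinks quadratically. This is why per-message overhead, rather than bandwidth, tends to dominate at scale.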
Nov, 23
Efficient reconstruction of biological networks via transitive reduction on general purpose graphics processors
BACKGROUND: Techniques for reconstructing biological networks from perturbation experiments often predict direct interactions between nodes that do not exist. Transitive reduction removes such relations if they can be explained by an indirect path of influences. The existing algorithms for transitive reduction are sequential and may suffer from excessively long run times for large […]
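The idea of transitive reduction can be shown with a small sequential sketch. It assumes the network is a directed acyclic graph stored as a dict of successor sets, and it is not the paper's GPU algorithm. An edge u → v is dropped when v is still reachable from u through some other successor, meaning an indirect path of length two or more explains it. Cyclic networks would need extra handling, for example collapsing strongly connected components first. The function names are hypothetical.

```python
def _reaches(graph, starts, target):
    """Iterative DFS: is `target` reachable from any node in `starts`?"""
    stack = list(starts)
    seen = set(stack)
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def transitive_reduction(graph):
    """Transitive reduction of a DAG given as {node: set(successors)}.

    Assumes the graph is acyclic; for a DAG the reduction is unique, so
    each edge can be tested independently against the original graph.
    """
    reduced = {u: set(vs) for u, vs in graph.items()}
    for u, succs in graph.items():
        for v in succs:
            # Any other successor of u that still reaches v gives an
            # indirect path u -> w -> ... -> v, so the edge u -> v is redundant
            others = [w for w in succs if w != v]
            if _reaches(graph, others, v):
                reduced[u].discard(v)
    return reduced
```

For instance, if A influences B, B influences C, and the experiment also reports A → C, then `transitive_reduction({"A": {"B", "C"}, "B": {"C"}, "C": set()})` keeps only A → B and B → C. Each edge test is independent of the others, which suggests why the problem invites a parallel implementation.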
Nov, 22
GPU Implementation of Fuzzy Anisotropic Diffusion
In this paper, we present a GPU-based implementation of the Fuzzy-Anisotropic Diffusion technique for high-resolution multidimensional image and video processing. Parallel computing and HW/SW co-design techniques are combined to improve the time performance of the Fuzzy-Anisotropic Diffusion algorithm for image/video applications. Experimental results show significantly increased performance efficiency both […]
Nov, 22
CoreTSAR: Task Scheduling for Accelerator-aware Runtimes
Heterogeneous supercomputers that incorporate computational accelerators such as GPUs are increasingly popular due to their high peak performance, energy efficiency and comparatively low cost. Unfortunately, the programming models and frameworks designed to extract performance from all computational units still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP improves this situation by supporting natural migration […]
Nov, 22
Automatic generation of software pipelines for heterogeneous parallel systems
Pipelining is a well-known approach to increasing parallelism and performance. We address the problem of software pipelining for heterogeneous parallel platforms that consist of different multi-core and many-core processing units. In this context, pipelining involves two key steps: partitioning an application into stages, and mapping and scheduling the stages onto the processing units of the heterogeneous […]
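The mapping step can be illustrated with a deliberately simplified cost model, which is an assumption and not the paper's method. Stages are already partitioned. Each stage has a known execution time on each processing unit. Communication costs are ignored. The steady-state period of the pipeline is then the heaviest total load on any single unit. The sketch brute-forces all mappings, which is only feasible for tiny instances. The names `period`, `best_mapping` and the `"cpu"`/`"gpu"` unit labels are hypothetical.

```python
from itertools import product


def period(mapping, cost):
    """Steady-state period of a pipeline under a simple load model.

    mapping[s] is the unit assigned to stage s; cost[s][unit] is the time
    stage s takes on that unit. Stages sharing a unit add up, and the
    busiest unit bounds throughput (communication is ignored).
    """
    load = {}
    for stage, unit in enumerate(mapping):
        load[unit] = load.get(unit, 0) + cost[stage][unit]
    return max(load.values())


def best_mapping(cost, units):
    """Exhaustively try every stage-to-unit mapping; return (mapping, period)."""
    best = None
    for mapping in product(units, repeat=len(cost)):
        p = period(mapping, cost)
        if best is None or p < best[1]:
            best = (mapping, p)
    return best


cost = [
    {"cpu": 4, "gpu": 1},  # stage 0 is much faster on the GPU
    {"cpu": 2, "gpu": 3},  # stage 1 prefers the CPU
    {"cpu": 5, "gpu": 2},  # stage 2 is faster on the GPU
]
print(best_mapping(cost, ("cpu", "gpu")))  # (('gpu', 'cpu', 'gpu'), 3)
```

Putting every stage on its individually fastest unit is not always optimal in general. Overloading one unit raises the period, which is why mapping is treated as a scheduling problem rather than a per-stage choice.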
Nov, 22
Tera-scale Astronomical Data Analysis and Visualization
We present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of (nearly) terabyte (TB)-sized 3-dimensional images. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image: (1) volume rendering using an arbitrary transfer function at 7–10 frames per second; (2) computation of basic global image statistics such […]
Nov, 22
List Mode PET reconstruction
PET technology plays an important role in modern medical diagnostics. It provides a snapshot of the metabolism of a given part of the body, which offers more information than examining the organ’s anatomy. Processing list-mode measurement data is a demanding task; to solve this problem, we use the GPU, which provides the necessary parallel […]
Nov, 21
Early evaluation of directive-based GPU programming models for productive exascale computing
Graphics Processing Unit (GPU)-based parallel computer architectures have grown in popularity as a building block for high performance computing, and possibly for future Exascale computing. However, their programming complexity remains a major hurdle to their widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming […]
Nov, 21
High-Performance General Solver for Extremely Large-Scale Semidefinite Programming Problems
Semidefinite programming (SDP) is currently one of the most important classes of optimization problems. It is relevant to a wide range of fields such as combinatorial optimization, structural optimization, control theory, economics, quantum chemistry, sensor network localization and data mining. The capability to solve extremely large-scale SDP problems will have a significant effect on […]
Nov, 21
ValuePack: value-based scheduling framework for CPU-GPU clusters
Heterogeneous computing nodes are becoming commonplace today, and recent trends strongly indicate that clusters, supercomputers, and cloud environments will increasingly host heterogeneous resources, some of them massively parallel (e.g., GPUs). With such heterogeneous environments becoming common, it is important to revisit scheduling problems for clusters and cloud environments. In this paper, we formulate and […]
Nov, 21
An Adaptive Multiresolution Mesh Representation for CPU-GPU Coupled Computation
In this paper, we present an adaptive multiresolution mesh representation exploring the computational differences between the CPU and the GPU. We build our representation from a dense polygon mesh simplified to a base mesh, which stores the original geometry by means of an atlas structure. For both simplification and refinement processes, we present a hierarchical method […]
Nov, 21
Pattern Recognition with OpenCL Heterogeneous Platform
The OpenCL platform provides a unified development environment for various multicore processors. In this paper, we evaluate the OpenCL framework for pattern recognition applications. We selected the most common algorithm for training artificial neural networks (ANNs), the backpropagation algorithm, for parallelization with OpenCL because of its high demand for processing resources. We will show […]
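For readers unfamiliar with backpropagation, here is a minimal sequential sketch in plain Python. It is not the paper's OpenCL implementation. It assumes a single-hidden-layer network with sigmoid activations, one output, and squared-error loss. All function names are hypothetical. The per-neuron loops in `forward` and `backprop` are independent across neurons, which is one natural source of data parallelism for an OpenCL port.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w1, b1, w2, b2, x):
    """w1: hidden-by-input weights, b1: hidden biases, w2: hidden-to-output weights."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, y


def loss(params, x, t):
    """Squared error 0.5 * (y - t)^2 for one training sample."""
    _, y = forward(*params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    """Gradients of `loss` with respect to (w1, b1, w2, b2)."""
    w1, b1, w2, b2 = params
    h, y = forward(w1, b1, w2, b2, x)
    # Output delta: dL/dz_out = (y - t) * sigmoid'(z_out)
    d_out = (y - t) * y * (1.0 - y)
    g_w2 = [d_out * hi for hi in h]
    # Hidden deltas: propagate the output delta back through w2
    d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    g_w1 = [[d * xi for xi in x] for d in d_hid]
    return g_w1, d_hid, g_w2, d_out


def sgd_step(params, grads, lr):
    """One gradient-descent update of all weights and biases."""
    w1, b1, w2, b2 = params
    g_w1, g_b1, g_w2, g_b2 = grads
    new_w1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(w1, g_w1)]
    new_b1 = [b - lr * g for b, g in zip(b1, g_b1)]
    new_w2 = [w - lr * g for w, g in zip(w2, g_w2)]
    return new_w1, new_b1, new_w2, b2 - lr * g_b2
```

A quick sanity check is to compare the analytic gradient from `backprop` against a finite-difference estimate of `loss`, and to confirm that one `sgd_step` lowers the loss on the training sample.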