
Posts

Jul, 1

Parallelizing the Cellular Potts Model on GPU and multi-core CPU: An OpenCL cross-platform study

In this paper, we present the analysis and development of a cross-platform OpenCL parallelization of the Cellular Potts Model (CPM). In general, the evolution of the CPM is time-consuming. Using a data-parallel programming model such as CUDA can accelerate the process, but it is highly dependent on the hardware type and manufacturer. Recently, OpenCL has attracted […]
Jul, 1

High-Level Programming Framework for Executing Streaming Applications on Heterogeneous OpenCL Platforms

As the computer industry is reaching more and more limits regarding processor speed and transistor size, it has to come up with complex new architectures and more efficient use of the available processing power. For application developers this can be a difficult task, because they have to be aware of low-level hardware properties and there […]
Jun, 26

On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of applications, both present and […]
Jun, 17

A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures

The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one […]
Jun, 16

Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels

Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for the execution on a GPU, but can lead to performance losses when running on […]
Jun, 15

Toward OpenCL Automatic Multi-Device Support

To fully tap into the potential of today's heterogeneous machines, offloading parts of an application on accelerators is no longer sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. […]
Jun, 14

Dynamic loop vectorization for executing OpenCL kernels on CPUs

Heterogeneous computing platforms are becoming increasingly important in supercomputing. Many systems now integrate CPUs and GPUs cooperating together on a single node. Much effort is invested in tuning GPU-kernels. However, it can be the case that some systems may not have GPUs or the GPUs are busy. Maintaining two versions of the same code for […]
Jun, 9

Efficient all-against-all protein similarity matrix computation using OpenCL

In this report we introduced CLSW, a fast GPU-based Smith-Waterman score-only-alignment calculator. While generally applicable for any protein alignment problem, it was designed specifically as a proof-of-concept application for SIMAP. Even if we had only two weeks to develop a fully functional, validated and optimized implementation and all related concepts, our results show that in […]
Jun, 9

GPU-Accelerated Dynamic Functional Connectivity Analysis for Functional MRI Data Using OpenCL

Intense computations in engineering and science, especially bioinformatics, have been made practical by the recent advances in Graphical Processing Unit (GPU) computing technology. In this study, implementation and performance evaluations for a GPU-accelerated dynamic functional connectivity (DFC) analysis, which is an analysis method for investigating dynamic interactions among different brain networks, are presented. Open Computing […]
Jun, 9

3D Skeleton Extraction Method using Potential Field on OpenCL

For 3D skeleton extraction, the algorithm based on generalized potential fields, known as an outstandingly flexible and robust method, suffers from a seriously heavy computational burden. In this paper, we put forward a parallel algorithm based on the OpenCL heterogeneous parallel framework, which can make full use of the great computing power provided by heterogeneous model […]
May, 30

Data Layout Optimization for Multi-Valued Containers in OpenCL

Scientific data is mostly multi-valued, e.g., coordinates, velocities, moments or feature components, and it comes in large quantities. The data layout of such containers has an enormous impact on the achieved performance; however, layout optimization is very time-consuming and error-prone because container access syntax in standard programming languages is not sufficiently abstract. This means that […]
May, 26

GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications

As more scientific workloads are moved into the cloud, the need for high performance accelerators increases. Accelerators such as GPUs offer improvements in both performance and power efficiency over traditional multi-core processors; however, their use in the cloud has been limited. Today, several common hypervisors support GPU passthrough, but their performance has not been systematically […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
