
Posts

Jul, 12

Parallel Implementations for Solving Shortest Path Problem using Bellman-Ford

In this paper, different parallel implementations of the Bellman-Ford algorithm on GPU using OpenCL are presented. These include two variants of Bellman-Ford for the single-source shortest path (SSSP) problem and Bellman-Ford for the all-pairs shortest path (APSP) problem. A comparative analysis of their performance on CPU and GPU is also discussed. Write-write consistency […]
Jul, 7

GiMMiK – Generating Bespoke Matrix Multiplication Kernels for Various Hardware Accelerators; Applications in High-Order Computational Fluid Dynamics

Matrix multiplication is a fundamental linear algebra routine ubiquitous in all areas of science and engineering. Highly optimised BLAS libraries (cuBLAS and clBLAS on GPUs) are the most popular choices for an implementation of the General Matrix Multiply (GEMM) in software. However, performance of library GEMM is poor for small matrix sizes. In this thesis […]
Jul, 6

A Parallelized Implementation for H.264 Real-time Encoding Scheme

In this paper, a high-speed video stream encoder for the H.264 digital video codec standard is accelerated with modern parallel processing architectures. Based on parallel processing techniques with GPUs, we used OpenCL-based GPU kernel programs and achieved a high level of CPU-GPU interoperability. In its design, our system makes the CPU perform all […]
Jul, 6

High-level Parallel Programming Support for Heterogeneous Systems

This master's thesis focuses on several high-level parallel programming models for heterogeneous systems that have become increasingly popular in the field of high-performance computing. Heterogeneous systems are an inexpensive and effective way to obtain further performance improvements. A powerful combination of graphics processing units (GPUs) and central processing units (CPUs) is one of the most […]
Jul, 4

Writing self-adaptive codes for heterogeneous systems

Heterogeneous systems are becoming increasingly common. Relatedly, the popularity of OpenCL is growing, as it provides a unified means to program a wide variety of devices, including GPUs and multicore CPUs. More recently, the Heterogeneous Programming Library (HPL) targets the same variety of systems as OpenCL, intending to improve their programmability. The main drawback of […]
Jul, 4

A second generation of DEFG: Declarative Framework for GPUs

DEFG is our declarative language and framework for the efficient generation of OpenCL GPU applications. Using our new DEFG implementation, run-time and lines-of-code comparisons are provided for three well-known algorithms: Sobel image filtering, breadth-first search and all-pairs shortest path. The DEFG declarative language and corresponding OpenCL kernels provide complete OpenCL applications. The lines-of-code comparison demonstrates […]
Jul, 4

Parallel Implementation of Travelling Salesman Problem using Ant Colony Optimization

In this paper we propose a parallel implementation of the Ant Colony Optimization (ACO) Ant System algorithm on GPU using OpenCL. We compare different ACO parameters that directly or indirectly affect the result. The speedup of the CPU and GPU implementations is compared, with a speedup of 3.11x in CPU […]
Jun, 24

AES encryption on modern consumer architectures

Specialized cryptographic processors target professional applications and offer both low latency and high throughput at the expense of cost. At the consumer level, a modern SoC embodies several accelerators and vector extensions (e.g. SSE, AES-NI), and offers a high degree of programmability through multiple APIs (OpenMP, OpenCL, etc.). This work explains how a modern x86 system […]
Jun, 23

Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment

The paper presents the design, implementation and real-life uses of a visualization subsystem for a distributed framework that parallelizes workflow-based computations among clusters whose nodes feature both CPUs and GPUs. Firstly, the proposed system presents a graphical view of the infrastructure with clusters, nodes and compute devices along with parameters and runtime graphs […]
Jun, 19

Parallel track reconstruction in CMS using the cellular automaton approach

The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is a general-purpose particle detector and comprises the largest silicon-based tracking system built to date, with 75 million individual readout channels. The precise reconstruction of particle tracks from this tremendous number of input channels is a compute-intensive task. The foreseen LHC beam parameters […]
Jun, 17

On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures

With the advent of many-core computer architectures such as GPGPUs from NVIDIA and AMD, and more recently Intel’s Xeon Phi, ensuring performance portability of HPC codes is potentially becoming more complex. In this work we have focused on one important application area — structured grid codes — and investigated techniques for ensuring performance portability across […]
Jun, 17

HAM – Heterogenous Active Messages for Efficient Offloading on the Intel Xeon Phi

The applicability of accelerators is limited by the attainable speed-up for the offloaded computations and by the offloading overheads. While GPU programming models like CUDA and OpenCL only allow optimising the application code and its speed-up, the available low-level APIs for the Intel Xeon Phi provide the opportunity to address the overheads, too. This work […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
