
Posts

Jul, 13

Optimization, Specification and Verification of the Prefix Sum Program in an OpenCL Environment

The Prefix Sum is an algorithm used as a building block for various other algorithms, for example radix sort, quicksort and lexical string comparison. Implementing the Prefix Sum algorithm on the CPU is trivial, but a parallel approach with OpenCL is more complicated. An implementation in OpenCL has been made and optimized to minimize branch […]
Jul, 6

LTTng CLUST: A system-wide unified CPU and GPU tracing tool for OpenCL applications

As computation schemes evolve and many new tools become available to programmers to enhance the performance of their applications, many programmers have started to look toward highly parallel platforms such as Graphics Processing Units (GPUs). Offloading computations that can take advantage of the architecture of the GPU is a technique that has proven fruitful in recent […]
Jun, 30

CPU and GPU Implementation of QCD by using OpenCL

Many particle physics applications can now be parallelized on multicore platforms such as CPUs and GPUs. In this paper, we propose a parallel processing approach for a Quantum ChromoDynamics (QCD) application that uses both the CPU and the GPU. Instead of distributing the parallelizable workload to either the CPU or the GPU, we distribute the workload simultaneously to both the CPU […]
Jun, 19

Parallel BTF Compression with Multi-Level Vector Quantization in OpenCL

The Bidirectional Texture Function (BTF), an effective high-fidelity representation of surface appearance, is becoming increasingly widely used. In this paper we report on contributions to BTF data compression for multi-level vector quantization. We describe novel decompositions that improve the compression ratio by 15% in comparison with the original method, without loss of […]
Jun, 19

Visualization of OpenCL Application Execution on CPU-GPU Systems

Evaluating the performance of parallel and heterogeneous programs and architectures can be challenging. An emulator or simulator can be used to aid the programmer. To provide guidance and feedback to the programmer, the simulator needs to present traces, reports, and debugging information in a coherent and unambiguous format. Although these outputs contain a lot of […]
Jun, 16

Parallelization of DIRA and CTmod using OpenMP and OpenCL

Parallelization answers the ever-growing demand for computing power by taking advantage of multi-core processor technology and modern many-core graphics compute units. Multi-core CPUs and many-core GPUs have the potential to substantially reduce the execution time of a program, but it is often a challenging task to ensure that all available hardware is […]
Jun, 8

Improving OpenCL Programmability with the Heterogeneous Programming Library

The use of heterogeneous devices is becoming increasingly widespread. Their main drawback is their low programmability, due to the large number of details that must be handled. Another important problem is reduced code portability, as most of the tools used to program them are vendor- or device-specific. The exception to this observation is OpenCL, which […]
Jun, 5

Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability

Heterogeneous computing, which combines devices with different architectures, is rising in popularity and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems and offers functional portability. It does, however, suffer from poor performance portability: code tuned for one device must be re-tuned to achieve good […]
May, 25

Nucleation of nanoparticles in a coarse grained fluid using OpenCL

In this thesis, the nucleation rate of almost hard spheres in a coarse-grained fluid is measured to study the effects of an explicit solvent on the nucleation rate. Previous measurements show a discrepancy between physical measurements and simulations, where the latter all used implicit solvents. In this thesis, the fluid is approximated using Stochastic Rotation […]
May, 22

An Introduction to OpenCL C++

Today servers, desktops, mobile devices, and embedded systems contain many processors in addition to the CPU that runs programs. These extra processors are generally called accelerators and could be a GPU, FPGA, Xeon Phi, or other programmable device. There are many types of accelerators available, from many vendors, for many different environments. Khronos developed the […]
May, 19

CHO: Towards a Benchmark Suite for OpenCL FPGA Accelerators

Programming FPGAs with OpenCL-based high-level synthesis frameworks is gaining attention, with a number of commercial and research frameworks announced. However, there are no benchmarks for evaluating these frameworks. To this end, we present CHO, a benchmark suite that extends CHStone, a commonly used C-based high-level synthesis benchmark suite, to OpenCL. We characterise CHO at various […]
May, 3

IPMACC: Translating OpenACC API to OpenCL

In this paper, we introduce IPMACC, a framework for executing OpenACC for C applications over the OpenCL runtime. We use our framework to compare the performance of OpenACC and OpenCL. OpenACC API abstractions remove low-level control from programmers' hands. To understand the low-level OpenCL optimizations that are not applicable in OpenACC, we compare highly-optimized OpenCL and […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
