Posts
Oct, 7
Vectorized OpenCL implementation of numerical integration for higher order finite elements
In our work we analyze computational aspects of the problem of numerical integration in finite element calculations and consider an OpenCL implementation of related algorithms for processors with wide vector registers. As a platform for testing the implementation we choose the PowerXCell processor, being an example of the Cell Broadband Engine (CellBE) architecture. Although the […]
Sep, 28
From OpenCL to Gates: the FFT
The FFT plays a fundamental role in OFDM programmable digital baseband communication systems under the SDR context. The core nature of this algorithm marks it as a primary target for acceleration. Since long frame lengths of the FFT are desirable in order to achieve higher bitrates, the computational complexity becomes even more significant. In this […]
Sep, 27
OpenCL Parallel Programming Development Cookbook
Welcome to the OpenCL Parallel Programming Development Cookbook! Whew, that was more than a mouthful. This book was written by a developer, that’s me, and for a developer, hopefully that’s you. This book will look familiar to some and distinct to others. It is a result of my experience with OpenCL, but more importantly in […]
Sep, 25
OpenCL Task Partitioning in the Presence of GPU Contention
Heterogeneous multi- and many-core systems are increasingly prevalent in the desktop and mobile domains. On these systems it is common for programs to compete with co-running programs for resources. While multi-task scheduling for CPUs is a well-studied area, how to partition and map computing tasks onto the heterogeneous system in the presence of GPU contention […]
Sep, 23
Performance of OpenCL
OpenCL is a relatively new standard that supports computation on a variety of parallel architectures. The author was unable to find reliable information about the performance of OpenCL programs on CPUs in comparison to traditional parallel processing standards like OpenMP. This paper describes the results of an experiment that tries to answer the following question: "Which […]
Sep, 22
Accelerating Habanero-Java Programs with OpenCL Generation
The initial wave of programming models for general-purpose computing on GPUs, led by CUDA and OpenCL, has provided experts with low-level constructs to obtain significant performance and energy improvements on GPUs. However, these programming models are characterized by a challenging learning curve for non-experts due to their complex and low-level APIs. Looking to the future, […]
Sep, 20
Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone
Feature detection and extraction are essential in computer vision applications such as image matching and object recognition. The Scale-Invariant Feature Transform (SIFT) algorithm is one of the most robust approaches to detect and extract distinctive invariant features from images. However, high computational complexity makes it difficult to apply the SIFT algorithm to mobile applications. Recent […]
Sep, 9
Implementation of PDE models of cardiac dynamics on GPUs using OpenCL
Graphical processing units (GPUs) promise to revolutionize scientific computing in the near future. Already, they allow almost real-time integration of simplified numerical models of cardiac tissue dynamics. However, the integration methods that have been developed so far are typically of low order and use single-precision arithmetic. In this work, we describe numerical implementation of […]
Sep, 7
Comparison and Analysis of GPU Energy Efficiency For CUDA and OpenCL
The use of GPUs for processing large sets of parallelizable data has increased sharply in recent years. As the concept of GPU computing is still relatively young, parameters other than computation time, such as energy efficiency, are being overlooked. Two parallel computing platforms, CUDA and OpenCL, provide developers with an interface that they can use […]
Aug, 26
OpenCL programming using Python syntax
We describe ocl, a Python library built on top of pyOpenCL and numpy. It allows programming GPU devices using Python. Python functions that are marked up using the provided decorator are converted into C99/OpenCL and compiled using the JIT at runtime. This approach lowers the barrier to entry to programming GPU devices since it requires […]
Aug, 26
SystemC simulation on GP-GPUs: CUDA vs. OpenCL
SystemC is a widespread language for developing SoC designs. Unfortunately, most SystemC simulators are based on a strictly sequential scheduler that heavily limits their performance, impacting verification schedules and time-to-market of new designs. Parallelizing SystemC simulation entails a complete re-design of the simulator kernel for the specific target parallel architectures. This paper proposes an automatic […]
Aug, 24
SOCL: An OpenCL Implementation with Automatic Multi-Device Adaptation Support
To fully tap into the potential of today’s heterogeneous machines, offloading parts of an application on accelerators is not sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. In […]