Posts
Aug, 21
Implementing Computer Vision Functions with OpenCL on the Qualcomm Adreno 420
Computer vision algorithms are becoming increasingly important in mobile, embedded, and wearable devices and applications. These compute-intensive workloads are challenging to implement with good performance and power-efficiency. In many applications, implementing critical portions of computer vision workloads on a general-purpose graphics processing unit (GPU) is an attractive solution. Qualcomm enables programming of the Adreno GPU […]
Aug, 21
A CPU and GPU Heterogeneous Processing of Multimedia Data by using OpenCL
In recent times, it has become possible to parallelize many multimedia applications using multicore platforms such as CPUs and GPUs. In this paper, we propose a parallel processing approach for a multimedia application by using both the CPU and GPU. Instead of distributing the parallelizable workload to either the CPU or GPU, we distribute the […]
Aug, 21
Locality Aware Work-Stealing Based Scheduling in Hybrid CPU-GPU
We study work-stealing based scheduling on a cluster of nodes with CPUs and GPUs. In particular, we evaluate locality aware scheduling in the context of distributed shared memory style programming, where the user is oblivious to data placement. Our runtime maintains a distributed map of data resident on various nodes and uses it to estimate […]
Aug, 21
A Parallelizing Matlab Compiler Framework and Run time for Heterogeneous Systems
Compute-intensive applications incorporate ever increasing data processing requirements on hardware systems. Many of these applications have only recently become feasible thanks to the increasing computing power of modern processors. The Matlab language is uniquely situated to support the description of these compute-intensive scientific applications, and consequently has been continuously improved to provide increasing computational support […]
Aug, 21
GPU computing with OpenCL to model 2D elastic wave propagation: exploring memory usage
Graphics processing units (GPUs) have become increasingly powerful in recent years. Programs exploring the advantages of this architecture could achieve large performance gains and this is the aim of new initiatives in high performance computing. The objective of this work is to develop an efficient tool to model 2D elastic wave propagation on parallel computing […]
Aug, 18
OpenCL-Based Design of an FPGA Accelerator for Phase-Based Correspondence Matching
This paper proposes a Field Programmable Gate Array (FPGA) implementation of the stereo correspondence matching using Phase-Only Correlation (POC). The use of high-accuracy stereo correspondence matching based on POC makes it possible to measure accurate 3D shape of an object using stereo vision. The drawback of the POC-based approach is its high computational cost. To […]
Aug, 18
Parallelizing a high-order WENO scheme for complicated flow structures on GPU and MIC
As conservative, high-order accurate, shock-capturing methods, weighted essentially non-oscillatory (WENO) schemes have been widely used to effectively resolve complicated flow structures in computational fluid dynamics (CFD) simulations. However, using a high-order WENO scheme can be highly time-consuming, which greatly limits the CFD application’s performance efficiency. In this paper, we present various parallel strategies base […]
Aug, 18
Runtime Code Generation and Data Management for Heterogeneous Computing in Java
GPUs (Graphics Processing Units) and other accelerators are nowadays commonly found in desktop machines, mobile devices and even data centres. While these highly parallel processors offer high raw performance, they also dramatically increase program complexity, requiring extra effort from programmers. This results in difficult-to-maintain and non-portable code due to the low-level nature of the languages […]
Aug, 18
RubiCL, a Library Providing Automatic Parallelisation on CPU and GPU devices
This project presents a library that automates the parallelisation of several higher-order functions, originally provided within the Ruby standard library. The library distributes computation across many compute-units, following an annotation specifying that primitives are solely operating on numerical data. RubiCL harnesses the OpenCL framework in order to allow execution to occur on CPU or GPU devices. […]
Aug, 18
Optimizing OpenCL Local Work Group Size With Machine Learning
GPU architectures are becoming increasingly important due to their high number of processors. The single instruction, multiple data (SIMD) architecture has proven to work not just for the graphics domain, but also for many other disciplines. This is due to the potential performance that can be achieved by a consumer-level GPU being significantly higher than the […]
Aug, 14
A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels
Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today’s systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular control flow and memory access patterns. However, the growing complexity, exposed memory hierarchy, incoherence, heterogeneity, and […]
Aug, 14
MPC: A Massively Parallel Compression Algorithm for Scientific Data
Due to their high peak performance and energy efficiency, massively parallel accelerators such as GPUs are quickly spreading in high-performance computing, where large amounts of floating-point data are processed, transferred, and stored. Such environments can greatly benefit from data compression if done sufficiently quickly. Unfortunately, most conventional compression algorithms are unsuitable for highly parallel execution. […]