Posts
Aug, 21
A Parallelizing Matlab Compiler Framework and Runtime for Heterogeneous Systems
Compute-intensive applications impose ever-increasing data processing requirements on hardware systems. Many of these applications have only recently become feasible thanks to the increasing computing power of modern processors. The Matlab language is uniquely situated to support the description of these compute-intensive scientific applications, and consequently has been continuously improved to provide increasing computational support […]
Aug, 21
Implementing Computer Vision Functions with OpenCL on the Qualcomm Adreno 420
Computer vision algorithms are becoming increasingly important in mobile, embedded, and wearable devices and applications. These compute-intensive workloads are challenging to implement with good performance and power efficiency. In many applications, implementing critical portions of computer vision workloads on a general-purpose graphics processing unit (GPU) is an attractive solution. Qualcomm enables programming of the Adreno GPU […]
Aug, 21
GPU computing with OpenCL to model 2D elastic wave propagation: exploring memory usage
Graphics processing units (GPUs) have become increasingly powerful in recent years. Programs that exploit the advantages of this architecture can achieve large performance gains, and this is the aim of new initiatives in high-performance computing. The objective of this work is to develop an efficient tool to model 2D elastic wave propagation on parallel computing […]
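For background, 2D elastic wave propagation is commonly modelled with the velocity-stress formulation below (the standard P-SV system after Virieux, quoted here only as a reference point and not necessarily the exact equations solved in this work):

\[
\rho\,\partial_t v_x = \partial_x \sigma_{xx} + \partial_z \sigma_{xz}, \qquad
\rho\,\partial_t v_z = \partial_x \sigma_{xz} + \partial_z \sigma_{zz},
\]
\[
\partial_t \sigma_{xx} = (\lambda+2\mu)\,\partial_x v_x + \lambda\,\partial_z v_z, \quad
\partial_t \sigma_{zz} = \lambda\,\partial_x v_x + (\lambda+2\mu)\,\partial_z v_z, \quad
\partial_t \sigma_{xz} = \mu\,(\partial_z v_x + \partial_x v_z),
\]

where \(v\) is the particle velocity, \(\sigma\) the stress, \(\rho\) the density and \(\lambda,\mu\) the Lamé parameters. Discretised with finite differences, this system becomes a stencil computation whose memory-access pattern largely determines GPU performance, which is why memory usage is a natural focus of such a study.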
Aug, 18
Parallelizing a high-order WENO scheme for complicated flow structures on GPU and MIC
As a conservative, high-order accurate, shock-capturing method, the weighted essentially non-oscillatory (WENO) scheme has been widely used to effectively resolve complicated flow structures in computational fluid dynamics (CFD) simulations. However, using a high-order WENO scheme can be highly time-consuming, which greatly limits the performance of CFD applications. In this paper, we present various parallel strategies based […]
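To give a sense of where the cost comes from (using the standard fifth-order WENO reconstruction of Jiang and Shu as a generic reference, not necessarily the exact variant in the paper): each interface value is a convex combination of three candidate stencils,

\[
u_{i+1/2} = \sum_{k=0}^{2} \omega_k\, u^{(k)}_{i+1/2}, \qquad
\omega_k = \frac{\alpha_k}{\alpha_0+\alpha_1+\alpha_2}, \qquad
\alpha_k = \frac{d_k}{(\varepsilon+\beta_k)^2},
\]

where the \(d_k\) are the ideal linear weights and the \(\beta_k\) are smoothness indicators evaluated from the local solution. The nonlinear weights must be recomputed for every grid point, component and direction at every time step, so the reconstruction is arithmetic-heavy but largely independent per point, which makes it a natural candidate for GPU and MIC parallelisation.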
Aug, 18
Runtime Code Generation and Data Management for Heterogeneous Computing in Java
GPUs (Graphics Processing Units) and other accelerators are nowadays commonly found in desktop machines, mobile devices and even data centres. While these highly parallel processors offer high raw performance, they also dramatically increase program complexity, requiring extra effort from programmers. This results in difficult-to-maintain and non-portable code due to the low-level nature of the languages […]
Aug, 18
RubiCL, a Library Providing Automatic Parallelisation on CPU and GPU devices
This project presents a library that automates the parallelisation of several higher-order functions originally provided within the Ruby standard library. The library distributes computation across many compute units, following an annotation specifying that primitives operate solely on numerical data. RubiCL harnesses the OpenCL framework in order to allow execution to occur on CPU or GPU devices. […]
Aug, 18
Optimizing OpenCL Local Work Group Size With Machine Learning
GPU architectures are becoming increasingly important due to their high number of processors. The single-instruction, multiple-data architecture has proven to work not just for the graphics domain, but also for many other disciplines. This is because the potential performance achievable by a consumer-level GPU is significantly higher than the […]
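For context, the local work-group size being tuned is the parameter handed to the kernel launch on the OpenCL host side. The sketch below is a minimal illustration in C against the standard OpenCL API; the helper function and its name are hypothetical, and kernel, queue and buffer setup (plus error handling) are assumed to exist elsewhere.

#include <CL/cl.h>

/* Hypothetical helper (not from the paper): enqueue a 1D kernel over n work-items
 * with a chosen work-group size, rounding the global size up to a multiple of it.
 * The sixth argument of clEnqueueNDRangeKernel is the local work-group size that
 * a machine-learning model would try to predict per kernel and device. */
static cl_int launch_1d(cl_command_queue queue, cl_kernel kernel,
                        size_t n, size_t local_size)
{
    size_t global_size = ((n + local_size - 1) / local_size) * local_size;
    return clEnqueueNDRangeKernel(queue, kernel,
                                  1,             /* work_dim              */
                                  NULL,          /* global work offset    */
                                  &global_size,  /* global work size      */
                                  &local_size,   /* local work-group size */
                                  0, NULL, NULL);
}

A poorly chosen local size can leave compute units under-occupied or exceed per-group resource limits, so predicting a good value automatically, rather than hand-tuning it per kernel and device, is the appeal of a machine-learning approach.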
Aug, 18
OpenCL-Based Design of an FPGA Accelerator for Phase-Based Correspondence Matching
This paper proposes a Field Programmable Gate Array (FPGA) implementation of stereo correspondence matching using Phase-Only Correlation (POC). The use of high-accuracy stereo correspondence matching based on POC makes it possible to measure the accurate 3D shape of an object using stereo vision. The drawback of the POC-based approach is its high computational cost. To […]
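As a quick reminder of the underlying technique (this is the standard definition of Phase-Only Correlation, not material taken from the paper): given two image blocks f and g with 2D DFTs F and G, POC forms the normalized cross-power spectrum

\[
\hat{R}(k_1,k_2) = \frac{F(k_1,k_2)\,\overline{G(k_1,k_2)}}{\bigl|F(k_1,k_2)\,\overline{G(k_1,k_2)}\bigr|},
\]

and takes its 2D inverse DFT to obtain the POC function \(r(n_1,n_2)\); the position and height of its peak give the displacement between the blocks with sub-pixel accuracy. The per-window forward and inverse FFTs are the dominant cost, and they are the natural target for an FPGA accelerator.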
Aug, 14
A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels
Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today’s systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular control flow and memory access patterns. However, the growing complexity, exposed memory hierarchy, incoherence, heterogeneity, and […]
Aug, 14
MPC: A Massively Parallel Compression Algorithm for Scientific Data
Due to their high peak performance and energy efficiency, massively parallel accelerators such as GPUs are quickly spreading in high-performance computing, where large amounts of floating-point data are processed, transferred, and stored. Such environments can greatly benefit from data compression if done sufficiently quickly. Unfortunately, most conventional compression algorithms are unsuitable for highly parallel execution. […]
Aug, 14
Bufferless NOC Simulation of Large Multicore System on GPU Hardware
Last-level cache management and the core interconnection network play important roles in the performance and power consumption of multicore systems. Large-scale chip multicores widely use mesh interconnects due to the scalability and simplicity of the mesh interconnection design. As the interconnection network occupies significant area and consumes a significant percentage of system power, a bufferless network is an appealing […]
Aug, 14
Automatic classification of object code using machine learning
Recent research has repeatedly shown that machine learning techniques can be applied to either whole files or file fragments to classify them for analysis. We build upon these techniques to show that for samples of unlabeled compiled computer object code, one can apply the same type of analysis to classify important aspects of the code, […]