Posts
Aug, 18
OpenCL-Based Design of an FPGA Accelerator for Phase-Based Correspondence Matching
This paper proposes a Field Programmable Gate Array (FPGA) implementation of stereo correspondence matching using Phase-Only Correlation (POC). High-accuracy stereo correspondence matching based on POC makes it possible to measure the accurate 3D shape of an object using stereo vision. The drawback of the POC-based approach is its high computational cost. To […]
Aug, 18
Parallelizing a high-order WENO scheme for complicated flow structures on GPU and MIC
As a conservative, high-order accurate, shock-capturing method, the weighted essentially non-oscillatory (WENO) scheme has been widely used to resolve complicated flow structures in computational fluid dynamics (CFD) simulations. However, a high-order WENO scheme can be highly time-consuming, which severely limits the performance of CFD applications. In this paper, we present various parallel strategies base […]
Aug, 18
Runtime Code Generation and Data Management for Heterogeneous Computing in Java
GPUs (Graphics Processing Units) and other accelerators are nowadays commonly found in desktop machines, mobile devices and even data centres. While these highly parallel processors offer high raw performance, they also dramatically increase program complexity and require extra effort from programmers. The result is difficult-to-maintain, non-portable code, due to the low-level nature of the languages […]
Aug, 18
RubiCL, a Library Providing Automatic Parallelisation on CPU and GPU devices
This project presents a library that automates the parallelisation of several higher-order functions originally provided by the Ruby standard library. The library distributes computation across many compute units once an annotation specifies that the primitives operate solely on numerical data. RubiCL uses the OpenCL framework so that execution can occur on CPU or GPU devices. […]
Aug, 18
Optimizing OpenCL Local Work Group Size With Machine Learning
GPU architectures are becoming increasingly important due to their large number of processors. The single instruction, multiple data (SIMD) architecture has proven to work not just in the graphics domain, but also in many other disciplines. This is because the performance achievable with a consumer-level GPU can be significantly higher than the […]
Aug, 14
A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels
Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today’s systems in order to provide higher performance and energy efficiency. Accordingly, GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular control flow and memory access patterns. However, the growing complexity, exposed memory hierarchy, incoherence, heterogeneity, and […]
Aug, 14
MPC: A Massively Parallel Compression Algorithm for Scientific Data
Due to their high peak performance and energy efficiency, massively parallel accelerators such as GPUs are quickly spreading through high-performance computing, where large amounts of floating-point data are processed, transferred, and stored. Such environments can benefit greatly from data compression if it is performed sufficiently quickly. Unfortunately, most conventional compression algorithms are unsuitable for highly parallel execution. […]
Aug, 14
Bufferless NOC Simulation of Large Multicore System on GPU Hardware
Last-level cache management and the core interconnection network play important roles in the performance and power consumption of multicore systems. Large-scale chip multicores widely use mesh interconnects because of the scalability and simplicity of mesh interconnection design. Because the interconnection network occupies significant area and consumes a significant percentage of system power, a bufferless network is an appealing […]
Aug, 14
Automatic classification of object code using machine learning
Recent research has repeatedly shown that machine learning techniques can be applied to either whole files or file fragments to classify them for analysis. We build upon these techniques to show that, for samples of unlabeled compiled object code, one can apply the same type of analysis to classify important aspects of the code, […]
Aug, 14
Processing Markov Logic Networks with GPUs
Graphics Processing Units (GPUs) are widely used to improve the performance of machine learning and logic programming systems. We propose using GPUs to improve the performance of Markov logic programs as well. In this paper, we focus on the first step of the inference phase: the grounding of the first-order logical formulas that compose a Markov network. […]
Aug, 13
Perception of Acoustical Spatial Attributes and Impression in Virtually Rendered Sound Field
The computational power available for simulating sound fields from three-dimensional numerical models has progressed rapidly, for example through GPU cluster systems. We can now render the directivity, position, distance, and reverberation of sound sources in practical time. Furthermore, a multichannel sound field system can be realized with low-cost digital-to-analog converter modules. Moreover, some researchers are trying to […]
Aug, 13
An Introduction to High Performance Computing on AWS
This paper describes a range of high performance computing (HPC) applications that are running today on Amazon Web Services (AWS). You will learn best practices for cloud deployment, for cluster and job management, and for the management of third-party software. This whitepaper covers HPC use cases that include highly distributed, highly parallel grid computing applications, […]

