Posts
Dec, 20
KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors
Concurrency has recently come to the forefront of computing as multi-core processors become more and more common. General purpose graphics processing unit (GPGPU) computing brings with it new language support, such as OpenCL and CUDA, for dealing with co-processor environments. Programming language support for multi-core architectures introduces a fundamentally new mechanism for modularity – a kernel. […]
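To make the notion of a kernel concrete, here is a minimal OpenCL C sketch, not taken from the paper; the name saxpy and its arguments are illustrative. A kernel is a self-contained function that the host enqueues over an index space, and that function boundary is the unit of modularity the excerpt refers to.

__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y,
                    const unsigned int n)
{
    size_t i = get_global_id(0);   /* one work-item per output element */
    if (i < n)
        y[i] = a * x[i] + y[i];
}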
Dec, 18
Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration
Heterogeneous computing systems increase the performance of parallel computing in many domains of general purpose computing with CPUs, GPUs and other accelerators. Alongside these hardware developments, software developments like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) try to offer a simple and visualized tool for parallel computing. But it turns out to be […]
Dec, 18
Parallelisation of Shallow Water Simulation for Heterogeneous Architectures
This work presents the parallelisation of a shallow water simulation model. Two parallel implementations are developed: one for a multi-core NUMA architecture, developed in OpenMP, and one for a many-core GPU-accelerated architecture, developed in OpenCL. The parallelisation process is based on an iterative approach, starting off from a naive implementation. […]
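As a rough illustration of the kind of naive starting point the excerpt mentions, the sketch below shows a per-cell update loop parallelised with a single OpenMP work-sharing directive. The update formula is a placeholder, not the authors' shallow water scheme, and all names are illustrative.

/* Toy 1D per-cell update standing in for one time step of a shallow water
 * scheme; the point is only that a naive loop parallelises directly with
 * an OpenMP directive. */
void step(const double *h, const double *hu,
          double *h_new, double *hu_new, long n, double lambda)
{
    #pragma omp parallel for schedule(static)
    for (long i = 1; i < n - 1; i++) {
        /* placeholder central-difference fluxes, not the real model */
        h_new[i]  = h[i]  - lambda * (hu[i + 1] - hu[i - 1]);
        hu_new[i] = hu[i] - lambda * (h[i + 1]  - h[i - 1]);
    }
}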
Dec, 15
Performance study of using the Direct Compute API for implementing Support vector machines on GPUs
Today graphics processing units (GPUs) are not only able to generate graphical imaging but are also able to expose their multicore architecture to accelerate computationally heavy general purpose algorithms that can be adapted to the multicore architecture of the GPU. The study conducted in this thesis explores the efficiency of using the general purpose graphics processing […]
Dec, 12
Towards Domain-specific Computing for Stencil Codes in HPC
High Performance Computing (HPC) systems are becoming more and more heterogeneous. Different processor types can be found on a single node, including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach to automatically generate code tailored to different processor types. Low-level […]
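For readers unfamiliar with the term, the C sketch below shows what a stencil code typically looks like: each output point is a fixed, regular combination of a small neighbourhood of input points, and it is exactly this regular structure that a domain-specific generator can retarget to different processor types. The 5-point Jacobi-style kernel and its weights are illustrative, not drawn from the paper.

/* 5-point Jacobi-style stencil over an nx-by-ny grid (interior points only). */
void jacobi_step(const float *in, float *out, int nx, int ny)
{
    for (int j = 1; j < ny - 1; j++)
        for (int i = 1; i < nx - 1; i++)
            out[j * nx + i] = 0.25f * (in[j * nx + (i - 1)] +
                                       in[j * nx + (i + 1)] +
                                       in[(j - 1) * nx + i] +
                                       in[(j + 1) * nx + i]);
}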
Dec, 10
Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
General-purpose graphics processing units (GPGPUs) are at their best in accelerating computation by exploiting the abundant thread-level parallelism (TLP) offered by many classes of HPC applications. To facilitate such high TLP, emerging programming models like CUDA and OpenCL allow programmers to create work abstractions in terms of smaller work units, called cooperative thread arrays (CTAs), consisting […]
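The decomposition the excerpt describes can be summarised in a few lines of C: a job of n data-parallel elements is split into CTAs (thread blocks, or work-groups in OpenCL) of a fixed size, and the number of CTAs launched determines how much TLP the GPU sees. The numbers below are typical but arbitrary.

#include <stdio.h>

static unsigned int num_ctas(unsigned int n, unsigned int cta_size)
{
    return (n + cta_size - 1) / cta_size;   /* ceiling division */
}

int main(void)
{
    unsigned int n = 1u << 20;      /* one million work items */
    unsigned int cta_size = 256;    /* threads per CTA, a common choice */
    printf("launch %u CTAs of %u threads\n", num_ctas(n, cta_size), cta_size);
    return 0;
}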
Dec, 8
OpenMP Programming on Intel(R) Xeon Phi(TM) Coprocessors: An Early Performance Comparison
The demand for more and more compute power is growing rapidly in many fields of research. Accelerators, like GPUs, are one way to fulfill these requirements, but they often require a laborious rewrite of the application using special programming paradigms like CUDA or OpenCL. The Intel(R) Xeon Phi(TM) coprocessor is based on the Intel(R) Many […]
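The attraction of the directive-based approach compared here is that a sketch like the following runs unchanged on a multi-core host or natively on a Xeon Phi coprocessor; only the available thread count changes. The array size and arithmetic are arbitrary.

#include <omp.h>
#include <stdio.h>

#define N (1 << 22)
static double a[N], b[N];

int main(void)
{
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("ran with up to %d OpenMP threads\n", omp_get_max_threads());
    return 0;
}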
Dec, 4
Fast Parallel Sorting Algorithms on GPUs
This paper presents a comparative analysis of three widely used parallel sorting algorithms: odd-even sort, rank sort and bitonic sort, in terms of sorting rate, sorting time and speed-up on the CPU and on different GPU architectures. Alongside, we have implemented a novel parallel algorithm, the min-max butterfly network, for finding the minimum and maximum in large data sets. […]
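For context only, a baseline parallel min/max reduction in C with OpenMP (3.1 or later) is sketched below; the paper's min-max butterfly network is a different, GPU-oriented formulation of the same problem and is not reproduced here.

#include <float.h>

void min_max(const float *data, long n, float *lo, float *hi)
{
    float mn = FLT_MAX, mx = -FLT_MAX;
    #pragma omp parallel for reduction(min:mn) reduction(max:mx)
    for (long i = 0; i < n; i++) {
        if (data[i] < mn) mn = data[i];
        if (data[i] > mx) mx = data[i];
    }
    *lo = mn;
    *hi = mx;
}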
Dec, 4
An MPI back-end for the OpenACC accULL. Exploiting OpenACC semantics in Message Passing Clusters
The irruption of hardware accelerators onto the HPC scene has made unprecedented performance available to developers. However, even expert developers may not be ready to exploit the complex hierarchies of these new heterogeneous systems. We need to find a way to leverage the programming effort on these architectures at the programming language level; otherwise, developers will […]
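As a reminder of the programming-language-level abstraction at stake, the C sketch below uses standard OpenACC directives of the kind accULL accepts; whether such a loop runs on a GPU or, with the back-end proposed here, across MPI processes is left to the runtime. The function and array names are illustrative.

#define N 1000000

void scale(float *restrict x, float alpha)
{
    #pragma acc parallel loop copy(x[0:N])
    for (int i = 0; i < N; i++)
        x[i] = alpha * x[i];
}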
Dec, 1
CPUless PCs inside networked control systems
This paper presents the results of advancing our previous WSEAS paper [1] and is aimed at the basics of a design framework that helps design hard real-time control systems using Unix/Unix-like operating systems. The framework was designed while solving a research project supported by the Slovak Research and Development Agency under contract No. VMSP-II-0034-09. This framework contains a layer […]
Nov, 27
A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms
New multi-core CPU and GPU architectures promise high computational power at a low cost if suitable computational algorithms can be developed. However, parallel programming for such architectures is usually non-portable, low-level and error-prone. To make the computational power of new multi-core architectures more easily available to Modelica modelers, we have developed the ParModelica algorithmic language […]
Nov, 26
A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems
This paper presents a compiler toolkit that addresses two important emerging challenges: (1) effectively compiling dynamic array-based languages such as MATLAB, Python and R; and (2) effectively utilizing a wide range of rapidly evolving hybrid CPU/GPU architectures. The toolkit provides: a high-level IR specifically designed to express a wide range of array-based computations and indexing […]