
Posts

Dec, 12

Towards Domain-specific Computing for Stencil Codes in HPC

High Performance Computing (HPC) systems are nowadays more and more heterogeneous. Different processor types can be found on a single node including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach to automatically generate code tailored to different processor types. Low-level […]
Dec, 10

Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs

General-purpose graphics processing units (GPGPUs) are at their best in accelerating computation by exploiting abundant thread-level parallelism (TLP) offered by many classes of HPC applications. To facilitate such high TLP, emerging programming models like CUDA and OpenCL allow programmers to create work abstractions in terms of smaller work units, called cooperative thread arrays (CTAs), consisting […]
Dec, 8

OpenMP Programming on Intel(R) Xeon Phi(TM) Coprocessors: An Early Performance Comparison

The demand for more and more compute power is growing rapidly in many fields of research. Accelerators, like GPUs, are one way to fulfill these requirements, but they often require a laborious rewrite of the application using special programming paradigms like CUDA or OpenCL. The Intel(R) Xeon Phi(TM) coprocessor is based on the Intel(R) Many […]
Dec, 4

Fast Parallel Sorting Algorithms on GPUs

This paper presents a comparative analysis of three widely used parallel sorting algorithms (OddEven sort, Rank sort and Bitonic sort) in terms of sorting rate, sorting time and speed-up on CPU and different GPU architectures. Alongside, we have implemented a novel parallel algorithm, the min-max butterfly network, for finding the minimum and maximum in large data sets. […]
Dec, 4

A MPI back-end for the OpenACC accULL. Exploiting OpenACC semantics in Message Passing Clusters

The irruption of hardware accelerators onto the HPC scene has made unprecedented performance available to developers. However, even expert developers may not be ready to exploit the complex hierarchies of these new heterogeneous systems. We need to find a way to leverage the programming effort in these architectures at the programming language level, otherwise, developers will […]
Dec, 1

CPUless PCs inside networked control systems

This paper presents the results of advancing our previous WSEAS paper [1] and lays out the basics of a design framework that helps design hard real-time control systems using Unix and Unix-like operating systems. The framework was designed as part of a research project supported by the Slovak Research and Development Agency under contract No. VMSP-II-0034-09. This framework contains layer […]
Nov, 27

A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms

New multi-core CPU and GPU architectures promise high computational power at a low cost if suitable computational algorithms can be developed. However, parallel programming for such architectures is usually non-portable, low-level and error-prone. To make the computational power of new multi-core architectures more easily available to Modelica modelers, we have developed the ParModelica algorithmic language […]
Nov, 26

A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems

This paper presents a compiler toolkit that addresses two important emerging challenges: (1) effectively compiling dynamic array-based languages such as MATLAB, Python and R; and (2) effectively utilizing a wide range of rapidly evolving hybrid CPU/GPU architectures. The toolkit provides: a high-level IR specifically designed to express a wide range of array-based computations and indexing […]
Nov, 24

GPU Isosurface Raycasting of FCC Datasets

This paper presents an efficient and accurate isosurface rendering algorithm for the natural C^1 splines on the face-centered cubic (FCC) lattice. Leveraging fast and accurate evaluation of a spline field and its gradient, accompanied by efficient empty-space skipping, the approach generates high-quality isosurfaces of FCC datasets at interactive speed (20-70 fps). The pre-processing computation (quasi-interpolation […]
Nov, 18

Auto-tunable GPU BLAS (thesis)

In this paper, we present our implementation of an auto-tuning system, written in C++, which incorporates the use of OpenCL kernels. We deploy this approach on different GPU architectures and evaluate its performance. Our main focus is to easily generate tuned code that would otherwise require a large amount of empirical testing, […]
Nov, 14

Real-Time Scheduling Using GPUs – Advanced and More Accurate Proof of Feasibility

This paper reports our evaluation of OpenCL as a platform for hard real-time scheduling. In particular, we have evaluated which types of tasks are faster on a GPGPU than on a CPU. We have investigated computational tasks, memory-intensive tasks (especially tasks using low-latency GDDR memory) and disk-intensive tasks. This study is the part […]
Nov, 10

Efficient Dynamic Derived Field Generation on Many-Core Architectures Using Python

Derived field generation is a critical aspect of many visualization and analysis systems. This capability is frequently implemented by providing users with a language to create new fields and then translating their "programs" into a pipeline of filters that are combined in sequential fashion. Although this design is highly extensible and practical for development, the […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
