Posts
Dec, 20
KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors
Concurrency has recently come to the forefront of computing as multi-core processors become more and more common. General purpose graphics processing unit (GPGPU) computing brings with it new language support, such as OpenCL and CUDA, for dealing with co-processor environments. Programming language support for multi-core architectures introduces a fundamentally new mechanism for modularity – a kernel. […]
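To make the notion of a kernel concrete, here is a minimal OpenCL C sketch, not taken from the paper; the name saxpy and its arguments are illustrative. A kernel is a self-contained function that the host enqueues over an index space, and that function boundary is the unit of modularity the excerpt refers to.

__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y,
                    const unsigned int n)
{
    size_t i = get_global_id(0);   /* one work-item per output element */
    if (i < n)
        y[i] = a * x[i] + y[i];
}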
Dec, 18
Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration
Heterogeneous computing systems increase the performance of parallel computing in many domains of general purpose computing with CPUs, GPUs and other accelerators. Alongside these hardware developments, software developments like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) try to offer a simple and visualized tool for parallel computing. But it turns out to be […]
Dec, 18
Parallelisation of Shallow Water Simulation for Heterogeneous Architectures
This work presents the parallelisation of a shallow water simulation model. Two parallel implementations are developed: one for a multi-core NUMA architecture, developed in OpenMP, and one for a many-core GPU-accelerated architecture, developed in OpenCL. The parallelisation process is based on an iterative approach, starting off from a naive implementation. […]
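As a rough illustration of the kind of naive starting point the excerpt mentions, the sketch below shows a per-cell update loop parallelised with a single OpenMP work-sharing directive. The update formula is a placeholder, not the authors' shallow water scheme, and all names are illustrative.

/* Toy 1D per-cell update standing in for one time step of a shallow water
 * scheme; the point is only that a naive loop parallelises directly with
 * an OpenMP directive. */
void step(const double *h, const double *hu,
          double *h_new, double *hu_new, long n, double lambda)
{
    #pragma omp parallel for schedule(static)
    for (long i = 1; i < n - 1; i++) {
        /* placeholder central-difference fluxes, not the real model */
        h_new[i]  = h[i]  - lambda * (hu[i + 1] - hu[i - 1]);
        hu_new[i] = hu[i] - lambda * (h[i + 1]  - h[i - 1]);
    }
}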
Dec, 15
Performance study of using the Direct Compute API for implementing Support vector machines on GPUs
Today graphics processing units (GPUs) are not only able to generate graphical imaging but are also able to expose their multicore architecture to accelerate computationally heavy general purpose algorithms that can be adapted to the multicore architecture of the GPU. The study conducted in this thesis explores the efficiency of using the general purpose graphics processing […]
Dec, 12
Towards Domain-specific Computing for Stencil Codes in HPC
High Performance Computing (HPC) systems are becoming more and more heterogeneous. Different processor types can be found on a single node, including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach to automatically generate code tailored to different processor types. Low-level […]
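For readers unfamiliar with the term, the C sketch below shows what a stencil code typically looks like: each output point is a fixed, regular combination of a small neighbourhood of input points, and it is exactly this regular structure that a domain-specific generator can retarget to different processor types. The 5-point Jacobi-style kernel and its weights are illustrative, not drawn from the paper.

/* 5-point Jacobi-style stencil over an nx-by-ny grid (interior points only). */
void jacobi_step(const float *in, float *out, int nx, int ny)
{
    for (int j = 1; j < ny - 1; j++)
        for (int i = 1; i < nx - 1; i++)
            out[j * nx + i] = 0.25f * (in[j * nx + (i - 1)] +
                                       in[j * nx + (i + 1)] +
                                       in[(j - 1) * nx + i] +
                                       in[(j + 1) * nx + i]);
}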
Dec, 10
Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
General-purpose graphics processing units (GPGPUs) are at their best in accelerating computation by exploiting the abundant thread-level parallelism (TLP) offered by many classes of HPC applications. To facilitate such high TLP, emerging programming models like CUDA and OpenCL allow programmers to create work abstractions in terms of smaller work units, called cooperative thread arrays (CTAs), consisting […]
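The decomposition the excerpt describes can be summarised in a few lines of C: a job of n data-parallel elements is split into CTAs (thread blocks, or work-groups in OpenCL) of a fixed size, and the number of CTAs launched determines how much TLP the GPU sees. The numbers below are typical but arbitrary.

#include <stdio.h>

static unsigned int num_ctas(unsigned int n, unsigned int cta_size)
{
    return (n + cta_size - 1) / cta_size;   /* ceiling division */
}

int main(void)
{
    unsigned int n = 1u << 20;      /* one million work items */
    unsigned int cta_size = 256;    /* threads per CTA, a common choice */
    printf("launch %u CTAs of %u threads\n", num_ctas(n, cta_size), cta_size);
    return 0;
}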
Dec, 8
OpenMP Programming on Intel(R) Xeon Phi(TM) Coprocessors: An Early Performance Comparison
The demand for more and more compute power is growing rapidly in many fields of research. Accelerators, like GPUs, are one way to fulfill these requirements, but they often require a laborious rewrite of the application using special programming paradigms like CUDA or OpenCL. The Intel(R) Xeon Phi(TM) coprocessor is based on the Intel(R) Many […]
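The attraction of the directive-based approach compared here is that a sketch like the following runs unchanged on a multi-core host or natively on a Xeon Phi coprocessor; only the available thread count changes. The array size and arithmetic are arbitrary.

#include <omp.h>
#include <stdio.h>

#define N (1 << 22)
static double a[N], b[N];

int main(void)
{
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("ran with up to %d OpenMP threads\n", omp_get_max_threads());
    return 0;
}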
Dec, 4
Fast Parallel Sorting Algorithms on GPUs
This paper presents a comparative analysis of three widely used parallel sorting algorithms: odd-even sort, rank sort and bitonic sort, in terms of sorting rate, sorting time and speed-up on the CPU and on different GPU architectures. Alongside, we have implemented a novel parallel algorithm, the min-max butterfly network, for finding the minimum and maximum in large data sets. […]
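For context only, a baseline parallel min/max reduction in C with OpenMP (3.1 or later) is sketched below; the paper's min-max butterfly network is a different, GPU-oriented formulation of the same problem and is not reproduced here.

#include <float.h>

void min_max(const float *data, long n, float *lo, float *hi)
{
    float mn = FLT_MAX, mx = -FLT_MAX;
    #pragma omp parallel for reduction(min:mn) reduction(max:mx)
    for (long i = 0; i < n; i++) {
        if (data[i] < mn) mn = data[i];
        if (data[i] > mx) mx = data[i];
    }
    *lo = mn;
    *hi = mx;
}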
Dec, 4
An MPI back-end for the OpenACC accULL. Exploiting OpenACC semantics in Message Passing Clusters
The irruption of hardware accelerators onto the HPC scene has made unprecedented performance available to developers. However, even expert developers may not be ready to exploit the complex hierarchies of these new heterogeneous systems. We need to find a way to leverage the programming effort on these architectures at the programming language level; otherwise, developers will […]
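As a reminder of the programming-language-level abstraction at stake, the C sketch below uses standard OpenACC directives of the kind accULL accepts; whether such a loop runs on a GPU or, with the back-end proposed here, across MPI processes is left to the runtime. The function and array names are illustrative.

#define N 1000000

void scale(float *restrict x, float alpha)
{
    #pragma acc parallel loop copy(x[0:N])
    for (int i = 0; i < N; i++)
        x[i] = alpha * x[i];
}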
Dec, 1
CPUless PCs inside networked control systems
This paper presents the results of advancing our previous WSEAS paper [1] and is aimed at the basics of a design framework that helps design hard real-time control systems using Unix/Unix-like operating systems. The framework was designed while solving a research project supported by the Slovak Research and Development Agency under contract No. VMSP-II-0034-09. This framework contains a layer […]
Nov, 27
A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms
New multi-core CPU and GPU architectures promise high computational power at a low cost if suitable computational algorithms can be developed. However, parallel programming for such architectures is usually non-portable, low-level and error-prone. To make the computational power of new multi-core architectures more easily available to Modelica modelers, we have developed the ParModelica algorithmic language […]
Nov, 26
A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems
This paper presents a compiler toolkit that addresses two important emerging challenges: (1) effectively compiling dynamic array-based languages such as MATLAB, Python and R; and (2) effectively utilizing a wide range of rapidly evolving hybrid CPU/GPU architectures. The toolkit provides: a high-level IR specifically designed to express a wide range of array-based computations and indexing […]