
Posts

Sep, 20

CuMAPz: a tool to analyze memory access patterns in CUDA

The CUDA programming model provides a simple interface for programming GPUs, but tuning GPGPU applications for high performance remains quite challenging. Programmers must consider several architectural details, and small changes in source code, especially in memory access patterns, affect performance significantly. This paper presents CuMAPz, a tool to compare the memory performance of […]
Sep, 20

EFFEX: an embedded processor for computer vision based feature extraction

The deployment of computer vision algorithms in mobile applications is growing at a rapid pace. A primary component of the computer vision software pipeline is feature extraction, which identifies and encodes relevant image features. We present an embedded heterogeneous multicore design named EFFEX that incorporates novel functional units and memory architecture support, making it capable […]
Sep, 20

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still at an initial stage, and the challenge of making the GPU a true shared resource […]
Sep, 20

Acceleration of genetic algorithms for sudoku solution on many-core processors

In this paper, we use the problem of solving Sudoku puzzles to demonstrate the possibility of achieving practical processing time through the use of many-core processors for parallel processing in the application of genetic computation. To increase accuracy, we propose a genetic operation that takes building-block linkage into account. As a parallel processing model for […]
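The individual evaluation step that such genetic approaches parallelize is a fitness function over candidate grids. As a hedged sketch of one common choice (counting constraint violations; zero means solved), not the authors' implementation:

```python
# Sketch: a typical fitness measure for GA-based Sudoku solvers --
# the number of duplicate symbols across rows, columns, and 3x3 boxes.
# Illustrative only; the paper's genetic operation and parallel model differ.

def sudoku_conflicts(grid):
    """grid: 9x9 list of lists with values 1..9. Returns conflict count (0 = valid)."""
    def dups(cells):
        return len(cells) - len(set(cells))

    score = 0
    for i in range(9):
        score += dups([grid[i][j] for j in range(9)])  # row i
        score += dups([grid[j][i] for j in range(9)])  # column i
    for bi in range(0, 9, 3):                          # 3x3 boxes
        for bj in range(0, 9, 3):
            box = [grid[bi + r][bj + c] for r in range(3) for c in range(3)]
            score += dups(box)
    return score
```

Evaluating this function independently for every individual in a population is an embarrassingly parallel workload, which is why many-core processors suit it well.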
Sep, 19

Debugging CUDA

During six months of intensive nVidia CUDA C programming, many bugs were created. We pass on the software engineering lessons learnt, particularly those relevant to parallel general-purpose computation on graphics hardware (GPGPU).
Sep, 19

Parallel divide-and-evolve: experiments with OpenMP on a multicore machine

Multicore machines are becoming a standard way to speed up system performance. After instantiating the evolutionary metaheuristic DAEX with the forward-search YAHSP planner, we investigate the global parallelism approach, which exploits the intrinsic parallelism of individual evaluation. This paper describes a parallel shared-memory version of the DAEYAHSP planning system using […]
Sep, 19

DAMS: distributed adaptive metaheuristic selection

We present a distributed algorithm, Select Best and Mutate (SBM), in the Distributed Adaptive Metaheuristic Selection (DAMS) framework. DAMS is dedicated to adaptive optimization in distributed environments. Given a set of metaheuristics, the goal of DAMS is to coordinate their local execution on distributed nodes in order to optimize the global performance of the distributed […]
Sep, 19

ACO with tabu search on a GPU for solving QAPs using move-cost adjusted thread assignment

This paper proposes a parallel ant colony optimization (ACO) for solving quadratic assignment problems (QAPs) on a graphics processing unit (GPU) by combining tabu search (TS) with ACO in CUDA (compute unified device architecture). In TS for QAP, all neighbor moves are tested. These moves form two groups based on the computation of the move cost. In one […]
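The "move cost" mentioned above is the change in the QAP objective caused by swapping two assignments, which tabu search evaluates for every neighbor move. As a hedged illustration (not the authors' GPU code), the following sketch computes this delta in O(n) per move using the standard incremental formula, checked against a full re-evaluation:

```python
# Sketch: QAP objective and O(n) swap move cost, as used in tabu search
# neighborhoods. Illustrative only -- not the paper's CUDA implementation.
# f: flow matrix, d: distance matrix, p: permutation (facility -> location).

def qap_cost(f, d, p):
    """Full QAP objective: sum over i, j of f[i][j] * d[p[i]][p[j]]."""
    n = len(p)
    return sum(f[i][j] * d[p[i]][p[j]] for i in range(n) for j in range(n))

def swap_move_cost(f, d, p, r, s):
    """Change in qap_cost if p[r] and p[s] are swapped, in O(n) time."""
    pr, ps = p[r], p[s]
    delta = (f[r][r] * (d[ps][ps] - d[pr][pr])
             + f[r][s] * (d[ps][pr] - d[pr][ps])
             + f[s][r] * (d[pr][ps] - d[ps][pr])
             + f[s][s] * (d[pr][pr] - d[ps][ps]))
    for k in range(len(p)):
        if k in (r, s):
            continue
        pk = p[k]
        delta += (f[k][r] * (d[pk][ps] - d[pk][pr])
                  + f[k][s] * (d[pk][pr] - d[pk][ps])
                  + f[r][k] * (d[ps][pk] - d[pr][pk])
                  + f[s][k] * (d[pr][pk] - d[ps][pk]))
    return delta
```

Because some deltas can be updated even faster than O(n) after a move while others cannot, implementations often split moves into groups by evaluation cost, which is the kind of distinction the paper exploits for thread assignment.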
Sep, 19

Generalisation in genetic programming

Genetic programming can evolve large general solutions using a tiny fraction of possible fitness test sets. Just one test may be enough.
Sep, 19

A training roadmap for new HPC users

Many new users of TeraGrid or other HPC resources are scientists or other domain experts by training and are not necessarily familiar with core principles, practices, and resources within the HPC community. As a result, they often make inefficient use of their own time and effort, as well as of the computing resources. In this […]
Sep, 19

A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs

Sparse Matrix-Vector Multiplication (SpMV) is very common in scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform thanks to its massive processing capability. This paper presents an innovative performance-model-driven approach for partitioning a sparse matrix into appropriate formats, and for auto-tuning configurations of CUDA kernels to improve the performance of […]
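For readers unfamiliar with the operation being tuned: SpMV computes y = A·x for a sparse matrix A, commonly stored in a compressed format such as CSR. A minimal CPU-side reference sketch (illustrating the computation only, not the paper's GPU kernels or partitioning):

```python
# Sketch: sparse matrix-vector multiplication over CSR storage.
# CSR keeps only nonzeros: row_ptr delimits each row's slice of
# col_idx (column indices) and vals (values).

def spmv_csr(row_ptr, col_idx, vals, x):
    """Return y = A @ x for a CSR matrix A."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form:
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals = [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

On GPUs, the choice of storage format and the mapping of rows to threads dominate SpMV performance, which motivates format partitioning and kernel auto-tuning.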
Sep, 19

Computing without processors

Heterogeneous systems allow us to target our programming to the appropriate environment. From the programmer’s perspective, the distinction between hardware and software is being blurred. As programmers struggle to meet the performance requirements of today’s systems, they will face an ever-increasing need to exploit alternative computing elements such as GPUs (graphics processing units), which […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: