high performance computing on graphics processing units: hgpu.org

Posts

Sep, 11

A data parallel view on polyhedral process networks

Emerging architectures in embedded space are expected to make use of a diverse mix of multicores, vector-based units, GPU cores and special function accelerators. In order to facilitate mapping onto diverse architectures, different models of computation have been considered. Polyhedral Process Networks (PPNs) have been extensively used in automatic generation of task and pipeline parallel […]

Sep, 11

High-performance SIMT code generation in an active visual effects library

SIMT (Single-Instruction Multiple-Thread) is an emerging programming paradigm for high-performance computational accelerators, pioneered in current and next generation GPUs and hybrid CPUs. We present a domain-specific active-library supported approach to SIMT code generation and optimisation in the field of visual effects. Our approach uses high-level metadata and runtime context to guide and to ensure the […]

CUDA

Sep, 11

Software-based branch predication for AMD GPUs

Branch predication is a program transformation technique that combines instructions of multiple branches of an if statement into a straight-line sequence and associates each instruction of the sequence with a predicate. Branch predication improves the execution of branch statements on processors that support predicated execution of instructions, e.g., Intel IA-64, because such transformation improves […]
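As a rough illustration of the transformation described in this abstract, the sketch below rewrites a small if/else into a straight-line sequence in which each result is guarded by a predicate. It shows only the general idea, not the paper's AMD GPU technique; the function names `branchy` and `predicated` and the arithmetic inside the branches are hypothetical choices made for the example.

```python
def branchy(x: int) -> int:
    """Reference version: an ordinary if/else with two branches."""
    if x < 0:
        return -x * 2
    else:
        return x + 1


def predicated(x: int) -> int:
    """Predicated version (illustrative assumption, not the paper's code).

    Both branches are computed in one straight-line sequence, and each
    result is associated with a predicate instead of a jump.
    """
    p = int(x < 0)      # predicate for the 'then' branch (1 or 0)
    not_p = 1 - p       # predicate for the 'else' branch
    then_val = -x * 2   # computed unconditionally
    else_val = x + 1    # computed unconditionally
    # Only the value whose predicate is 1 contributes to the result.
    return p * then_val + not_p * else_val


if __name__ == "__main__":
    for x in (-3, 0, 4):
        print(x, branchy(x), predicated(x))
```

Both functions return the same value for every integer input; the predicated form trades extra arithmetic for the absence of a control-flow branch, which is the trade-off predication makes on hardware that supports it.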

Sep, 11

Solving diffractive optics problems using graphics processing units

Techniques for applying graphics processing units (GPU) to the general-purpose nongraphics computations proposed in recent years by the companies ATI (AMD FireStream, 2006) and NVIDIA (CUDA: Compute Unified Device Architecture, 2007) have given an impetus to developing algorithms and software packages for solving problems of diffractive optics with the aid of the GPU. The computations […]

CUDA • OpenGL

Sep, 9

Enabling multiple accelerator acceleration for Java/OpenMP

While using a single GPU is fairly easy, using multiple CPUs and GPUs potentially distributed over multiple machines is hard because data needs to be kept consistent using message exchange and the load needs to be balanced. We propose (1) an array package that provides partitioned and replicated arrays and (2) a compute-device library to […]

CUDA • OpenCL

Sep, 9

Heterogeneous multicore parallel programming for graphics processing units

Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware displays a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a […]

Sep, 9

Beyond programmable shading (parts I and II)

There are strong indications that the future of interactive graphics programming is a more flexible model than today’s OpenGL/Direct3D pipelines. Graphics developers need a basic understanding of how to combine emerging parallel programming techniques and more flexible graphics processors with the traditional interactive rendering pipeline. As the first in a series, this course introduces the […]

OpenGL

Sep, 9

Data classification for artificial intelligence construct training to aid in network incident identification using network telescope data

This paper considers the complexities involved in obtaining training data for use by artificial intelligence constructs to identify potential network incidents using passive network telescope data. While a large amount of data obtained from network telescopes exists, this data is not currently marked for known incidents. Problems related to this marking process include the accuracy […]

Sep, 9

A stream-computing extension to OpenMP

This paper introduces an extension to OpenMP 3.0 enabling stream programming with minimal, incremental additions that seamlessly integrate into the current specification. The stream programming model decomposes programs into tasks and makes explicit the flow of data among them, thus exposing data, task and pipeline parallelism. It helps the programmers to express concurrency and data locality properties, […]
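The stream model mentioned in this abstract, tasks connected by explicit data flow, can be sketched with plain Python generators. This is not OpenMP syntax and not the extension the paper proposes; the stage names `produce`, `square` and `accumulate` are hypothetical, and the stages run sequentially here, so the sketch shows only the dataflow structure that a streaming runtime could then pipeline.

```python
from typing import Iterable, Iterator


def produce(n: int) -> Iterator[int]:
    """Source task: emits the integers 0..n-1 onto the stream."""
    for i in range(n):
        yield i


def square(stream: Iterable[int]) -> Iterator[int]:
    """Filter task: consumes one element at a time and emits its square."""
    for value in stream:
        yield value * value


def accumulate(stream: Iterable[int]) -> int:
    """Sink task: reduces the stream to a single sum."""
    total = 0
    for value in stream:
        total += value
    return total


# The data flow between tasks is explicit in how the stages are chained.
result = accumulate(square(produce(5)))

if __name__ == "__main__":
    print(result)
```

Because each stage only sees the stream it is handed, the dependencies between tasks are visible in the composition itself, which is the property the stream model relies on to expose pipeline parallelism.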

Sep, 9

CUDACS: securing the cloud with CUDA-enabled secure virtualization

While on the one hand unresolved security issues pose a barrier to the widespread adoption of cloud computing technologies, on the other hand the computing capabilities of even commodity hardware are growing, in particular thanks to the adoption of *-core technologies. For instance, the Nvidia Compute Unified Device Architecture (CUDA) technology is increasingly available on […]

CUDA

Sep, 9

KAdvice: inferring synchronization patterns from an existing codebase

Operating system kernels are complex software systems. The kernels of today's mainstream OSs, such as Linux or Windows, are composed of a number of modules, which contain code and data. Even when providing synchronous interfaces (APIs) to the programmer, large portions of the OS kernel operate in an asynchronous manner. Synchronizing access to kernel data […]

Sep, 9

Attaining system performance points: revisiting the end-to-end argument in system design for heterogeneous many-core systems

Trends indicate a rapid increase in the number of cores on chip, exhibiting various types of performance and functional asymmetries present in hardware to gain scalability with balanced power vs. performance requirements. This poses new challenges in platform resource management, which are further exacerbated by the need for runtime power budgeting and by the increased […]

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
