high performance computing on graphics processing units: hgpu.org

Posts

Sep, 11

Collision-streams: fast GPU-based collision detection for deformable models

We present a fast GPU-based streaming algorithm to perform collision queries between deformable models. Our approach is based on hierarchical culling and reduces the computation to generating different streams. We present a novel stream registration method to compact the streams and efficiently compute the potentially colliding pairs of primitives. We also use a deferred front […]

CUDA • OpenCL

Sep, 11

Texture compression of light maps using smooth profile functions

Light maps have long been a popular technique for visually rich real-time rendering in games. They typically contain smooth color gradients which current low bit rate texture compression techniques, such as DXT1 and ETC2, do not handle well. The application writer must therefore choose between doubling the bit rate by choosing a codec such as […]

OpenCL

Sep, 11

Lattice-Boltzmann water waves

A model for real-time generation of deep-water waves is suggested. It is based on a lattice-Boltzmann (LB) technique. Computation of wave dynamics and (ray-traced) rendering for a lattice of size 1024² can be carried out simultaneously on a single graphics card at 25 frames per second. In addition to the computational speed, the LB technique […]

OpenCL

Sep, 11

A design pattern language for engineering (parallel) software: merging the PLPP and OPL projects

Parallel programming is stuck. To make progress, we need to step back and understand the software people wish to engineer. We do this with a design pattern language. This paper provides background for a lively discussion of this pattern language. We present the context for the problem, the layers in the design pattern language, and […]

Sep, 11

Automatic safety proofs for asynchronous memory operations

We present a work-in-progress proof system and tool, based on separation logic, for analysing memory safety of multicore programs that use asynchronous memory operations.

Sep, 11

Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs

Software engineering demands generality and abstraction; performance demands specialization and concretization. Generative programming can provide both, but the effort required to develop high-quality program generators likely offsets their benefits, even if a multi-stage programming language is used. We present lightweight modular staging, a library-based multi-stage programming approach that breaks with the tradition of syntactic quasi-quotation […]

Sep, 11

Early experiences with the Intel Many Integrated Cores accelerated computing technology

We report on early programming experiences with the Intel Many Integrated Core (Intel MIC) co-processor. This new x86-based technology is Intel’s answer to GPU-based accelerators by NVIDIA, AMD and others. Accelerators have generally sparked interest in the HPC community because they have the potential to significantly increase the compute power of the next […]

Sep, 11

Importance-driven compositing window management

In this paper we present importance-driven compositing window management, which considers windows not only as basic rectangular shapes but also integrates the importance of the windows’ content using a bottom-up visual attention model. Based on this information, importance-driven compositing optimizes the spatial window layout for maximum visibility and interactivity of occluded content in combination with […]

OpenCL • OpenGL

Sep, 11

Task Superscalar: An Out-of-Order Task Pipeline

We present Task Superscalar, an abstraction of the instruction-level out-of-order pipeline that operates at the task level. Like ILP pipelines, which uncover parallelism in a sequential instruction stream, Task Superscalar uncovers task-level parallelism among tasks generated by a sequential thread. Utilizing intuitive programmer annotations of task inputs and outputs, the Task Superscalar pipeline dynamically […]

Sep, 11

Parallel programming with NVIDIA CUDA

Using hardware acceleration via general-purpose programming on stock GPUs (GPGPU), I’ve sped up my algorithms more than tenfold. This article shows how you can achieve these results too! Programmers have been interested in leveraging the highly parallel processing power of video cards to speed up applications that are not graphical in nature for a […]

CUDA

Sep, 11

TimeGraph: GPU scheduling for real-time multi-tasking environments

The Graphics Processing Unit (GPU) is now commonly used for graphics and data-parallel computing. As more and more applications tend to accelerate on the GPU in multi-tasking environments where multiple tasks access the GPU concurrently, operating systems must provide prioritization and isolation capabilities in GPU resource management, particularly in real-time setups. We present TimeGraph, a […]

OpenGL

Sep, 11

Challenges of medical image processing

In today's health care, imaging plays an important role throughout the entire clinical process, from diagnostics and treatment planning to surgical procedures and follow-up studies. Since most imaging modalities have gone directly digital, with continually increasing resolution, medical image processing has to face the challenges arising from large data volumes. In this paper, we […]

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)