high performance computing on graphics processing units: hgpu.org

Posts

Dec, 13

High performance computing for linear acoustic wave simulation

Parallel computing techniques are applied to a linear acoustic wave model to reduce execution time. Three parallel computing models are developed to parallelize computations. The fork-and-join, SPMD and SIMT models define the execution of parallel computations. The precision and efficiency of the linear acoustic wave model are improved through substantial speedups in all implementations. Furthermore, […]

OpenGL

Dec, 13

Divergence Analysis with Affine Constraints

The rise of graphics processing units in high-performance computing is bringing renewed interest in code optimization techniques that target SIMD processors. Many of these optimizations rely on divergence analyses, which classify variables as uniform, if they have the same value on every thread, or divergent, if they might not. This paper introduces a new kind […]

CUDA
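
The uniform/divergent distinction in the abstract above can be illustrated with a small sketch. This is not the paper's static analysis: it is a dynamic check that runs a hypothetical kernel body once per simulated thread and reports which variables took the same value everywhere. The names `classify_variables` and `example_kernel` are assumptions made for illustration only.

```python
from typing import Callable, Dict


def classify_variables(
    kernel: Callable[[int, int], Dict[str, int]],
    num_threads: int,
    block_size: int,
) -> Dict[str, str]:
    """Run `kernel` once per simulated thread and label each variable.

    A variable is labelled "uniform" if every thread observed the same
    value, and "divergent" otherwise. A real static analysis must be
    conservative; this sketch only observes a single launch.
    """
    per_thread = [kernel(tid, block_size) for tid in range(num_threads)]
    labels: Dict[str, str] = {}
    for name in per_thread[0]:
        values = {env[name] for env in per_thread}
        labels[name] = "uniform" if len(values) == 1 else "divergent"
    return labels


def example_kernel(tid: int, n: int) -> Dict[str, int]:
    """Hypothetical kernel body; returns its local variables."""
    base = 4 * n        # depends only on a launch constant
    index = base + tid  # affine in the thread id
    parity = tid % 2    # depends on the thread id
    return {"base": base, "index": index, "parity": parity}


if __name__ == "__main__":
    print(classify_variables(example_kernel, num_threads=8, block_size=8))
```

Here `base` is the kind of variable a compiler could compute once per group of threads, while `index` and `parity` must be kept per thread.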

Dec, 13

Graphics Processing Units: More Than the Pathway to Realistic Video-Games

The huge video games market has propelled the development of hardware and software focused on making the game environment more realistic. Among such developments are graphics processing units (GPUs). These devices are intended to relieve the central processing unit (CPU) of the host computer of the computation that creates "life" for video games. The GPUs […]

CUDA

Dec, 12

Design and study of a massively multi threaded shared memory architecture

Most biocomputing problems require high processing power and large amounts of memory while offering opportunities for massive parallelism. Unfortunately, although advances are made in software parallelism, current architectures do not provide a transparent way to use this parallelism at its full potential. We thus started to design a massively parallel megathreaded architecture that would match biocomputing […]

CUDA

Dec, 12

Parallelization of an Ultrasound Reconstruction Algorithm for non Destructive Testing on Multicore CPU and GPU

The CIVA software platform developed by CEA-LIST offers various simulation and data processing modules dedicated to non-destructive testing (NDT). In particular, ultrasonic imaging and reconstruction tools are provided for the purpose of localizing echoes and identifying and sizing the detected defects. Because of the complexity of the data processed, computation time is now a limitation for […]

CUDA

Dec, 12

A Rigid Body Physics Engine for Interactive Applications

We have conceived and implemented a software library to speed up the development of animations or applications that make use of interactive physics simulation. This paper focuses on the discussion of some of the algorithms and techniques that were used in its basic implementation, and also on expansions and optimizations that were […]

OpenGL

Dec, 12

Parallel Evaluation of a Spatial Traversability Cost Function on GPU for Efficient Path Planning

A parallel version of the traditional grid based cost-to-go function generation algorithm used in robot path planning is introduced. The process takes advantage of the spatial layout of an occupancy grid by concurrently calculating the next wave front of grid cells usually evaluated sequentially in traditional dynamic programming algorithms. The algorithm offers an order of […]

OpenGL
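
The wavefront idea in the abstract above can be sketched as follows. This is a minimal CPU-side illustration, assuming a 4-connected occupancy grid with unit step costs (0 = free, 1 = obstacle); the paper's actual cost function and GPU mapping are not given in the excerpt, and the name `wavefront_cost_to_go` is hypothetical. The point it shows is that all cells in one wavefront are independent of each other, so each could be handled by its own thread.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # assumed encoding: 0 = free cell, 1 = obstacle


def wavefront_cost_to_go(
    grid: Grid, goal: Tuple[int, int]
) -> List[List[Optional[int]]]:
    """Return the number of steps from each free cell to `goal`.

    Unreachable cells and obstacles are left as None.
    """
    rows, cols = len(grid), len(grid[0])
    cost: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    cost[goal[0]][goal[1]] = 0
    frontier = [goal]
    step = 0
    while frontier:
        step += 1
        next_frontier = set()
        # Cells in the current frontier do not depend on one another,
        # so this loop is the part a GPU could run in parallel.
        for r, c in frontier:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                in_bounds = 0 <= nr < rows and 0 <= nc < cols
                if in_bounds and grid[nr][nc] == 0 and cost[nr][nc] is None:
                    next_frontier.add((nr, nc))
        # Commit the whole wavefront at once before advancing.
        for r, c in next_frontier:
            cost[r][c] = step
        frontier = list(next_frontier)
    return cost


if __name__ == "__main__":
    demo = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    for row in wavefront_cost_to_go(demo, (0, 0)):
        print(row)
```

Collecting the next wavefront in a set before writing costs mirrors the two-phase pattern (expand, then commit) that avoids write conflicts when several frontier cells share a neighbor.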

Dec, 12

Accelerating non-linear image registration with GPUs

The alignment or registration of two images or volumetric datasets is frequently a requirement in modern image-processing applications, particularly within the context of medical imaging. Modern graphics-processing units (GPUs) are designed to perform simple 3D graphics-pipeline tasks on a massively parallel scale; this processing power can be harnessed for general computation via libraries such as […]

CUDA

Dec, 12

GPU Programming in a High Level Language: Compiling X10 to CUDA

GPU architectures have emerged as a viable way of considerably improving performance for appropriate applications. Program fragments (kernels) appropriate for GPU execution can be implemented in CUDA or OpenCL and glued into an application via an API. While there is plenty of evidence of performance improvements using this approach, there are many issues with productivity. […]

CUDA • OpenCL

Dec, 12

A fast and intuitive visual programming language (VPL) for constructing Computer Vision and Image processing systems on GPUs

In this work we present a novel GPU based Visual Programming Language for Computer Vision and Image Processing systems. Many vision algorithms have been shown to perform better on GPUs. However, one of the current drawbacks is the need for considerable GPU programming expertise. We propose an abstraction over GPU implementation details by providing an […]

CUDA

Dec, 12

Theano: A CPU and GPU Math Compiler in Python

Theano is a compiler for mathematical expressions in Python that combines the convenience of NumPy’s syntax with the speed of optimized native machine language. The user composes mathematical expressions in a high-level description that mimics NumPy’s syntax and semantics, while being statically typed and functional (as opposed to imperative). These expressions allow Theano to provide […]

CUDA

Dec, 12

A Common GPU n-Dimensional Array for Python and C

Currently there are multiple incompatible array/matrix/n-dimensional base object implementations for GPUs. This hinders the sharing of GPU code and causes duplicate development work. This paper proposes and presents a first version of a common GPU n-dimensional array (tensor) named GpuNdArray that works with both CUDA and OpenCL. It will be usable from Python, C and possibly […]

CUDA • OpenCL
