high performance computing on graphics processing units: hgpu.org

Posts

Mar, 30

OpenCL-based design methodology for application-specific processors

OpenCL is a programming language standard which enables the programmer to express the application by structuring its computation as kernels. The OpenCL compiler is given the explicit freedom to parallelize the execution of kernel instances at all the levels of parallelism. In comparison to the traditional C programming language which is sequential in nature, OpenCL […]

OpenCL

Mar, 30

Hybrid OpenCL over high speed networks

We are developing Hybrid OpenCL, which enables the connection between different OpenCL implementations over the network. Hybrid OpenCL consists of two elements, a runtime system that provides the abstraction of different OpenCL implementations and a bridge program that connects multiple OpenCL runtime systems over the network. Hybrid OpenCL enables the construction of the scalable OpenCL […]

OpenCL

Mar, 30

Improving Hybrid OpenCL Performance by High Speed Networks

We developed Hybrid OpenCL, which enables the connection between different OpenCL implementations over the network. Hybrid OpenCL consists of two elements, a runtime system that provides the abstraction of different OpenCL implementations and a bridge program that connects multiple OpenCL runtime systems over the network. Problems in OpenCL are not being able to use different […]

OpenCL

Mar, 30

Hybrid OpenCL: Connecting Different OpenCL Implementations over Network

OpenCL

Mar, 30

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing […]

CUDA • OpenCL

Mar, 30

A Very Simple Approach for 3-D to 2-D Mapping

Many times we need to plot 3-D functions, e.g., in many scientific experiments. Plotting these 3-D functions on a 2-D screen requires some kind of mapping. Though 3-D rendering libraries such as OpenGL and DirectX have made this job very simple, these libraries still come with many complex pre-operations that are simply not intended, also […]

OpenGL

Mar, 30

Augmented reality usage for prototyping speed up

The first part of the article describes our approach to solving this problem by means of Augmented Reality. Merging the real-world model with digital objects makes it possible to streamline work with the model and speed up the whole production phase significantly. The main advantage of augmented reality is the possibility of direct […]

OpenGL

Mar, 29

CuHMMer: A load-balanced CPU-GPU cooperative bioinformatics application

GPUs have recently been used to accelerate data-parallel applications because they provide easier programmability and increased generality while maintaining tremendous memory bandwidth and computational power. Most of those applications use the CPU as a controller that decides when the GPUs run the compute-intensive tasks. This CPU-control-GPU-compute pattern wastes much of the CPU’s computational power. In this paper, […]

CUDA

Mar, 29

GPU accelerated statistical image reconstruction for Compton cameras

We propose GPU (graphics processing unit) accelerated methods that can dramatically improve the computational performance of statistical image reconstruction algorithms for Compton cameras. Since the conventional ray-based backprojection method is inefficient for GPU, we develop a fully voxel-based backprojection method which can maximize the performance of GPU. In this method, the cone surface is sampled […]

Mar, 29

Multigrid on GPU: Tackling Power Grid Analysis on parallel SIMT platforms

The challenging task of analyzing on-chip power (ground) distribution networks with multi-million node complexity and beyond is key to today's large chip designs. For the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT) based graphics processing unit (GPU) platforms to tackle power grid analysis with promising performance. Several key enablers […]

CUDA

Mar, 29

Fast view synthesis using GPU for 3D display

In this paper, we develop a fast view synthesis method that generates multiple intermediate views in real time for a 3D display system when the camera geometry and the depth map of the reference views are given. The proposed method achieves a faster view synthesis than previous approaches by processing in parallel the entire computations […]

Mar, 29

5.6: GPU enhancement of FDTD-PIC plasma-wave simulations

Simple models of major CPU-intensive portions of the MAGIC electromagnetic (EM) plasma code, written in the CUDA language and run on the graphical processing unit (GPU), indicate a 12x computing rate compared to the same calculations run on the CPU only. MAGIC is being modified for performance speedup of large-scale plasma-wave EM calculations using GPU processing. Results to date from MAGIC […]

CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)