Posts
Apr, 22
Accelerating the numerical simulation of magnetic field lines in tokamaks using the GPU
trip3d is a field line simulation code that numerically integrates a set of nonlinear magnetic field line differential equations. The code is used to study properties of magnetic islands and stochastic or chaotic field line topologies that are important for designing non-axisymmetric magnetic perturbation coils for controlling plasma instabilities in future machines. The code is […]
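Field line tracing boils down to integrating an ODE system along the field. As a minimal illustration (not trip3d's actual equations, which involve the full perturbed tokamak field), here is a classical fourth-order Runge-Kutta tracer following a toy divergence-free field whose field lines are circles:

```python
import math

def rk4_step(f, y, s, h):
    """One classical fourth-order Runge-Kutta step for y' = f(s, y)."""
    k1 = f(s, y)
    k2 = f(s + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(s + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(s + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def field(s, y):
    # Toy field B = (-y, x)/r: its field lines are circles about the origin,
    # so the radius is a conserved quantity we can check the integrator against.
    x, yy = y
    r = math.hypot(x, yy)
    return [-yy / r, x / r]

def trace(y0, h=0.01, n=1000):
    """Follow one field line for n steps of arc length h."""
    y, s = list(y0), 0.0
    for _ in range(n):
        y = rk4_step(field, y, s, h)
        s += h
    return y
```

Since every field line is integrated independently of the others, the workload is embarrassingly parallel, which is what makes it attractive to map onto a GPU.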
Apr, 22
Scalable Clustering Using Graphics Processors
We present new algorithms for scalable clustering using graphics processors. Our basic approach is based on k-means. By changing the order of determining object labels, and exploiting the high computational power and pipeline of graphics processing units (GPUs) for distance computing and comparison, we speed up the k-means algorithm substantially. We introduce two strategies for […]
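The step the abstract highlights, computing every point-to-centroid distance and picking the minimum, is the data-parallel core that maps well onto a GPU. A minimal pure-Python sketch of the standard k-means loop (an illustration of the baseline algorithm, not the paper's GPU strategies):

```python
def assign_labels(points, centroids):
    # Data-parallel step: each point independently finds its nearest centroid.
    # On a GPU, all of these distance computations run concurrently.
    labels = []
    for p in points:
        dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def update_centroids(points, labels, k, dim):
    # Reduction step: average the points assigned to each cluster.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, label in zip(points, labels):
        counts[label] += 1
        for i, pi in enumerate(p):
            sums[label][i] += pi
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def kmeans(points, centroids, iters=10):
    """Lloyd's algorithm with fixed initial centroids."""
    k, dim = len(centroids), len(points[0])
    for _ in range(iters):
        labels = assign_labels(points, centroids)
        centroids = update_centroids(points, labels, k, dim)
    return labels, centroids
```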
Apr, 22
GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method
The Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now offers great capability for scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution to the Boltzmann equation involves the discrete ordinates (Sn) method and the procedure of source iteration. […]
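Source iteration lags the scattering source: each sweep solves the transport operator with the scattering source evaluated from the previous flux iterate, and the fixed point is the converged flux. A drastically simplified sketch for a one-group, infinite homogeneous medium (the real Sn method also discretizes angle and space, which this omits entirely):

```python
def source_iteration(S, sigma_t, sigma_s, tol=1e-10, max_iter=1000):
    """Fixed-point source iteration phi <- (S + sigma_s * phi) / sigma_t.

    Converges geometrically with ratio c = sigma_s / sigma_t < 1 to the
    analytic solution phi = S / (sigma_t - sigma_s)."""
    phi = 0.0
    for _ in range(max_iter):
        phi_new = (S + sigma_s * phi) / sigma_t
        if abs(phi_new - phi) < tol:
            return phi_new
        phi = phi_new
    return phi
```

In the full Sn method, each iteration instead performs a transport sweep over all spatial cells and angular ordinates; those sweeps supply the fine-grained parallelism a GPU implementation exploits.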
Apr, 22
AMD Fusion Developer Summit 2011, AFDS 2011
Heterogeneous computing is moving into the mainstream, and a broader range of applications are already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. Attend the AMD Fusion Developer Summit to learn about the opportunities that lie ahead.
Apr, 21
Pretty Good Accuracy in Matrix Multiplication with GPUs
With systems such as Roadrunner, there is a trend in supercomputing to offload parallel tasks to special-purpose co-processors composed of many relatively simple scalar processors. The cheaper commodity-class equivalent of such a processor is the graphics card, potentially offering supercomputer power within the confines of a desktop PC. Graphics […]
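The accuracy concern here comes from accumulating long dot products in low precision. The excerpt does not say which remedy the paper uses, but compensated (Kahan) summation is a standard technique for recovering accuracy lost in a naive running sum, sketched below:

```python
def naive_sum(xs):
    # Plain left-to-right accumulation: small addends can be rounded away
    # entirely once the running sum is large.
    s = 0.0
    for x in xs:
        s += x
    return s

def kahan_sum(xs):
    # Compensated summation: c carries the rounding error of each addition
    # and feeds it back into the next one.
    s = c = 0.0
    for x in xs:
        y = x - c
        t = s + y
        c = (t - s) - y
        s = t
    return s
```

In a matrix product, each output element is exactly such a dot-product accumulation, so the same compensation trick applies per element.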
Apr, 21
Using graphics processors to accelerate the computation of the matrix inverse
We study the use of massively parallel architectures for computing a matrix inverse. Two different algorithms are reviewed, the traditional approach based on Gaussian elimination and the Gauss-Jordan elimination alternative, and several high performance implementations are presented and evaluated. The target architecture is a current general-purpose multicore processor (CPU) connected to a graphics processor (GPU). […]
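The Gauss-Jordan alternative mentioned above eliminates both below and above the pivot in one pass, so the augmented system reduces directly to the identity with the inverse alongside. A minimal Python sketch of that algorithm with partial pivoting (an illustration of the method, not the paper's tuned GPU implementation):

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment [A | I]; reducing A to I turns the right half into A^-1.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate the pivot column in every other row (above and below).
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]
```

Unlike Gaussian elimination followed by back-substitution, every elimination step updates whole rows uniformly, a regular access pattern that is convenient for GPU execution.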
Apr, 21
Fast Variable Center-Biased Windowing for High-Speed Stereo on Programmable Graphics Hardware
We present a high-speed dense stereo algorithm that achieves both good quality results and very high disparity estimation throughput on the graphics processing unit (GPU). The key idea is a variable center-biased windowing approach, enabling an adaptive selection of the most suitable support patterns with varying sizes and shapes. As the fundamental construct for variable […]
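For context, the fixed-window baseline that variable center-biased windowing improves on scores each candidate disparity by a sum of absolute differences (SAD) over a square support window. A minimal sketch of that baseline (the paper's contribution is the adaptive window selection, which this omits):

```python
def disparity_sad(left, right, x, y, max_d, win=1):
    """Best disparity for left-image pixel (x, y) by minimizing the sum of
    absolute differences over a (2*win+1) x (2*win+1) fixed window."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_d + 1):
        cost = 0
        for dy in range(-win, win + 1):
            for dx in range(-win, win + 1):
                # Candidate match: left (x+dx, y+dy) vs. right shifted by d.
                cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```

Every pixel's disparity search is independent, which is why dense stereo maps so naturally onto programmable graphics hardware.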
Apr, 21
Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation
We propose a system-independent representation of sparse matrix formats that allows a compiler to generate efficient, system-specific code for sparse matrix operations. To show the viability of such a representation we have developed a compiler that generates and tunes code for sparse matrix-vector multiplication (SpMV) on GPUs. We evaluate our framework on six state-of-the-art matrix […]
Apr, 21
Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple-level parallelism and memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to efficiently parallelize for GPUs due to irregular […]
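The irregularity comes from the sparse storage format itself: in compressed sparse row (CSR), the most common starting point for GPU SpMV work, each row owns a variable-length slice of the nonzero arrays, so rows do uneven amounts of work. A minimal reference sketch of CSR SpMV:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix stored in compressed sparse row (CSR) format.

    values:  nonzero entries, row by row
    col_idx: column index of each nonzero
    row_ptr: row r's nonzeros live in values[row_ptr[r]:row_ptr[r+1]]
    """
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y
```

Mapping one thread per row makes the variable trip count of the inner loop visible as load imbalance and uncoalesced access, which is exactly what format- and architecture-aware tuning tries to mitigate.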
Apr, 21
Assessment of GPU computational enhancement to a 2D flood model
This paper presents a study of the computational enhancement of a Graphics Processing Unit (GPU) enabled 2D flood model. The objectives are to demonstrate the significant speedup of a new GPU-enabled full dynamic wave flood model and to present the effect of model spatial resolution on its speedup. A 2D dynamic flood model based on […]
Apr, 21
Solving knapsack problems on GPU
A CUDA-based parallel implementation of the dynamic programming method for the knapsack problem on an NVIDIA GPU is presented. A GTX 260 card with 192 cores (1.4 GHz) is used for computational tests, and the processing times obtained with the parallel code are compared to those of the sequential code on an Intel Xeon 3.0 GHz CPU. The […]
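The dynamic programming recurrence in question processes items one at a time, and within one item all capacity cells of the table can be updated independently; that per-row independence is the axis a CUDA kernel parallelizes. A sequential sketch of the 0/1 knapsack recurrence (the baseline being accelerated, not the paper's CUDA kernel):

```python
def knapsack(weights, profits, capacity):
    """0/1 knapsack via dynamic programming over capacities.

    dp[c] = best profit achievable with capacity c using the items seen so far.
    """
    dp = [0] * (capacity + 1)
    for w, p in zip(weights, profits):
        # All cells of `new` depend only on the previous row `dp`, so this
        # inner loop is what a GPU can compute with one thread per capacity.
        new = dp[:]
        for c in range(w, capacity + 1):
            new[c] = max(dp[c], dp[c - w] + p)
        dp = new
    return dp[capacity]
```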
Apr, 21
Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs
Graphics Processing Unit (GPU) has become an attractive coprocessor for scientific computing due to its massive processing capability. The sparse matrix-vector multiplication (SpMV) is a critical operation in a wide variety of scientific and engineering applications, such as sparse linear algebra and image processing. This paper presents an auto-tuning framework that can automatically compute and […]

