high performance computing on graphics processing units: hgpu.org

Posts

Jun, 23

Using JavaScript and WebCL for Numerical Computations: A Comparative Study of Native and Web Technologies

From its modest beginnings as a tool to validate forms, JavaScript is now an industrial-strength language used to power online applications such as spreadsheets, IDEs, image editors and even 3D games. Since all modern web browsers support JavaScript, it provides a medium that is both easy to distribute for developers and easy to access for […]

OpenCL

Jun, 23

Feature Generation for Quantification of Visual Similarity

The complex nature of visual similarity makes it extremely difficult to hand code a set of good features that incorporate all of the important aspects for all images. This thesis work shows that machine learning techniques can be used to generate statistically optimal low dimensional features that work well with calculating similarity using Euclidean distance […]

CUDA

Jun, 23

Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction

In this paper we present new hybrid CPU-GPU routines to accelerate the solution of linear systems with a band coefficient matrix, by off-loading the major part of the computations to the GPU and leveraging highly tuned implementations of the BLAS for the graphics processor. Our experiments with an NVIDIA S2070 GPU report speed-ups of up to 6x […]

CUDA

Jun, 23

Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch

Recurrent neural network language models (RNNLMs) are becoming increasingly popular for a range of applications including speech recognition. However, an important issue that limits the quantity of data, and hence their possible application areas, is the computational cost in training. A standard approach to handle this problem is to use class-based outputs, allowing systems to […]

CUDA

Jun, 23

An efficient parallel algorithm for accelerating computational protein design

MOTIVATION: Structure-based computational protein design (SCPR) is an important topic in protein engineering. Under the assumption of a rigid backbone and a finite set of discrete conformations of side-chains, various methods have been proposed to address this problem. A popular method is to combine the dead-end elimination (DEE) and A* tree search algorithms, which provably […]

CUDA

Jun, 23

Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment

The paper presents the design, implementation and real-life uses of a visualization subsystem for a distributed framework that parallelizes workflow-based computations among clusters whose nodes feature both CPUs and GPUs. Firstly, the proposed system presents a graphical view of the infrastructure with clusters, nodes and compute devices along with parameters and runtime graphs […]

OpenCL

Jun, 23

A Scala Prototype to Generate Multigrid Solver Implementations for Different Problems and Target Multi-Core Platforms

Many problems in computational science and engineering involve partial differential equations and thus require the numerical solution of large, sparse (non)linear systems of equations. Multigrid is known to be one of the most efficient methods for this purpose. However, the concrete multigrid algorithm and its implementation highly depend on the underlying problem and hardware. Therefore, […]

CUDA

Jun, 23

Coupled Vlasov and two-fluid codes on GPUs

We present a way to combine Vlasov and two-fluid codes for the simulation of a collisionless plasma in large domains while keeping full information of the velocity distribution in localized areas of interest. This is made possible by solving the full Vlasov equation in one region while the remaining area is treated by a 5-moment […]

CUDA

Jun, 22

The Power-Performance Tradeoffs of the Intel Xeon Phi on HPC Applications

Accelerators are used in about 13% of the current Top500 List. Supercomputers leveraging accelerators grew by a factor of 2.2x in 2012 and are expected to completely dominate the Top500 by 2015. Though most of these deployments use NVIDIA GPGPU accelerators, Intel’s Xeon Phi architecture will likely grow in popularity in the coming years. Unfortunately, […]

Jun, 22

The Fast and Wideband MoM Based on GPU and Two-Path AFS Acceleration

In this paper, a graphics processing unit (GPU) accelerated full-wave method of moments (MoM) is combined with a two-path adaptive frequency sampling (AFS) approach to analyze the wideband characteristics of body-wire structures. An equivalent principle is employed to treat the wire as a surface so that the model which is analyzed based on the electric-field […]

CUDA

Jun, 22

Solving the Caputo Fractional Reaction-Diffusion Equation on GPU

We present a parallel GPU solution of the Caputo fractional reaction-diffusion equation in one spatial dimension with an explicit finite difference approximation. The parallel solution, which is implemented with the CUDA programming model, consists of three procedures: preprocessing, parallel solver, and postprocessing. The parallel solver involves the parallel tridiagonal matrix-vector multiplication, vector-vector addition, and constant vector […]

CUDA

Jun, 22

Real-Time Deformation of Subdivision Surfaces from Object Collisions

We present a novel real-time approach for fine-scale surface deformations resulting from collisions. Deformations are represented by a high-resolution displacement function. When two objects collide, these offsets are updated directly on the GPU based on a dynamically generated binary voxelization of the overlap region. Consequently, we can handle collisions with arbitrary animated geometry. Our approach […]

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
