high performance computing on graphics processing units: hgpu.org

Posts

Aug, 15

Humanoid navigation planning using future perceptive capability

We present an approach to navigation planning for humanoid robots that aims to ensure reliable execution by augmenting the planning process to reason about the robot's ability to successfully perceive its environment during operation. By efficiently simulating the robot's perception system during search, our planner generates a metric, the so-called perceptive capability, that quantifies the […]

Aug, 15

Efficient Calculation of Pairwise Nonbonded Forces

A major bottleneck in molecular dynamics (MD) simulations is the calculation of the pairwise nonbonded interactions. Previous work on FPGAs has shown that these calculations can be implemented with a number of force computation pipelines operating in parallel (4 and 8 for the Stratix-III and Stratix-V, respectively). Optimization has received some attention previously in […]

Aug, 15

FPGA acceleration of rigid-molecule docking codes

Modelling the interactions of biological molecules, or docking, is critical both to understanding basic life processes and to designing new drugs. The field programmable gate array (FPGA) based acceleration of a recently developed, complex, production docking code is described. The authors found that it is necessary to extend their previous three-dimensional (3D) correlation structure in […]

CUDA

Aug, 15

Markerless View-Independent Registration of Multiple Distorted Projectors on Extruded Surfaces Using an Uncalibrated Camera

In this paper, we present the first algorithm to geometrically register multiple projectors in a view-independent manner (i.e. wallpapered) on a common type of curved surface, vertically extruded surface, using an uncalibrated camera without attaching any obtrusive markers to the display screen. Further, it can also tolerate large non-linear geometric distortions in the projectors as […]

Aug, 15

Robust non-local denoising of colored depth data

We give a brief discussion of denoising algorithms for depth data and introduce a novel technique based on the NL-means filter. A unified approach is presented that removes outliers from depth data and accordingly achieves an unbiased smoothing result. This robust denoising algorithm takes intra-patch similarity and optional color information into account in order to […]

CUDA

Aug, 15

Parallel Morphological Endmember Extraction Using Commodity Graphics Hardware

Spatial/spectral algorithms have been shown in previous work to be a promising approach to the problem of extracting image endmembers from remotely sensed hyperspectral data. Such algorithms map nicely onto high-performance systems such as massively parallel clusters and networks of computers. Unfortunately, these systems are generally expensive and difficult to adapt to onboard data […]

Aug, 15

A novel FPGA-based SVM classifier

Support Vector Machines (SVMs) are a powerful supervised learning tool, providing state-of-the-art accuracy at the cost of high computational complexity. SVM classification suffers from a linear dependence on the number of support vectors and the problem’s dimensionality. In this work, we propose a scalable FPGA architecture for the acceleration of SVM classification, which exploits […]

Aug, 15

Equalizer: A Scalable Parallel Rendering Framework

Continuing improvements in CPU and GPU performance, as well as increasing multi-core processor and cluster-based parallelism, demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems […]

OpenGL

Aug, 15

Volumetric Ambient Occlusion

This paper presents a new GPU-based algorithm to compute ambient occlusion. We first examine how ambient occlusion is related to the physically founded rendering equation. The correspondence is made by introducing a fuzzy membership function that defines what “near occlusions” mean. Then we develop a method to calculate ambient occlusion in real-time without any pre-computation. […]

Aug, 15

Fast parallel simulation of fiber optical communication systems accelerated by a graphics processing unit

A parallel implementation of the split-step Fourier method utilizing CUDA, the general-purpose parallel computing architecture for graphics processing units, is presented. Results of the GPU implementation are compared to a conventional CPU-based approach regarding computation time and accuracy. We developed a novel implementation with a significantly higher accuracy than the CUDA intrinsic FFT in single […]

CUDA

Aug, 12

Calculation of fermion loops for eta-prime and nucleon scalar and electromagnetic form factors

The exact evaluation of the disconnected diagram contributions to the flavor-singlet pseudoscalar meson mass, the nucleon sigma term and the nucleon electromagnetic form factors, is carried out utilizing GPGPU technology with the NVIDIA CUDA platform. The disconnected loops are also computed using stochastic methods with several noise reduction techniques. Various dilution schemes as well as […]

CUDA

Aug, 12

Using the physics-based rendering toolkit for medical reconstruction

In this paper we cast the problem of tomography in the realm of computer graphics. By using PBRT (physically based rendering toolkit) we create a scripting environment that simplifies the programming of tomography algorithms such as maximum-likelihood expectation maximization (ML-EM) or simultaneous algebraic reconstruction technique (SART, a variant of ART). This allows the rapid development […]

OpenGL
