high performance computing on graphics processing units: hgpu.org

Posts

Jan, 5

Abundance Estimation Algorithms using NVIDIA CUDA Technology

Spectral unmixing of hyperspectral images is a process by which the constituent members of a pixel scene are determined and the fractional abundance of each element is estimated. Several algorithms have been developed in the past to obtain abundance estimates from hyperspectral data; however, most of them are characterized by being […]

CUDA

Jan, 4

Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster

We propose a method to parallelize the training of a convolutional neural network by using a CUDA-based cluster. We attain a substantial increase in the performance of the algorithm itself. We research the feasibility of using batch versus online mode training and provide a performance comparison between them. Furthermore, we propose an implementation of an […]

CUDA

Jan, 4

Implementing Parallel SMO to Train SVM on CUDA-Enabled Systems

We implement a Sequential Minimal Optimization type algorithm to solve for the Lagrangian weights of the dual form of the Support Vector Machine problem. Unlike the original SMO algorithm, the modified SMO algorithm uses a first-order variable selection heuristic to avoid explicit computation of the KKT conditions. Parallelism in the algorithm is exposed via a […]

CUDA

Jan, 4

Task and Data Distribution in Hybrid Parallel Systems

This paper describes my work with the Operating Systems and Middleware group for the HPI Research School on "Service-Oriented Systems Engineering". Computer architecture is shifting. The upper levels of the software stack must therefore be adapted in order to benefit from current and future hardware capabilities. In this paper, we present the Hybrid.Parallel […]

OpenCL

Jan, 4

Toward Real-Time Dense 3d Reconstruction using Stereo Vision

State-of-the-art Structure from Motion algorithms can produce a real-time sparse 3d map of the environment in a fast, robust and efficient way. However, dense 3d maps would be very useful for accurate Augmented Reality with occlusion management. This project focuses on generating accurate dense depth-maps in near real-time from the data provided […]

CUDA

Jan, 4

Automatic SIMD Code Generation

SIMD instructions have been common in microprocessors for roughly a decade and a half. These instructions enable the programmer to perform an operation on several values simultaneously with a single instruction, hence the name: Single Instruction, Multiple Data. The more values that can be computed simultaneously, the better the speedup. However, SIMD programming is still commonly considered […]

Jan, 4

Analysis of Real-Time Stereo Vision Algorithms On GPU

Dozens of stereo correspondence algorithms whose matching performance has been measured are available, but the trade-off between speed and matching performance of viable real-time stereo has received much less attention. Here, we evaluate five correspondence algorithms (Symmetric Dynamic Programming Stereo, Semi-Global Matching, simple block matching, Belief Propagation, and its constant space variant) on a GPU using […]

CUDA

Jan, 4

Extending a C-like Language for Portable SIMD Programming

SIMD instructions have been common in CPUs for years. Using these instructions effectively requires not only vectorization of code, but also modifications to the data layout. However, automatic vectorization techniques are often not powerful enough and suffer from a restricted scope of applicability; hence, programmers often vectorize their programs manually by using intrinsics: compiler-known functions that […]

Jan, 4

Parallel Implementation of Compressive Sensing Based SAR Imaging with GPU

This paper proposes a new scheme for the parallel implementation of compressive sensing based SAR imaging on a GPU with the Iterative Shrinkage/Thresholding (IST) algorithm. To achieve a faster recovery speed, we modified the existing IST algorithm structure and realized a fast implementation on the GPU. The experimental results show that the parallel computing capabilities of the GPU have a significant speedup […]

CUDA

Jan, 4

Evaluating polynomials in several variables and their derivatives on a GPU computing processor

In order to obtain more accurate solutions of polynomial systems with numerical continuation methods, we use multiprecision arithmetic. Our goal is to offset the overhead of double-double arithmetic by accelerating the path trackers, and in particular Newton’s method, with a general purpose graphics processing unit. In this paper we describe algorithms for the massively parallel […]

CUDA

Jan, 4

Thermal and Athermal Swarms of Self-Propelled Particles

Swarms of self-propelled particles exhibit complex behavior that can arise from simple models, with large changes in swarm behavior resulting from small changes in model parameters. We investigate the steady-state swarms formed by self-propelled Morse particles in three dimensions using molecular dynamics simulations optimized for GPUs. We find a variety of swarms of different overall […]

CUDA

Jan, 4

Decoupled Deferred Shading for Hardware Rasterization

In this paper we present decoupled deferred shading: a rendering technique based on a new data structure called compact geometry buffer, which stores shading samples independently from the visibility. This enables caching and efficient reuse of shading computation, e.g. for stochastic rasterization techniques. In contrast to previous methods, our decoupled shading can be efficiently implemented […]

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
