high performance computing on graphics processing units: hgpu.org

Posts

Oct, 3

GPU-accelerated triangle-triangle intersection tester algorithm

The goal of the project is to develop a triangle-triangle collision algorithm. A reference triangle is given, along with a variably sized array of many other triangles. The algorithm must check whether a given triangle intersects the reference triangle, and this test has to be performed for each "non-reference" triangle. If one […]

CUDA

Oct, 3

Compiler Optimizations for SIMD/GPU/Multicore Architectures

In modern computer architectures, both SIMD (single-instruction multiple-data) instruction set extensions and GPUs can be used to accelerate general-purpose applications. In addition, multicore machines, with increasing numbers of cores and deeper cache hierarchies, can potentially provide more computational power for high-performance computing. However, writing high-performance code manually for these architectures is […]

CUDA

Oct, 2

CUDA Enhanced Filtering in a Pipelined Video Processing Framework

The processing of digital video has long been a significant computational task for modern x86 processors. With every video frame composed of one to three planes, each consisting of a two-dimensional array of pixel data, and a video clip comprising thousands of such frames, the sheer volume of data is large. With the introduction […]

CUDA

Oct, 2

Parallel Hyperspectral Unmixing on GPUs

This letter presents a new parallel method for hyperspectral unmixing based on an efficient combination of two popular methods: vertex component analysis (VCA) and sparse unmixing by variable splitting and augmented Lagrangian (SUNSAL). First, VCA extracts the end-member signatures; then SUNSAL is used to estimate the abundance fractions. Both techniques are highly parallelizable, which […]

OpenCL

Oct, 2

A state-of-the-art password strength analysis demonstrator

Due to recent developments (leaks of large lists of user passwords, e.g. from LinkedIn; new probabilistic password-cracking techniques; and the introduction of password cracking on GPUs), passwords can now be cracked faster than ever before. The leaked password lists have been analyzed by hackers, and common patterns found in the passwords are being exploited to […]

OpenCL

Oct, 2

GPU-powered Simulation Methodologies for Biological Systems

The study of biological systems has witnessed a pervasive cross-fertilization between experimental investigation and computational methods. This has given rise to new methodologies able to tackle the complexity of biological systems in a quantitative manner. Computer algorithms make it possible to faithfully reproduce the dynamics of the corresponding biological system, and, at the price of a […]

CUDA

Oct, 2

A graphics processor-based intranuclear cascade and evaporation simulation

Monte Carlo simulations of the transport of protons in human tissue have been deployed on graphics processing units (GPUs) with impressive results. To provide a more complete treatment of non-elastic nuclear interactions in these simulations, we developed a fast intranuclear cascade-evaporation simulation for the GPU. This can be used to model non-elastic proton collisions on […]

CUDA

Oct, 1

GPU-Accelerated Real-Time Surveillance De-Weathering

A fully automatic de-weathering system has been developed to increase visibility and stability in surveillance applications during bad weather. Rain, snow and haze during daylight are handled in real time, with acceleration from CUDA-implemented algorithms. Video from fixed cameras is processed on a PC with no special hardware other than an NVIDIA GPU. The […]

CUDA

Oct, 1

Head Pose Tracking Using GPU Based Real-time 3D Registration

Head pose tracking is one of the key capabilities for improving human-computer and human-robot interaction. With the improvement of low-cost consumer depth cameras, 3D head pose estimation, which is more accurate and more robust to environmental conditions, has attracted considerable research attention. […]

CUDA

Oct, 1

Optimizing RDF stores by coupling General-purpose Graphics Processing Units and Central Processing Units

From our experience in using RDF stores as a backend for social media streams, we pinpoint three shortcomings of current RDF stores in terms of aggregation speed, constraint checking and large-scale reasoning. Parallel algorithms are being proposed to scale reasoning on RDF graphs. However, current efforts focus on the closure computation using High Performance […]

CUDA

Oct, 1

Optimizing Real Time GPU Kernels Using Fuzzy Inference System

CPU technology is slowly reaching its limits; however, Moore’s Law still holds true for GPUs. With the increasing scope for GPGPU computing, more and more applications are being ported to the GPU framework. One of the application areas best suited to GPGPU computing is image processing and computer vision. The high performance offered by GPUs […]

CUDA

Oct, 1

Template Library for Multi-GPU Pseudorandom Number Recursion-based Generators

The aim of the paper is to show how to design and implement fast parallel algorithms for Linear Congruential, Lagged Fibonacci and Wichmann-Hill pseudorandom number generators. The new algorithms employ the divide-and-conquer approach for solving linear recurrence systems. They are implemented on multi-GPU accelerated systems using CUDA. Numerical experiments performed on a computer system with […]

CUDA
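The abstract states that the generators are parallelized by treating them as linear recurrences and solving those with a divide-and-conquer approach. The truncated abstract does not describe the paper's actual algorithm, so the sketch below is only an illustration of why a linear recurrence allows this, and it covers the Linear Congruential case alone. One LCG step x_{n+1} = (a·x_n + c) mod m is an affine map, so k steps collapse into a single affine map (A_k, C_k) that can be computed with O(log k) compositions by repeatedly halving k. Each worker can then jump directly to the start of its own block of the stream. The constants a = 1664525, c = 1013904223, m = 2^32 are an assumed, commonly used parameter set, and all function names are hypothetical.

```python
from typing import List, Tuple

# Assumed LCG parameters for illustration only (not taken from the paper).
A, C, M = 1664525, 1013904223, 2**32


def compose(f: Tuple[int, int], g: Tuple[int, int], m: int) -> Tuple[int, int]:
    """Affine maps x -> a*x + c (mod m): return the map 'apply f, then g'."""
    (a1, c1), (a2, c2) = f, g
    return (a2 * a1) % m, (a2 * c1 + c2) % m


def jump(k: int, a: int = A, c: int = C, m: int = M) -> Tuple[int, int]:
    """(A_k, C_k) such that x_{n+k} = A_k*x_n + C_k (mod m), in O(log k) steps."""
    result = (1, 0)          # identity map
    step = (a % m, c % m)    # a single LCG step
    while k > 0:
        if k & 1:
            result = compose(result, step, m)
        step = compose(step, step, m)  # step now covers twice as many iterations
        k >>= 1
    return result


def sequential(seed: int, n: int) -> List[int]:
    """Reference: the first n outputs x_1..x_n produced one step at a time."""
    out, x = [], seed
    for _ in range(n):
        x = (A * x + C) % M
        out.append(x)
    return out


def parallel_blocks(seed: int, n: int, workers: int) -> List[int]:
    """Split n outputs into contiguous blocks; each worker jumps to its block's start."""
    size = (n + workers - 1) // workers
    out: List[int] = []
    for w in range(workers):          # on a GPU, each w would be a thread or a device
        start = w * size
        if start >= n:
            break
        ak, ck = jump(start)
        x = (ak * seed + ck) % M      # state x_start: seed advanced by `start` steps
        for _ in range(min(size, n - start)):
            x = (A * x + C) % M
            out.append(x)
    return out
```

The property that matters is that `parallel_blocks(seed, n, workers)` reproduces `sequential(seed, n)` exactly, whatever the number of workers; the jump-ahead only changes where each worker starts, not the stream itself.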
