high performance computing on graphics processing units: hgpu.org

Posts

Feb, 17

Interactive Design Exploration for Constrained Meshes

In architectural design, surface shapes are commonly subject to geometric constraints imposed by material, fabrication or assembly. Rationalization algorithms can convert a freeform design into a form feasible for production, but often require design modifications that might not comply with the design intent. In addition, they only offer limited support for exploring alternative feasible shapes, […]

CUDA

Feb, 17

Efficient pseudo-random number generation for monte-carlo simulations using graphic processors

A hybrid approach combining three Tausworthe generators and one linear congruential generator for pseudo-random number generation in GPU programming, as suggested in the NVIDIA CUDA library, has been used for Monte Carlo sampling. On each GPU thread, a random seed is generated on the fly in a simple way using the quick and dirty […]

CUDA • OpenCL
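The combination named in the abstract above (three Tausworthe components XORed with one linear congruential generator) can be sketched on the CPU as follows. This is a minimal Python illustration, not the paper's GPU code: the shift and mask constants are the ones commonly published for this combined generator and are assumed here to match what the paper uses, the class and function names are hypothetical, and the seeds in the usage lines are arbitrary placeholders, since the listing truncates before describing the per-thread seeding.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic in Python


def taus_step(z, s1, s2, s3, m):
    """Advance one Tausworthe component with a 32-bit state z."""
    b = (((z << s1) & MASK32) ^ z) >> s2
    return (((z & m) << s3) & MASK32) ^ b


def lcg_step(z, a=1664525, c=1013904223):
    """Advance a 32-bit linear congruential generator (modulus 2**32)."""
    return (a * z + c) & MASK32


class HybridTaus:
    """Combined generator: three Tausworthe streams XOR one LCG stream.

    Hypothetical helper class for illustration only.
    """

    def __init__(self, z1, z2, z3, z4):
        z1, z2, z3, z4 = (s & MASK32 for s in (z1, z2, z3, z4))
        # The Tausworthe components need seeds greater than 128 so that
        # the masked-out low bits do not leave a component stuck at zero.
        if min(z1, z2, z3) <= 128:
            raise ValueError("Tausworthe seeds must be greater than 128")
        self.z1, self.z2, self.z3, self.z4 = z1, z2, z3, z4

    def next_u32(self):
        """Return the next 32-bit unsigned integer."""
        self.z1 = taus_step(self.z1, 13, 19, 12, 0xFFFFFFFE)
        self.z2 = taus_step(self.z2, 2, 25, 4, 0xFFFFFFF8)
        self.z3 = taus_step(self.z3, 3, 11, 17, 0xFFFFFFF0)
        self.z4 = lcg_step(self.z4)
        return self.z1 ^ self.z2 ^ self.z3 ^ self.z4

    def next_float(self):
        """Return a float in [0, 1) by scaling the 32-bit output."""
        return self.next_u32() * 2.3283064365387e-10


# Usage: one generator per (simulated) thread, each with its own seeds.
gen = HybridTaus(129, 130, 131, 1)  # placeholder seeds, not from the paper
samples = [gen.next_float() for _ in range(5)]
```

On a GPU, each thread would hold its own four state words; how those are seeded decides whether the per-thread streams are usable for Monte Carlo sampling.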

Feb, 17

Resolution of Linear Algebra for the Discrete Logarithm Problem using GPU and Multi-core Architectures

In cryptanalysis, solving the discrete logarithm problem (DLP) is key to assessing the security of many public-key cryptosystems. Index-calculus methods, which attack the DLP in multiplicative subgroups of finite fields, require solving large sparse systems of linear equations modulo large primes. This article deals with how we can run this computation on GPU- and […]

CUDA

Feb, 17

Fast American Basket Option Pricing on a multi-GPU Cluster

This article presents a multi-GPU adaptation of a specific Monte Carlo and classification based method for pricing American basket options, due to Picazo. The first part relates how to combine fine and coarse-grained parallelization to price American basket options. A dynamic strategy of kernel calibration is proposed. Doing so, our implementation on a reasonable size […]

OpenCL

Feb, 16

Towards Porting a Real-World Seismological Application to the Intel MIC Architecture

This whitepaper aims to discuss first experiences with porting an MPI-based real-world geophysical application to the new Intel Many Integrated Core (MIC) architecture. The selected code SeisSol is an application written in Fortran that can be used to simulate earthquake rupture and radiating seismic wave propagation in complex 3-D heterogeneous materials. The PRACE prototype cluster […]

Feb, 16

Direct Numerical Simulation and Large Eddy Simulation on a Turbulent Wall-Bounded Flow Using Lattice Boltzmann Method and Multiple GPUs

Direct numerical simulation (DNS) and large eddy simulation (LES) were performed on the wall-bounded flow at Re_tau = 180 using the lattice Boltzmann method (LBM) and multiple Graphics Processing Units (GPUs). In the DNS, 8 K20M GPUs were adopted. The maximum number of meshes is 6.7×10^7, which results in the non-dimensional mesh size of Delta+=1.41 for […]

CUDA

Feb, 16

Cuda K-Nn: application to the segmentation of the retinal vasculature within SD-OCT volumes of mice

In this work, the speed of a GPU-based CUDA k-NN implementation is compared with that of the ANN implementation on three sets of medical imaging data. The results show that with higher-dimensional data, the CUDA-based k-NN approach can achieve a speedup of up to two orders of magnitude. Otherwise, ANN would be a better implementation to […]

CUDA

Feb, 16

Application of the Characteristic Basis Function Method using CUDA

The Characteristic Basis Function Method (CBFM) is a popular technique for efficiently solving the Method of Moments (MoM) matrix equations. In this work, we address the adaptation of this method to a relatively new computing infrastructure provided by NVIDIA, the Compute Unified Device Architecture (CUDA), and take into account some of the limitations which appear […]

CUDA

Feb, 16

LDetector: A Low Overhead Race Detector For GPU Programs

Data race detection is an important problem in GPU programming. The paper presents a novel solution. It uses compiler support to privatize shared data and then parallelizes the race checking at run time. It has two distinct features. First, there is no per-access monitoring, so the race detection has a low overhead and […]

CUDA

Feb, 15

ADBIS workshop on GPUs In Databases, GID 2014

The high performance of modern Graphics Processing Units may be utilized not only for graphics-related applications but also for general computing. This computing power has been utilized in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains strongly benefit from utilizing the GPUs, database-related applications seem not […]

Feb, 15

High-Performance Graphics 2014

High-Performance Graphics is the leading international forum for performance-oriented graphics and imaging systems research including innovative algorithms, efficient implementations, languages, compilers, parallelism, hardware and architectures for high-performance graphics. High-Performance Graphics was founded in 2009 to synthesize and broaden two important and well-respected conferences in computer graphics: Graphics Hardware and Interactive Ray Tracing. The conference […]

Feb, 15

Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths

Finding the shortest paths from a single source to all other vertices is a fundamental method used in a variety of higher-level graph algorithms. We present three parallel-friendly and work-efficient methods to solve this Single-Source Shortest Paths (SSSP) problem: Workfront Sweep, Near-Far and Bucketing. These methods choose different approaches to balance the tradeoff between saving […]

CUDA
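The listing truncates before the three SSSP methods are described, so the following is only a sequential Python sketch of one common reading of a near/far split: vertices whose tentative distance falls below a moving threshold are processed first, and the rest are deferred until the threshold advances. It is not the paper's GPU implementation; the function name, the `delta` parameter and the example graph are assumptions made for illustration.

```python
def near_far_sssp(graph, source, delta):
    """Single-source shortest paths with a near/far work split.

    graph: dict mapping every vertex to a list of (neighbor, weight)
           pairs; weights must be non-negative and every neighbor must
           also appear as a key.
    delta: positive step by which the near/far threshold advances.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    dist = {v: float("inf") for v in graph}
    dist[source] = 0.0
    near, far = [source], []
    threshold = delta

    while near or far:
        # Drain the near pile; each pass corresponds to one parallel
        # relaxation step in a GPU setting.
        while near:
            next_near = []
            for u in near:
                for v, w in graph[u]:
                    cand = dist[u] + w
                    if cand < dist[v]:
                        dist[v] = cand
                        # Close vertices are processed soon, distant
                        # ones are deferred to the far pile.
                        if cand < threshold:
                            next_near.append(v)
                        else:
                            far.append(v)
            near = next_near

        # Near pile exhausted: advance the threshold and split the far
        # pile using the current tentative distances.
        threshold += delta
        still_far = []
        for v in far:
            if dist[v] < threshold:
                near.append(v)
            else:
                still_far.append(v)
        far = still_far

    return dist


# Usage on a small hypothetical graph.
example = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}
distances = near_far_sssp(example, "a", delta=2)
```

A small `delta` keeps the near pile close to Dijkstra's ordering and limits wasted relaxations, while a large `delta` exposes more parallel work per pass at the cost of re-relaxing vertices, which is the kind of tradeoff the abstract alludes to.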
