high performance computing on graphics processing units: hgpu.org

Posts

May, 15

High dimensional pricing of exotic European contracts on a GPU Cluster, and comparison to a CPU cluster

The aim of this paper is the efficient use of CPU and GPU clusters for a general path-dependent exotic European pricing, and their comparison in terms of speed and energy consumption. To reach our goal, we propose a parallel random number generator which is well suited to the parallelization paradigm, then, we implement a multidimensional […]

CUDA

May, 15

A parallel Ant Colony Optimization algorithm with GPU-acceleration based on All-In-Roulette selection

Ant Colony Optimization is computationally expensive when it comes to complex problems. The Jacket toolbox allows implementation of MATLAB programs on a Graphics Processing Unit (GPU). This paper presents and implements a parallel MAX-MIN Ant System (MMAS) based on a GPU+CPU hardware platform under the MATLAB environment with the Jacket toolbox to solve the Traveling Salesman Problem (TSP). […]

May, 15

K3 Moore’s Law in the Era of GPU Computing

The history of humanity is that we strive to use better tools and knowledge to build even better tools, and extend further the border of knowledge. In the past 50 years, CPU, as a dominant paradigm for computing, has provided exponential growth as predicted by Moore’s Law with remarkable accuracy. We have been leveraging CPUs […]

May, 15

Object oriented framework for real-time image processing on GPU

In this paper, we present a framework for efficiently integrating programming resources of both GPU and CPU. We introduce an object oriented framework for GPGPU-based image processing. We illustrate a set of classes exploiting the design and programming advantages of an object oriented language, such as code reusability/extensibility, flexibility, information hiding, and complexity hiding. This […]

CUDA

May, 15

Fermi GF100 GPU Architecture

The Fermi GF100 is a GPU architecture that provides several new capabilities beyond the Nvidia GT200 or Tesla architecture. The Fermi architecture offers up to 512 CUDA cores and special features for gaming and high-performance computing. This article describes the GPU’s new capabilities for tessellation, physics processing, and computational graphics.

CUDA

May, 15

Investigating the use of GPU-accelerated nodes for SAR image formation

The computation of an electromagnetic reflectivity image from a set of radar returns is a computationally intensive process. Therefore, the use of high performance computing is required to form images from radar signals in a short time frame. This paper explores the use of distributed memory cluster computers and accelerator technologies such as GPUs for […]

May, 15

A GPU Algorithm for IC Floorplanning: Specification, Analysis and Optimization

In this paper, we propose a novel floor planning algorithm for GPUs. Floor planning is an inherently sequential algorithm, far from the typical programs suitable for Single Instruction Multiple Thread (SIMT) style concurrency in a GPU. We propose a fundamentally different approach of exploring the floor plan solution space, where we evaluate concurrent moves on […]

May, 15

Automated pose estimation in 3D point clouds applying annealing particle filters and inverse kinematics on a GPU

Current experiments with HCIs have shown a high demand for more natural interaction paradigms. Gestures are thereby considered the most important cue besides speech. In order to recognize gestures it is necessary to extract meaningful motion features from the body. Up to now mostly marker based tracking systems are used in virtual reality environments, since […]

CUDA

May, 15

Case Study: GPU-based implementation of sequence pair based floorplanning using CUDA

In this paper, we demonstrate that runtime of VLSI Computer-aided design (CAD) applications can be successfully reduced with parallel programming in conjunction with Graphic Processing Units (GPUs). Particularly, we apply GPU-based computing to the sequence pair based floorplanning algorithm. In addition to reducing runtime, we focus on the minimization of changes in code structure in […]

CUDA

May, 15

Real-time Forest Simulation for a Flight Simulator using a GPU; Graphics Card Acceleration

This paper concerns the real-time simulation of forests for a flight simulator, exploiting the capacities of recent graphics cards. As we will show, these architectures coupled with recent ergonomic environments like CUDA allow C-programmers to implement highly parallelizable algorithms to be executed on GPU, without being specialized in parallel programming. The first results exhibited are […]

CUDA

May, 14

GPU-based accelerated FDTD simulations for double negative (DNG) materials applications

Recently, double-negative metamaterials have been widely studied in scientific research. Double-negative (DNG) media are characterized by simultaneously negative permittivity and permeability. To let the FDTD method analyze electromagnetic scattering and propagation in a DNG medium, the z-transform is applied to the FDTD method in that medium. For the simulations, extremely large […]

May, 14

Understanding the efficiency of parallel incomplete Cholesky preconditioners on the performance of ICCG solvers for multi-core and GPU systems

This paper aims at understanding the effect of parallelizing incomplete Cholesky (IC) factorization on the overall performance of the incomplete Cholesky conjugate gradient (ICCG) solver method, optimized on multi-core and GPU based Systems. Parallel IC preconditioners, which are based on graph reordering and arbitrary levels of allowed fill-in, are tested on structured and unstructured matrices […]

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
