
Posts

Jun, 10

CUDA Kernel Design for GPU-Based Beam Dynamics Simulations

Efficient implementation of general-purpose particle tracking on GPUs can yield significant performance benefits for large-scale particle tracking and tracking-based accelerator optimization simulations. We present our work on accelerating Argonne National Lab’s accelerator simulation code ELEGANT [1, 2] using CUDA-enabled GPUs [3]. In particular, we provide an overview of beamline elements ported to […]
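
As a hedged illustration of the kind of per-particle kernel such a port relies on, the sketch below tracks particles through a drift element with one thread per particle. The coordinate layout, names, and element interface are assumptions for this example, not ELEGANT's actual implementation.

    #include <cuda_runtime.h>

    // Minimal drift-element tracking kernel: one thread per particle.
    __global__ void track_drift(double* x, double* xp, double* y, double* yp,
                                int n_particles, double length)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n_particles) {
            // In a drift, transverse positions advance linearly with the slopes.
            x[i] += length * xp[i];
            y[i] += length * yp[i];
        }
    }
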
Jun, 10

S-buffer: Sparsity-aware Multi-fragment Rendering

This work introduces S-buffer, an efficient and memory-friendly GPU-accelerated A-buffer architecture for multi-fragment rendering. Memory is organized into variable-size contiguous regions for each pixel, thus avoiding the limitations of linked-list and fixed-array techniques. S-buffer exploits fragment distribution for precise allocation of the needed storage and pixel sparsity (empty pixel ratio) for computing the memory offsets […]
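
A minimal sketch of the counting-pass idea behind per-pixel contiguous regions, assuming a two-pass pipeline (count fragments per pixel, then prefix-sum the counts into offsets); function and buffer names are illustrative, not the paper's API.

    #include <thrust/device_vector.h>
    #include <thrust/scan.h>

    // Turn per-pixel fragment counts (from a counting render pass) into the start
    // offset of each pixel's contiguous region; a second pass can then scatter
    // fragments using an atomically incremented per-pixel cursor.
    void build_offsets(const thrust::device_vector<unsigned int>& frag_count,
                       thrust::device_vector<unsigned int>& frag_offset)
    {
        thrust::exclusive_scan(frag_count.begin(), frag_count.end(),
                               frag_offset.begin());
    }
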
Jun, 10

Measuring the Impact of Configuration Parameters in CUDA Through Benchmarking

The choice of threadblock size and shape is one of the most important user decisions when a parallel problem is coded to run on GPU architectures. In fact, the threadblock configuration has a significant impact on the overall performance of the program. Unfortunately, the programmer does not have enough information about the subtle interactions between this choice of […]
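
A small harness in the spirit of such benchmarking, timing one launch configuration with CUDA events; the wrapper signature and the choice of what to launch are assumptions for illustration.

    #include <cuda_runtime.h>

    // Time a single kernel launch for a given grid/block configuration.
    // 'launch' is any host wrapper that launches the kernel under test.
    float time_config(void (*launch)(dim3, dim3), dim3 grid, dim3 block)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        launch(grid, block);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;   // milliseconds for this threadblock configuration
    }

Sweeping such a harness over candidate block shapes (e.g. 32x8, 16x16, 64x4) makes the performance impact of the configuration directly measurable.
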
Jun, 9

Scaling Fast Multipole Methods up to 4000 GPUs

The Fast Multipole Method (FMM) is a hierarchical N-body algorithm with linear complexity, high arithmetic intensity, high data locality, hierarchical communication patterns, and no global synchronization. The combination of these features allows the FMM to scale well on large GPU-based systems and to use their compute capability effectively. We present a 1 PFlop/s […]
Jun, 9

Fast Morphological Image Processing Open-Source Extensions for GPU processing with CUDA

GPU architectures offer a significant opportunity for faster morphological image processing, and the NVIDIA CUDA architecture offers a relatively inexpensive and powerful framework for performing these operations. However, the generic morphological erosion and dilation operations in the CUDA NPP library are relatively naive, and their cost grows steeply with increasing structuring element size. The objective of […]
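
To make the scaling issue concrete, here is a naive horizontal grayscale erosion in which the work per pixel grows linearly with the structuring-element radius; faster schemes such as van Herk/Gil-Werman need a near-constant number of operations per pixel regardless of width. The kernel is an illustrative sketch, not the NPP or the extension code.

    // Naive 1-D (horizontal) erosion with a flat structuring element of the
    // given radius; note the O(radius) loop per output pixel.
    __global__ void erode_row_naive(const unsigned char* src, unsigned char* dst,
                                    int width, int height, int radius)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int m = 255;
        for (int k = -radius; k <= radius; ++k) {
            int xs = min(max(x + k, 0), width - 1);   // clamp at image borders
            m = min(m, (int)src[y * width + xs]);
        }
        dst[y * width + x] = (unsigned char)m;
    }
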
Jun, 9

Autotuning Stencil-Based Computations on GPUs

Finite-difference, stencil-based discretization approaches are widely used in the solution of partial differential equations describing physical phenomena. Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse matrix representation and define new matrix and vector data types […]
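
For reference, the standard diagonal (DIA) storage that such an extension starts from admits a very compact sparse matrix-vector product; the kernel below is a generic DIA SpMV sketch with illustrative names, not the paper's extended data types.

    // y = A * x with A stored in DIA format: 'offsets' holds the ndiags diagonal
    // offsets and 'data' stores each diagonal contiguously (n entries per diagonal).
    __global__ void spmv_dia(int n, int ndiags,
                             const int* offsets, const double* data,
                             const double* x, double* y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n) return;

        double sum = 0.0;
        for (int d = 0; d < ndiags; ++d) {
            int col = row + offsets[d];
            if (col >= 0 && col < n)
                sum += data[d * n + row] * x[col];
        }
        y[row] = sum;
    }
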
Jun, 9

Encapsulated synchronization and load-balance in heterogeneous programming

Programming models and techniques to exploit parallelism in accelerators, such as GPUs, are different from those used in traditional parallel models for shared- or distributed-memory systems. It is a challenge to blend different programming models to coordinate and exploit devices with very different characteristics and computational power. This paper presents a new extensible framework model […]
Jun, 9

Sparse LU Factorization for Parallel Circuit Simulation on GPU

The sparse solver has become the bottleneck of SPICE simulators. There has been little work on GPU-based sparse solvers because of the high data dependency. This strong data dependency means that parallel sparse LU factorization runs efficiently only on shared-memory computing devices, but the number of CPU cores sharing the same memory is often limited. The state of the […]
Jun, 8

Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines

Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. […]
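
The algorithm/schedule distinction can be sketched in plain C++ with a separable 3x3 box blur: the algorithm (what each output pixel is) stays fixed, while the schedule decides whether the horizontal pass is stored in a temporary or recomputed inside the vertical pass. This is a conceptual illustration, not the paper's actual language or syntax.

    // Schedule 1: compute the horizontal blur fully, store it, then blur vertically.
    void blur_stored(const float* in, float* tmp, float* out, int w, int h)
    {
        for (int y = 0; y < h; ++y)
            for (int x = 1; x < w - 1; ++x)
                tmp[y * w + x] = (in[y*w + x-1] + in[y*w + x] + in[y*w + x+1]) / 3.f;
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x)   // interior only; borders left untouched
                out[y * w + x] = (tmp[(y-1)*w + x] + tmp[y*w + x] + tmp[(y+1)*w + x]) / 3.f;
    }

    // Schedule 2: recompute the horizontal blur inside the vertical loop -- no
    // temporary buffer and better locality, at the cost of redundant arithmetic.
    void blur_fused(const float* in, float* out, int w, int h)
    {
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x) {
                float a = (in[(y-1)*w + x-1] + in[(y-1)*w + x] + in[(y-1)*w + x+1]) / 3.f;
                float b = (in[ y   *w + x-1] + in[ y   *w + x] + in[ y   *w + x+1]) / 3.f;
                float c = (in[(y+1)*w + x-1] + in[(y+1)*w + x] + in[(y+1)*w + x+1]) / 3.f;
                out[y * w + x] = (a + b + c) / 3.f;
            }
    }
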
Jun, 8

Ameliorating Memory Contention of OLAP operators on GPU Processors

Implementations of database operators on GPU processors have shown dramatic performance improvement compared to multicore-CPU implementations. GPU threads can cooperate using shared memory, which is organized in interleaved banks and is fast only when threads read and modify addresses belonging to distinct memory banks. Therefore, data processing operators implemented on a GPU, in addition to […]
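
A minimal sketch of the bank-conflict pattern at issue: a warp whose threads hit the same shared-memory bank serializes its accesses, and padding a shared tile by one element is the classic remedy. The transpose below is a generic illustration, not one of the paper's OLAP operators.

    #define TILE 32

    // Tiled transpose through shared memory; the +1 padding makes column-wise
    // reads of 'tile' fall into distinct banks, avoiding 32-way conflicts.
    __global__ void transpose_padded(const float* in, float* out, int n)
    {
        __shared__ float tile[TILE][TILE + 1];

        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;
        if (x < n && y < n)
            tile[threadIdx.y][threadIdx.x] = in[y * n + x];
        __syncthreads();

        int tx = blockIdx.y * TILE + threadIdx.x;
        int ty = blockIdx.x * TILE + threadIdx.y;
        if (tx < n && ty < n)
            out[ty * n + tx] = tile[threadIdx.x][threadIdx.y];
    }
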
Jun, 8

A Comparison of Algebraic Multigrid Preconditioners using Graphics Processing Units and Multi-Core Central Processing Units

The influence of multi-core central processing units and graphics processing units on several algebraic multigrid methods is investigated in this work. Different performance metrics traditionally employed for algebraic multigrid are reconsidered and reevaluated on these novel computing architectures. Our benchmark results show that with the use of graphics processing units for the solver phase, it […]
Jun, 8

Astrophysical Particle Simulations on Heterogeneous CPU-GPU Systems

Heterogeneous CPU-GPU nodes are becoming popular in HPC clusters, and algorithms and optimization techniques for such systems need to be rethought depending on the relative performance of the CPU vs. the GPU. In this paper, we report a performance-optimized particle simulation code, "OTOO", based on the octree method for heterogeneous systems. The main applications of […]

