high performance computing on graphics processing units: hgpu.org

Posts

Oct, 18

Efficient 2D Software Rendering

The market of computer graphics is dominated by GPU based technologies. However today’s fast central processing units (CPU) based on modern architectural design offer new opportunities in the field of classical software rendering. Because the technological development of the GPU architecture almost reached the limits in the field of the programming model, the CPU-based solutions […]

OpenGL

Oct, 18

GPU-based Batched Spatial Query Processing on R-Trees

R-trees are popular spatial indexing techniques that have been widely used in many geospatial applications. The increasingly available Graphics Processing Units (GPUs) resources for general computing have attracted considerable research interests in applying the massive data parallel technologies to index and query geospatial data based on R-trees. In this paper, we investigate on the potentials […]

CUDA

Oct, 17

Platform-independent parallelization of the Lattice Boltzmann method with OpenCL

Simulations, like fluid dynamics, are very computationally intensive problems. Since the Lattice Boltzmann method uses a discrete grid of cells for simulating the flow, there are no dependencies between the single cells during the computation for one time step. Therefore, the computing can easily be done in parallel. During the last years, multi-CPU computers have […]

OpenCL

Oct, 17

Design and Performance Evaluation of a Software Framework for Multi-Physics Simulations on Heterogeneous Supercomputers

Despite the experience of several decades the numerical simulation of computational fluid dynamics is still an enormously challenging and active research field. Most simulation tasks of scientific and industrial relevance require the modeling of multiple physical effects, complex numerical algorithms, and have to be executed on supercomputers due to their high computational demands. Facing these […]

CUDA

Oct, 17

Enhancing Productivity and Performance Portability of General-Purpose Parallel Programming

This work focuses on compiler and run-time techniques for improving the productivity and the performance portability of general-purpose parallel programming. More specifically, we focus on shared-memory task-parallel languages, where the programmer explicitly exposes parallelism in the form of short tasks that may outnumber the cores by orders of magnitude. The compiler, the run-time, and the […]

Oct, 17

Generalized Resource Allocation for the Cloud

Resource allocation is an integral, evolving part of many data center management problems such as virtual machine placement in data centers, network virtualization, and multi-path network routing. Since the problems are inherently NP-Hard, most existing systems use custom-designed heuristics to find a suitable solution. However, such heuristics are often rigid, making it difficult to extend […]

OpenCL

Oct, 17

Performance Analysis Cluster and GPU Computing Environment on Molecular Dynamic Simulation of BRV-1 and REM2 with GROMACS

One of application that needs high performance computing resources is molecular d ynamic. There is some software available that perform molecular dynamic, one of these is a well known GROMACS. Our previous experiment simulating molecular dynamics of Indonesian grown herbal compounds show sufficient speed up on 32 n odes Cluster computing environment. In order to […]

CUDA

Oct, 16

Accelerating Fully Homomorphic Encryption on GPUs

In a major breakthrough, in 2009 Gentry introduced the first plausible construction of a fully homomorphic encryption (FHE) scheme. FHE allows the evaluation of arbitrary functions directly on encrypted data on untwisted servers. In 2010, Gentry and Halevi presented the first FHE implementation on an IBM x3500 server. However, this implementation remains impractical due to […]

CUDA

Oct, 16

Optimal structure of face detection algorithm using GPU architecture

This article describes parallel algorithm of face detection on images for GPU architecture. This algorithm is an extension of an algorithm from OpenCV library. A computational structure is presented for the developed algorithm. Also, scheduling algorithm was developed to balance a workload among GPU’s threads.

CUDA

Oct, 16

High-Performance Computing Algorithms for Constructing Inverted Files on Emerging Multicore Processors

Current trends in processor architectures increasingly include more cores on a single chip and more complex memory hierarchies, and such a trend is likely to continue in the foreseeable future. These processors offer unprecedented opportunities for speeding up demanding computations if the available resources can be effectively utilized. Simultaneously, parallel programming languages such as OpenMP […]

CUDA

Oct, 16

Particle Filters on Multi-Core Processors

The particle filter is a Bayesian estimation technique based on Monte Carlo simulation. The nonparametric nature of particle filters makes them ideal for non-linear, non-Gaussian dynamic systems. Particle filtering has many applications: in computer vision, robotics, and econometrics to name just a few. Although superior to Kalman filters, particle filters have higher computational requirements, which […]

CUDA

•

OpenCL

Oct, 16

Analysis of Single Phase Fluid Flow and Heat Transfer in Slip Flow Regime by Parallel Implementation of Lattice Boltzmann Method on GPUs

In this thesis work fluid flow and heat transfer in two-dimensional microchannels are studied numerically. A computer code based on Lattice Boltzmann Method (LBM) is developed for this purpose. The code is written using MATLAB and Jacket software and has the important feature of being able to run parallel on Graphics Processing Units (GPUs). The […]

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

See all packages

* * *

high performance computing on graphics processing units: hgpu.org

Posts

Efficient 2D Software Rendering

GPU-based Batched Spatial Query Processing on R-Trees

Platform-independent parallelization of the Lattice Boltzmann method with OpenCL

Design and Performance Evaluation of a Software Framework for Multi-Physics Simulations on Heterogeneous Supercomputers

Enhancing Productivity and Performance Portability of General-Purpose Parallel Programming

Generalized Resource Allocation for the Cloud

Performance Analysis Cluster and GPU Computing Environment on Molecular Dynamic Simulation of BRV-1 and REM2 with GROMACS

Accelerating Fully Homomorphic Encryption on GPUs

Optimal structure of face detection algorithm using GPU architecture

High-Performance Computing Algorithms for Constructing Inverted Files on Emerging Multicore Processors

Particle Filters on Multi-Core Processors

Analysis of Single Phase Fluid Flow and Heat Transfer in Slip Flow Regime by Parallel Implementation of Lattice Boltzmann Method on GPUs

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

Most viewed papers (last 30 days)