
Posts

Feb, 22

Comparison of SpMV performance on matrices with different matrix formats using CUSP, cuSPARSE and ViennaCL

ViennaCL is a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP. In addition to core functionality and many other features including BLAS level 1-3 support and iterative solvers, the latest release family ViennaCL 1.6.x provides fast […]
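As a point of reference for the formats being compared, the sketch below is a minimal scalar CSR SpMV kernel in plain CUDA, one thread per row. It is an illustrative baseline only, not one of the tuned kernels shipped by CUSP, cuSPARSE or ViennaCL.

__global__ void spmv_csr_scalar(int num_rows,
                                const int*    row_ptr,   // size num_rows + 1
                                const int*    col_idx,   // size nnz
                                const double* values,    // size nnz
                                const double* x,
                                double*       y)
{
    // One thread per row: each thread accumulates its row's dot product.
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        double sum = 0.0;
        for (int jj = row_ptr[row]; jj < row_ptr[row + 1]; ++jj)
            sum += values[jj] * x[col_idx[jj]];
        y[row] = sum;
    }
}

// Launch example:
// spmv_csr_scalar<<<(num_rows + 255) / 256, 256>>>(num_rows, row_ptr, col_idx, values, x, y);

Vector (one warp per row), ELL and HYB variants trade memory layout for coalescing, which is exactly the format dependence such a comparison measures.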
Feb, 22

QPACE 2 and Domain Decomposition on the Intel Xeon Phi

We give an overview of QPACE 2, which is a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration of Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number […]
Feb, 22

RSVDPACK: Subroutines for computing partial singular value decompositions via randomized sampling on single core, multi core, and GPU architectures

This document describes an implementation in C of a set of randomized algorithms for computing partial Singular Value Decompositions (SVDs). The techniques largely follow the prescriptions in the article "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," N. Halko, P.G. Martinsson, J. Tropp, SIAM Review, 53(2), 2011, pp. 217-288, but with some […]
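The randomized scheme of Halko, Martinsson and Tropp computes a partial SVD of A ∈ R^{m×n} from a sketch of its range. The standard formulation is shown below for orientation, without RSVDPACK's oversampling and power-iteration details:

\begin{aligned}
&\Omega \in \mathbb{R}^{n \times (k+p)} \ \text{Gaussian random}, \qquad Y = A\,\Omega,\\
&Y = QR \quad\Rightarrow\quad A \approx Q Q^{\mathsf T} A,\\
&B = Q^{\mathsf T} A, \qquad B = \hat{U}\,\Sigma\,V^{\mathsf T} \ \text{(small dense SVD)},\\
&U = Q\hat{U} \quad\Rightarrow\quad A \approx U\,\Sigma\,V^{\mathsf T}.
\end{aligned}

Here k is the target rank and p a small oversampling parameter; the heavy work is in the two dense products A\Omega and Q^{\mathsf T}A, which is why single core, multi core and GPU back ends map naturally onto BLAS-3 calls.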
Feb, 22

Exploring Design Space of 3D NVM and eDRAM Caches Using DESTINY Tool (open-source code)

To enable the design of large caches, novel memory technologies (such as non-volatile memory) and novel fabrication approaches (e.g. 3D stacking) have been explored. The existing modeling tools, however, cover only a few memory technologies, CMOS technology nodes and fabrication approaches. We present DESTINY, a tool for modeling 3D (and 2D) cache designs using SRAM, […]
Feb, 19

Reproducible Triangular Solvers for High-Performance Computing

On modern parallel architectures, floating-point computations may become non-deterministic and, therefore, non-reproducible, mainly due to the non-associativity of floating-point operations. We propose an algorithm to solve dense triangular systems by leveraging the standard parallel triangular solver and our recently introduced multi-level exact summation approach. Finally, we present implementations of the proposed fast reproducible triangular solver and […]
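To make the reproducibility issue concrete, the sketch below is a plain sequential forward substitution for a dense lower-triangular system L x = b. In a parallel solver the inner accumulation becomes a reduction whose summation order can change from run to run, which is where the non-reproducibility enters; the multi-level exact summation mentioned above replaces that reduction and is not implemented here.

// Sequential forward substitution for L x = b (L lower triangular, dense).
void forward_substitution(int n, const double* L, const double* b, double* x)
{
    for (int i = 0; i < n; ++i) {
        // In a parallel solver this accumulation is a reduction; because
        // floating-point addition is not associative, a different combination
        // order can give a slightly different x[i] on every run.
        double sum = 0.0;
        for (int j = 0; j < i; ++j)
            sum += L[i * n + j] * x[j];
        x[i] = (b[i] - sum) / L[i * n + i];
    }
}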
Feb, 19

Fast, Memory-Efficient Construction of Voxelized Shadows

We present a fast and memory-efficient algorithm for generating Compact Precomputed Voxelized Shadows. By performing much of the common sub-tree merging before identical nodes are ever created, we improve construction times by several orders of magnitude for large data structures and require much less working memory. We also propose a new set of rules […]
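The sub-tree merging described above amounts to hash consing: before a new node is allocated, it is looked up by the identifiers of its children, so identical sub-trees are shared rather than duplicated. A minimal host-side sketch of that idea follows; the node layout and hashing are illustrative assumptions, not the paper's data structures.

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Node { uint32_t child[8]; };              // e.g. an octree-style node

// Return the id of an existing identical node, or create a new one.
uint32_t get_or_create(const Node& n,
                       std::unordered_map<uint64_t, uint32_t>& cache,
                       std::vector<Node>& nodes)
{
    uint64_t key = 1469598103934665603ull;        // FNV-style hash of the child ids
    for (uint32_t c : n.child) { key ^= c; key *= 1099511628211ull; }

    auto it = cache.find(key);                    // a real implementation would also
    if (it != cache.end()) return it->second;     // compare the full node to guard
                                                  // against hash collisions
    uint32_t id = static_cast<uint32_t>(nodes.size());
    nodes.push_back(n);
    cache.emplace(key, id);
    return id;
}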
Feb, 19

Auto-tuning Shallow Water Simulations on GPUs

Graphics processing units (GPUs) have gained popularity in scientific computing in recent years because of the massive computing power they provide for parallel tasks. While GPUs are powerful, it is also hard to utilize their power fully. Part of this difficulty comes from the many parameters available, and tuning of […]
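One of the parameters referred to above is the kernel launch configuration. A minimal way to tune it is an exhaustive timing loop over candidate block sizes, as sketched below; the kernel name step_kernel, its arguments and the candidate set are placeholders, not taken from the paper.

// Time a (hypothetical) simulation kernel for several block sizes, keep the fastest.
float best_ms    = 1e30f;
int   best_block = 0;
for (int block : {64, 128, 192, 256, 512}) {
    int grid = (n + block - 1) / block;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int rep = 0; rep < 10; ++rep)             // average over repetitions
        step_kernel<<<grid, block>>>(u, v, h, n);  // placeholder kernel and arguments
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    if (ms < best_ms) { best_ms = ms; best_block = block; }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}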
Feb, 19

Memory-efficient Adaptive Subdivision for Software Rendering on the GPU

The adaptive subdivision step for surface tessellation is a key component of the Reyes rendering pipeline. While this operation has been successfully parallelized for execution on the GPU using a breadth-first traversal, the resulting implementations are limited by their high worst-case memory consumption and high global memory bandwidth utilization. This report proposes an alternate strategy […]
Feb, 19

NMF-mGPU: non-negative matrix factorization on multi-GPU systems

BACKGROUND: In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained great interest in the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessor cluster. […]
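NMF approximates a non-negative data matrix V ∈ R_{≥0}^{m×n} by the product of two smaller non-negative factors, V ≈ W H with W ∈ R_{≥0}^{m×k} and H ∈ R_{≥0}^{k×n}. The classical Lee-Seung multiplicative updates for the Frobenius-norm objective are shown below for orientation; whether NMF-mGPU uses exactly this variant is not stated in the excerpt.

\begin{aligned}
H_{kj} &\leftarrow H_{kj}\,\frac{(W^{\mathsf T} V)_{kj}}{(W^{\mathsf T} W H)_{kj}},\\[2pt]
W_{ik} &\leftarrow W_{ik}\,\frac{(V H^{\mathsf T})_{ik}}{(W H H^{\mathsf T})_{ik}}.
\end{aligned}

Each update is built from dense matrix products, which is what makes the method amenable to GPU acceleration.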
Feb, 13

NUPAR: A Benchmark Suite for Modern GPU Architectures

Heterogeneous systems consisting of multi-core CPUs, Graphics Processing Units (GPUs) and many-core accelerators have gained widespread use by application developers and data-center platform developers. Modern day heterogeneous systems have evolved to include advanced hardware and software features to support a spectrum of application patterns. Heterogeneous programming frameworks such as CUDA, OpenCL, and OpenACC have all […]
Feb, 13

Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures

Emerging hybrid accelerator architectures for high performance computing are often suited for the use of a data-parallel programming model. Unfortunately, programmers of these architectures face a steep learning curve that frequently requires learning a new language (e.g., OpenCL). Furthermore, the distributed (and frequently multi-level) nature of the memory organization of clusters of these machines provides […]
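The data-parallel pattern such models target is the stencil update. For orientation, the fragment below is a plain CUDA 3-point stencil over a 1D grid, i.e. the kind of kernel (together with halo exchanges across the distributed memory levels) that a higher-level stencil programming model is meant to generate or hide; it does not illustrate the proposed model itself.

// Plain CUDA 3-point stencil over the interior of a 1D grid.
__global__ void stencil_1d(int n, const float* in, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}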
Feb, 13

Quadratic Pseudo-Boolean Optimization for Scene Analysis using CUDA

Many problems in early computer vision, like image segmentation, image reconstruction, 3D vision or object labeling, can be modeled by Markov Random Fields (MRFs). General algorithms to optimize an MRF, like Simulated Annealing, Belief Propagation or Iterated Conditional Modes, are either slow or produce low-quality results [Rother 07]. On the other hand, in the […]
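The labeling problems listed above are typically posed as minimizing an energy over per-pixel labels; for binary labels this energy is a quadratic pseudo-Boolean function, which is the class of objective QPBO addresses:

E(\mathbf{x}) \;=\; \sum_{i \in \mathcal{V}} \theta_i(x_i) \;+\; \sum_{(i,j) \in \mathcal{E}} \theta_{ij}(x_i, x_j), \qquad x_i \in \{0, 1\},

where \mathcal{V} is the set of pixels (or sites), \mathcal{E} the set of neighboring pairs, \theta_i the unary (data) terms and \theta_{ij} the pairwise (smoothness) terms.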
