high performance computing on graphics processing units: hgpu.org

Posts

Feb, 14

Developing Performance-Portable Molecular Dynamics Kernels in OpenCL

This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia’s miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs. We demonstrate that the performance bottlenecks of miniMD’s short-range force calculation […]

OpenCL
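For context on miniMD's workload, the short-range force calculation is a Lennard-Jones pair interaction evaluated over a neighbor list within a cutoff radius. Below is a minimal Python sketch of that pattern. It assumes a half neighbor list (each pair stored once, with the force applied to both atoms) and leaves out the periodic boundaries and spatial binning that a real code such as miniMD uses. The function names are illustrative and are not miniMD's API.

```python
def build_neighbor_list(pos, cutoff):
    """Half neighbor list: pair (i, j) with i < j is stored once, under i."""
    n = len(pos)
    cut2 = cutoff * cutoff
    neigh = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((pos[i][k] - pos[j][k]) ** 2 for k in range(3))
            if d2 < cut2:
                neigh[i].append(j)
    return neigh


def lj_forces(pos, neigh, cutoff, epsilon=1.0, sigma=1.0):
    """Lennard-Jones forces, U(r) = 4*eps*((s/r)^12 - (s/r)^6), truncated at cutoff."""
    n = len(pos)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    cut2 = cutoff * cutoff
    s6 = sigma ** 6
    for i in range(n):
        for j in neigh[i]:                      # gather: read neighbor positions
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            if r2 >= cut2:
                continue
            inv_r2 = 1.0 / r2
            sr6 = s6 * inv_r2 ** 3
            # |F|/r = 24*eps/r^2 * (2*(s/r)^12 - (s/r)^6)
            fscale = 24.0 * epsilon * inv_r2 * sr6 * (2.0 * sr6 - 1.0)
            for k in range(3):
                f[i][k] += fscale * d[k]
                f[j][k] -= fscale * d[k]        # scatter: Newton's third law
    return f
```

The update to `f[j]` is a scatter through the neighbor list. On parallel hardware this write pattern needs care, either through atomics or through a full neighbor list that trades extra computation for conflict-free writes.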

Feb, 14

Exploring SIMD for Molecular Dynamics, Using Intel Xeon Processors and Intel Xeon Phi Coprocessors

We analyse gather-scatter performance bottlenecks in molecular dynamics codes and the challenges that they pose for obtaining benefits from SIMD execution. This analysis informs a number of novel code-level and algorithmic improvements to Sandia’s miniMD benchmark, which we demonstrate using three SIMD widths (128-, 256- and 512-bit). The applicability of these optimisations to wider SIMD […]
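The gather-scatter pattern can be illustrated with a small Python emulation of SIMD lanes. A neighbor list is packed into chunks of `width` lanes, with the last chunk padded and masked. Values are then gathered per chunk and results scattered back. The helper names and the padding index are assumptions made for illustration. Real implementations use hardware gather instructions or intrinsics, and a scatter whose active lanes share an index needs conflict handling; AVX-512CD provides instructions for detecting such conflicts.

```python
def pack_lanes(indices, width, pad_index=0):
    """Split an index list into SIMD-width chunks; pad the tail and mask it off."""
    chunks = []
    for start in range(0, len(indices), width):
        idx = list(indices[start:start + width])
        mask = [True] * len(idx)
        pad = width - len(idx)
        idx += [pad_index] * pad
        mask += [False] * pad
        chunks.append((idx, mask))
    return chunks


def gather(values, idx, mask, fill=0.0):
    """Masked gather: inactive lanes receive `fill` instead of a real load."""
    return [values[j] if m else fill for j, m in zip(idx, mask)]


def has_conflict(idx, mask):
    """True if two active lanes would write to the same index in a scatter."""
    active = [j for j, m in zip(idx, mask) if m]
    return len(active) != len(set(active))


def scatter_add(target, idx, lane_values, mask):
    # Applied lane by lane, so duplicate indices accumulate correctly here;
    # a single hardware scatter gives no such guarantee without conflict handling.
    for j, v, m in zip(idx, lane_values, mask):
        if m:
            target[j] += v
```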

Feb, 14

Enhancing Performance of Meshfree Methods by Hybrid Computing

A hybrid computing technique is used in this study to significantly enhance the performance of meshfree methods. These methods are typically slower than finite element methods (FEM), mostly because their stiffness matrices are much denser than those formed by FEM. As a result, both forming the stiffness matrices and solving the equations are much slower. In this paper, we […]

CUDA

Feb, 14

The Dual-Path Execution Model for Efficient GPU Control Flow

Current graphics processing units (GPUs) utilize the single instruction multiple thread (SIMT) execution model. With SIMT, a group of logical threads executes such that all threads in the group execute a single common instruction on a particular cycle. To enable control flow to diverge within the group of threads, GPUs partially serialize execution and follow […]

CUDA
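The baseline behavior described here is easy to emulate in a few lines: a diverged warp executes the taken and not-taken paths one after the other under an active mask, then reconverges. The Python sketch below models a single if/else with one instruction per path. It shows the serialization cost the paper targets and does not implement the proposed dual-path model. The function name and the utilization metric (useful lane-slots divided by issued lane-slots) are assumptions.

```python
def run_divergent_branch(values, cond, then_op, else_op):
    """Emulate one warp executing `x = then_op(x) if cond(x) else else_op(x)`
    under a single-path SIMT model: each path is issued in turn under a mask."""
    if not values:
        return [], 0, 1.0
    mask = [cond(v) for v in values]          # per-lane branch outcome
    out = list(values)
    issued = 0
    if any(mask):                             # taken path, inactive lanes idle
        issued += 1
        for lane, active in enumerate(mask):
            if active:
                out[lane] = then_op(values[lane])
    if not all(mask):                         # not-taken path, complementary mask
        issued += 1
        for lane, active in enumerate(mask):
            if not active:
                out[lane] = else_op(values[lane])
    # Reconvergence: all lanes active again. Each lane did useful work once.
    utilization = len(values) / (issued * len(values))
    return out, issued, utilization
```

With an even/odd split across 8 lanes, both paths are issued and only half of the lane-slots do useful work; if every lane takes the same branch, a single path is issued at full utilization.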

Feb, 14

Nonlinear dynamic finite element analysis with GPU

The Newmark family of algorithms has been used in many engineering applications for the nonlinear dynamic analysis of various structural models. The dynamic and nonlinear nature of such problems, together with the numerical stability requirements of the algorithms, increases the computing power needed to achieve practical solution times. Thus, this study intends to decrease […]

CUDA
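Each Newmark step predicts displacement and velocity from the previous acceleration, then solves the equation of motion for the new acceleration. The parameters beta and gamma select the member of the family. The Python sketch below shows one step for a linear single-degree-of-freedom system m*a + c*v + k*u = p. It is only an illustration: a nonlinear analysis like the one studied replaces k*u with a state-dependent internal force and iterates (for example with Newton-Raphson) within each step. beta = 1/4, gamma = 1/2 is the average-acceleration member.

```python
def newmark_step(u, v, a, p_next, dt, m, c, k, beta=0.25, gamma=0.5):
    """One Newmark step for the linear SDOF system m*a + c*v + k*u = p."""
    # Predictors built only from the previous step's state
    u_pred = u + dt * v + dt * dt * (0.5 - beta) * a
    v_pred = v + dt * (1.0 - gamma) * a
    # Solve m*a + c*(v_pred + gamma*dt*a) + k*(u_pred + beta*dt^2*a) = p_next
    a_next = (p_next - c * v_pred - k * u_pred) / (m + gamma * dt * c + beta * dt * dt * k)
    # Correctors
    u_next = u_pred + beta * dt * dt * a_next
    v_next = v_pred + gamma * dt * a_next
    return u_next, v_next, a_next


def free_vibration(u0, v0, m, k, dt, n_steps, c=0.0):
    """Integrate free vibration (p = 0) from initial displacement and velocity."""
    u, v = u0, v0
    a = (-c * v - k * u) / m          # initial acceleration from the equation of motion
    for _ in range(n_steps):
        u, v, a = newmark_step(u, v, a, 0.0, dt, m, c, k)
    return u, v
```

For an undamped linear oscillator, the average-acceleration member introduces no amplitude decay (only a small period elongation), so the energy 0.5*m*v^2 + 0.5*k*u^2 stays at its initial value up to rounding.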

Feb, 14

Efficient Exploitation of Heterogeneous Platforms for Images Features Extraction

Image processing algorithms are a necessary tool in various domains related to computer vision, such as video surveillance, medical imaging and pattern recognition. However, these algorithms are hampered by their high consumption of both computing power and memory, which increases significantly when processing large sets of high-resolution images. In this work, we propose a […]

CUDA

Feb, 14

Partial Volume Effect Correction using Anisotropic Backward Diffusion

This paper proposes an algorithm for correcting the Partial Volume Effect in Positron Emission Tomography (PET) images, using registered Computed Tomography (CT) data to enhance the blurred PET image. The algorithm is based on a forward-and-backward anisotropic heat equation solver, deblurring the PET image along CT gradients. A forward diffusion force is also utilized to stabilize […]

OpenCL
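Diffusing an image with a coefficient steered by a second, registered image can be sketched in one dimension. In this Python sketch, which is a simplified stand-in and not the paper's solver, the conductance between neighboring pixels is positive (forward, smoothing) where the CT profile is flat. It is negative (backward, sharpening) where the CT jump exceeds a threshold. `threshold`, `c_forward` and `c_backward` are hypothetical parameters. Writing the update in flux form with zero-flux boundaries keeps the total PET activity unchanged.

```python
def ct_guided_diffusion_step(pet, ct, dt, c_forward=1.0, c_backward=0.5, threshold=0.5):
    """One explicit 1D diffusion step on `pet` whose conductance is chosen from `ct`.

    Hypothetical rule: forward diffusion across flat CT interfaces,
    backward diffusion across interfaces where the CT jump exceeds `threshold`.
    """
    n = len(pet)
    # flux[i] is the flux across the interface between pixels i and i+1
    flux = [0.0] * (n - 1)
    for i in range(n - 1):
        ct_jump = abs(ct[i + 1] - ct[i])
        c = c_forward if ct_jump <= threshold else -c_backward
        flux[i] = c * (pet[i + 1] - pet[i])
    out = list(pet)
    for i in range(n):
        right = flux[i] if i < n - 1 else 0.0   # zero-flux boundary on the right
        left = flux[i - 1] if i > 0 else 0.0    # zero-flux boundary on the left
        out[i] = pet[i] + dt * (right - left)
    return out
```

Backward diffusion amplifies differences and is unstable if iterated on its own, which is why the abstract pairs it with a stabilizing forward term. The explicit forward part of this sketch also needs dt*c_forward <= 0.5 (unit pixel spacing).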

Feb, 14

Finite-size scaling method for the Berezinskii-Kosterlitz-Thouless transition

We present an improved finite-size scaling method for reliably extracting the critical temperature T_BKT of a Berezinskii-Kosterlitz-Thouless (BKT) transition. Using the known Weber-Minnhagen multiplicative logarithmic correction to the spin stiffness rho_s at T_BKT and the Kosterlitz-Nelson relation between the transition temperature and the stiffness, rho_s(T_BKT) = 2T_BKT/pi, we define a size-dependent transition temperature T_BKT(L_1,L_2) based […]

CUDA
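The Kosterlitz-Nelson relation quoted above gives a simple, if finite-size-biased, estimator: the temperature at which a measured stiffness curve crosses the line 2T/pi. The Python sketch below finds that crossing by bisection. The stiffness function passed in is a hypothetical stand-in for simulation data. The optional factor 1 + 1/(2 ln(L/L0)), with L0 a fit parameter, is the commonly quoted form of the Weber-Minnhagen correction and should be checked against the paper before use. The paper's two-size estimator T_BKT(L_1,L_2) is not reproduced here.

```python
import math


def crossing_temperature(rho_s, t_lo, t_hi, L=None, L0=None, tol=1e-12):
    """Bisect for T where rho_s(T) = (2T/pi) * [1 + 1/(2 ln(L/L0))] (correction optional)."""
    if L is not None and (L0 is None or L <= L0):
        raise ValueError("the log correction needs L > L0 > 0")

    def g(T):
        target = 2.0 * T / math.pi                      # Kosterlitz-Nelson line
        if L is not None:
            target *= 1.0 + 1.0 / (2.0 * math.log(L / L0))
        return rho_s(T) - target

    g_lo = g(t_lo)
    if g_lo * g(t_hi) > 0:
        raise ValueError("no sign change in [t_lo, t_hi]")
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):                   # root lies in the upper half
            t_lo, g_lo = mid, g_mid
        else:                                           # root lies in the lower half
            t_hi = mid
    return 0.5 * (t_lo + t_hi)
```

For a toy linear stiffness rho_s(T) = 1 - T, the uncorrected crossing is T = pi/(pi + 2), about 0.611. The correction raises the target line and so moves the crossing to a lower temperature.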

Feb, 14

Evaluation of Standardized Password-based Key Derivation against Parallel Processing Platforms

Passwords are still the preferred method of user authentication for a large number of applications. In order to derive cryptographic keys from (human-entered) passwords, key-derivation functions are used. One of the most well-known key-derivation functions is the standardized PBKDF2 (RFC2898), which is used in TrueCrypt, CCMP of WPA2, and many more. In this work, we […]

CUDA
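PBKDF2's cost scales with its iteration count, and a direct Python transcription of the PBKDF2 definition from RFC 2898 for HMAC-SHA1 makes this visible. The sketch below is checked against the standard library's `hashlib.pbkdf2_hmac`. Each output block is an XOR of `iterations` chained HMAC outputs, and each link depends on the previous one, so a single derivation is sequential. A password-guessing attack therefore parallelizes across candidate passwords instead, which is what GPUs and other parallel platforms exploit. The function name `pbkdf2_sha1` is illustrative.

```python
import hashlib
import hmac
import struct


def pbkdf2_sha1(password: bytes, salt: bytes, iterations: int, dklen: int) -> bytes:
    """PBKDF2 with HMAC-SHA1 as the PRF, following the RFC 2898 definition."""
    hlen = hashlib.sha1().digest_size               # 20 bytes per block
    n_blocks = -(-dklen // hlen)                    # ceiling division
    out = b""
    for i in range(1, n_blocks + 1):
        # U_1 = PRF(P, S || INT(i)), with INT(i) a 4-byte big-endian block index
        u = hmac.new(password, salt + struct.pack(">I", i), hashlib.sha1).digest()
        t = bytearray(u)
        for _ in range(iterations - 1):
            # U_j = PRF(P, U_{j-1}); strictly sequential chain
            u = hmac.new(password, u, hashlib.sha1).digest()
            t = bytearray(a ^ b for a, b in zip(t, u))
        out += bytes(t)
    return out[:dklen]
```

For example, WPA2-Personal derives its 256-bit pairwise master key as PBKDF2-HMAC-SHA1 over the passphrase, with the SSID as salt and 4096 iterations. Every guessed passphrase therefore costs 2 blocks x 4096 = 8192 HMAC-SHA1 evaluations.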

Feb, 14

Real-Time 3D Face Identification from a Depth Camera

We present a real-time 3D face identification system using a consumer-level depth camera (PrimeSensor). Our system takes a noisy sequence as input and produces reliable identification. Instead of registering a probe to all instances in the database, we propose to only register it with several intermediate references, which considerably reduces processing, while preserving the […]

CUDA

Feb, 13

ADBIS workshop on GPUs In Databases, GID 2013

The high performance of modern Graphics Processing Units can be used not only for graphics-related applications but also for general-purpose computing. This computing power has been exploited in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains benefit strongly from GPUs, database-related applications seem not […]

Feb, 12

Exploiting Multiple Levels of Parallelism and Online Refinement of Unstructured Meshes in Atmospheric Model Application

Weather forecasting over long periods of time has become increasingly important. Global concern about the consequences of climate change has stimulated research to determine the climate in the coming decades. At the same time, the steps needed to better define the modeling and simulation of climate/weather are still far from the desired accuracy. Upscaling […]

CUDA
