high performance computing on graphics processing units: hgpu.org

Posts

Jan, 8

Heat Load Modelling for District Heating Plants Using an OpenCL-based Algorithm

This research paper explores an OpenCL-based algorithm to aid heat load modelling for district heating plants. Previous studies have proven that heat loads mostly depend on the external temperatures (temperature dependency component) and the time of the day (time dependency component). In this research we have used the sum of two truncated exponential functions to […]

OpenCL

Jan, 8

Low cost approach to real-time vehicle to vehicle communication using parallel CPU and GPU processing

This paper proposes a novel Vehicle to Vehicle (V2V) communication system for collision avoidance which merges four different wireless devices (GPS, Wi-Fi, ZigBee and 3G) with a low power embedded Single Board Computer (SBC) in order to increase processing speed while maintaining a low cost. The three major technical challenges with such combinations are the […]

CUDA

Jan, 8

Optimizations in Bioinformatics using GPU Processing on Binary Data

This experiment explores the performance of GPUs in genetic algorithms using binary data. The experiment executes a genetic algorithm that operates on binary sequences processed on the GPU. The hypothesis is that an optimal runtime requires an optimal maximum number of threads (likely a large rather than a small one). The results show that […]

CUDA
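The abstract does not show the paper's GPU kernels, so the following is only a CPU-side Python sketch, assuming a bit-packed OneMax problem (fitness is the number of 1 bits) with hypothetical parameters (`GENOME_BITS`, population size, mutation rate). It illustrates why binary data suits GPU processing: fitness, crossover and mutation reduce to bitwise operations on machine words, and each child of a generation can be produced independently.

```python
import random
from typing import List

# Hypothetical parameters: the abstract gives no genome length or rates.
GENOME_BITS = 32
MASK = (1 << GENOME_BITS) - 1


def fitness(genome: int) -> int:
    """OneMax fitness: the number of 1 bits in the packed genome."""
    return bin(genome).count("1")


def crossover(a: int, b: int, point: int) -> int:
    """One-point crossover: the low `point` bits come from a, the rest from b."""
    low = (1 << point) - 1
    return (a & low) | (b & ~low & MASK)


def mutate(genome: int, rng: random.Random, rate: float) -> int:
    """Flip each bit independently with probability `rate` via one XOR mask."""
    flip = 0
    for i in range(GENOME_BITS):
        if rng.random() < rate:
            flip |= 1 << i
    return genome ^ flip


def tournament(pop: List[int], rng: random.Random) -> int:
    """Size-2 tournament selection: keep the fitter of two random individuals."""
    x, y = rng.sample(pop, 2)
    return x if fitness(x) >= fitness(y) else y


def evolve(pop_size: int = 20, generations: int = 50,
           rate: float = 0.02, seed: int = 0) -> int:
    """Run a simple generational GA and return the fittest final genome."""
    rng = random.Random(seed)
    pop = [rng.getrandbits(GENOME_BITS) for _ in range(pop_size)]
    for _ in range(generations):
        # Each child depends only on the previous generation, so a GPU
        # version could build one child per thread.
        pop = [mutate(crossover(tournament(pop, rng), tournament(pop, rng),
                                rng.randrange(1, GENOME_BITS)), rng, rate)
               for _ in range(pop_size)]
    return max(pop, key=fitness)
```

A shared random generator is used here for simplicity; a GPU version would need a per-thread generator instead.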

Jan, 8

Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems

General-purpose GPU-based systems are highly attractive because they offer potentially massive performance at little cost. Realizing this potential is challenging due to the complexity of programming. This paper presents a compiler-based approach that automatically generates optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of […]

OpenCL

Jan, 8

High Performance Multi-dimensional (2D/3D) FFT-Shift Implementation on Graphics Processing Units (GPUs)

Frequency domain analysis is one of the most common analysis techniques in signal and image processing. The Fast Fourier Transform (FFT) is a well-known tool for such analysis, obtaining the frequency spectrum of time- or spatial-domain signals and vice versa. FFT-Shift is a subsequent operation used to handle the resulting arrays from […]

CUDA
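FFT-Shift rearranges an FFT output so that the zero-frequency component sits at the center of the array. Below is a minimal Python sketch of the 2D case (the function names are hypothetical, not the paper's API), written as a gather so that each output element is computed independently, the property a GPU implementation could exploit with one thread per element. The same index rule extends to 3D by adding a third axis; for odd sizes the inverse shift differs from the forward one, hence the separate `ifftshift_2d`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def _shift(a: Sequence[Sequence[T]], sign: int) -> List[List[T]]:
    """Circularly shift both axes by sign * (size // 2) using a gather."""
    rows = len(a)
    cols = len(a[0]) if rows else 0
    sr, sc = rows // 2, cols // 2
    # out[r][c] reads exactly one input element: no write conflicts.
    return [[a[(r - sign * sr) % rows][(c - sign * sc) % cols]
             for c in range(cols)]
            for r in range(rows)]


def fftshift_2d(a: Sequence[Sequence[T]]) -> List[List[T]]:
    """Move the zero-frequency element to the center (swaps quadrants for even sizes)."""
    return _shift(a, 1)


def ifftshift_2d(a: Sequence[Sequence[T]]) -> List[List[T]]:
    """Undo fftshift_2d, including for odd dimensions."""
    return _shift(a, -1)
```

For example, `fftshift_2d([[1, 2], [3, 4]])` swaps diagonal quadrants and returns `[[4, 3], [2, 1]]`.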

Jan, 8

Implementation of FDTD-Compatible Green’s Function on Heterogeneous CPU-GPU Parallel Processing System

This paper presents an implementation of the FDTD-compatible Green’s function on a heterogeneous parallel processing system. The implementation simultaneously uses the computational power of the central processing unit (CPU) and the graphics processing unit (GPU), assigning each the computational tasks best suited to its architecture. Recently, a closed-form expression for this discrete Green’s function (DGF) was derived, […]

CUDA

Jan, 8

Efficient Weighted Histogramming on GPUs with CUDA

The histogram is a fundamental statistical tool that has been extensively used in various domains. In data mining and machine learning applications, weighted histogram calculation often serves as a key component in the processing of their massive data sets. However, the atomic operation, which is introduced to resolve the collisions in GPU-based parallel histogramming with […]

CUDA
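One widely used way to reduce atomic collisions is privatization: each thread block accumulates into its own sub-histogram (in shared memory on a GPU) and the partial histograms are merged at the end. The abstract is truncated before it describes the paper's own technique, so this Python sketch shows privatization only as an illustration; the function names, the chunking into `num_blocks` simulated blocks, and the use of precomputed bin indices are assumptions.

```python
from typing import List, Sequence


def weighted_histogram_direct(bins: Sequence[int], weights: Sequence[float],
                              num_bins: int) -> List[float]:
    """Reference version: one shared histogram. On a GPU every update here
    would need an atomicAdd, and popular bins would collide heavily."""
    hist = [0.0] * num_bins
    for b, w in zip(bins, weights):
        hist[b] += w
    return hist


def weighted_histogram_privatized(bins: Sequence[int], weights: Sequence[float],
                                  num_bins: int, num_blocks: int = 4) -> List[float]:
    """Privatized version: each simulated block owns a private sub-histogram,
    so contention is limited to threads within one block; a final reduction
    merges the partial results."""
    n = len(bins)
    chunk = (n + num_blocks - 1) // num_blocks
    partials = []
    for blk in range(num_blocks):
        local = [0.0] * num_bins
        for i in range(blk * chunk, min(n, (blk + 1) * chunk)):
            local[bins[i]] += weights[i]
        partials.append(local)
    # Merge step: one independent sum per bin.
    return [sum(p[k] for p in partials) for k in range(num_bins)]
```

Because floating-point addition is not associative, the two versions can differ in the last bits for general weights; the results agree exactly only when the partial sums are exactly representable.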

Jan, 8

Distributed Massive Model Rendering

Graphics models are becoming increasingly bulky, with detailed geometry, textures, normal maps, etc. There is considerable interest in modelling and navigating detailed models of large monuments. Many monuments of interest have both rich detail and large spatial extent. Rendering them for navigation on a single workstation is practically impossible, even given the […]

CUDA

Jan, 8

GPU-Optimized Coarse-Grained MD Simulations of Protein and RNA Folding and Assembly

Molecular dynamics (MD) simulations provide a molecular-resolution physical description of the folding and assembly processes, but the size and the timescales of simulations are limited because the underlying algorithm is computationally demanding. We recently introduced a parallel neighbor list algorithm that was specifically optimized for MD simulations on GPUs. In our present study, we analyze […]

CUDA
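The abstract does not describe the authors' parallel neighbor list algorithm. For orientation only, a plain Verlet neighbor list is sketched below in Python; it is a brute-force O(N^2) construction without periodic boundaries, and the function name and the cutoff and skin values in the test are hypothetical. Each bead's row is computed independently, which is the property GPU neighbor list builders typically parallelize over.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def build_neighbor_list(pos: Sequence[Vec3], cutoff: float,
                        skin: float = 0.0) -> List[List[int]]:
    """Verlet neighbor list: for each bead i, every j != i closer than cutoff + skin.

    Rows are independent (one GPU thread per bead would work the same way).
    The skin lets the list be reused for several time steps before the beads
    move far enough that it must be rebuilt.
    """
    r_list_sq = (cutoff + skin) ** 2
    neighbors: List[List[int]] = []
    for i, pi in enumerate(pos):
        row = []
        for j, pj in enumerate(pos):
            # Compare squared distances to avoid a square root per pair.
            if i != j and sum((a - b) ** 2 for a, b in zip(pi, pj)) < r_list_sq:
                row.append(j)
        neighbors.append(row)
    return neighbors
```

Production MD codes usually replace the inner loop over all beads with a cell list, which reduces construction cost to roughly O(N).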

Jan, 7

CUDA based iterative methods for linear systems

Solving large linear systems of equations is a common problem in science and engineering. Direct methods for computing the solution of such systems can be very expensive because of their high memory requirements and computational cost. This is a strong reason to use iterative methods, which compute only an approximation of the […]

CUDA
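Jacobi iteration is the simplest of the iterative methods the abstract refers to. The text does not say which methods the work actually implements, so this Python sketch (function name, tolerance and iteration limit are assumptions) only illustrates the idea of refining an approximation until successive iterates stop changing.

```python
from typing import List, Sequence


def jacobi(A: Sequence[Sequence[float]], b: Sequence[float],
           tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Solve Ax = b with Jacobi iteration starting from x = 0.

    x_i^(k+1) = (b_i - sum_{j != i} a_ij * x_j^(k)) / a_ii

    Every component of the new iterate depends only on the previous iterate,
    so on a GPU each row could be updated by its own thread. Convergence is
    guaranteed when A is strictly diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        # Stop when the largest change between iterates falls below tol.
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")
```

For the diagonally dominant system 4x + y = 1, 2x + 5y = 2, the iteration converges to x = 1/6, y = 1/3.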

Jan, 7

Performance comparison of Gauss-Jordan elimination method using OpenMP and CUDA

Users and application developers need to obtain the results of methods used to solve scientific and engineering problems quickly. Parallel programming techniques have been developed alongside serial programming because performance has become increasingly important in the development of computer applications. Various methods such as Gauss Elimination (GE) Method, […]

CUDA
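For reference, here is a minimal Python sketch of Gauss-Jordan elimination with partial pivoting. The elimination step updates every other row independently, which is the loop an OpenMP or CUDA version would typically parallelize; the function name and the singularity threshold `eps` are assumptions, not the paper's code.

```python
from typing import List, Sequence


def gauss_jordan(A: Sequence[Sequence[float]], b: Sequence[float],
                 eps: float = 1e-12) -> List[float]:
    """Solve Ax = b by reducing the augmented matrix [A | b] to [I | x]."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the row with the largest |entry| in this column up.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Normalize the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row. These row updates are
        # independent of one another, so this loop is the natural target
        # for OpenMP or CUDA threads.
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]
```

For example, `gauss_jordan([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])` returns approximately `[2.0, 3.0, -1.0]`.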

Jan, 7

Numerical computations in Java with CUDA

Parallel computing can offer an enormous performance advantage for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high-performance many-core processors that can achieve very high FLOP rates. Since the idea of using GPUs for general-purpose computing first emerged, things have evolved and […]

CUDA


Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
