high performance computing on graphics processing units: hgpu.org

Posts

Oct, 17

cudaMap: a GPU accelerated program for gene expression connectivity mapping

BACKGROUND: Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. […]

CUDA

Oct, 15

Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems

The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image processing applications. Normalized cross-correlation is used as a benchmark, because this algorithm includes convolution, a common operation in image processing and elsewhere. Normalized cross-correlation is a template matching algorithm that is used to locate predefined objects in […]

CUDA
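As a rough illustration of the benchmark named in the abstract above, normalized cross-correlation for template matching can be sketched in a few lines of plain Python. This is a minimal CPU-only sketch, not the thesis's FPGA, GPU, or DSP implementation; the function names `ncc_score` and `match_template` and the list-of-lists image format are assumptions made for this example.

```python
import math

def ncc_score(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists (hypothetical helper)."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp = sum(p) / len(p)
    mt = sum(t) / len(t)
    # Correlate the mean-subtracted values, then normalize by both energies.
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    # A flat patch or template has zero variance; report no correlation.
    return num / den if den > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image; return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # scores lie in [-1, 1], so -2 is a safe start
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc_score(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best

if __name__ == "__main__":
    image = [[0, 0, 0, 0],
             [0, 0, 9, 1],
             [0, 0, 2, 8],
             [0, 0, 0, 0]]
    template = [[9, 1],
                [2, 8]]
    print(match_template(image, template))
```

Every template position is scored independently of the others, which is the kind of data parallelism that makes the algorithm a natural candidate for parallel hardware.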

Oct, 15

Scaling Soft Matter Physics to Thousands of GPUs in Parallel

We describe a multi-GPU implementation of the Ludwig application, which specialises in simulating a variety of complex fluids via lattice Boltzmann fluid dynamics coupled to additional physics describing complex fluid constituents. We describe our methodology in augmenting the original CPU version with GPU functionality in a maintainable fashion. We present several optimisations that maximize […]

CUDA

Oct, 15

Domain-Specific Languages for Heterogeneous Parallel Computing

The heterogeneous parallel computing era has been accompanied by an ever-increasing number of disparate programming models. As a result, improving performance via heterogeneous computing is currently very challenging for application programmers. Domain-specific languages (DSLs) are a potential solution to this problem, as they can provide productivity, performance, and portability within the confines of a specific […]

CUDA

Oct, 15

GPU-acceleration of parallel unconditionally stable group explicit finite difference method

Graphics Processing Units (GPUs) are high-performance co-processors originally intended to improve the use and quality of computer graphics applications. Since researchers and practitioners realized the potential of using GPUs for general-purpose computing, their application has been extended to fields outside the scope of computer graphics. The main objective of this paper is to evaluate […]

CUDA

Oct, 15

GPU-Framework for Teamwork Action Recognition

Real-time processing for teamwork action recognition is a challenge because of the complex computational models needed to achieve high system performance. Hence, this paper proposes a framework based on Graphics Processing Units (GPUs) to achieve a significant speedup in role-based activity recognition of teamwork. The framework can be applied in various […]

CUDA

Oct, 15

An Efficient WSN Simulator for GPU-Based Node Performance

In a wireless sensor network, sensors that are wrongly placed in an observation region can quickly run out of battery power or become disconnected. These incidents may result in huge losses in terms of both sensing data from numerous sensors and their cost. For this reason, a number of simulators have been developed as tools for […]

CUDA

Oct, 15

Point to Line Mappings and Other Line Parameterizations not only for Hough Transform

This work focuses on the Hough transform (HT). The HT is mostly used for the detection of lines or curves, but it has also been generalized for the detection of arbitrary shapes. The main theme of this work is line parameterizations, especially the Point-to-Line mappings. These parameterizations share the property that a point in the image maps onto […]

CUDA • OpenGL
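For readers unfamiliar with the Hough transform, the voting idea behind line detection can be sketched as follows. This uses the classical theta-rho parameterization, which is an assumption chosen for illustration and is not the Point-to-Line mapping studied in the work above; the function names and the binning parameters `n_theta` and `rho_step` are likewise hypothetical.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (theta index, rho bin) space, classical theta-rho form."""
    votes = Counter()
    for x, y in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            # Every line through (x, y) satisfies rho = x*cos(theta) + y*sin(theta).
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(i, round(rho / rho_step))] += 1
    return votes

def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta in degrees, rho, vote count) for the most-voted bin."""
    votes = hough_lines(points, n_theta, rho_step)
    (i, rho_bin), count = votes.most_common(1)[0]
    return math.degrees(math.pi * i / n_theta), rho_bin * rho_step, count

if __name__ == "__main__":
    # Ten collinear points on the horizontal line y = 5.
    pts = [(10 * k, 5) for k in range(10)]
    print(strongest_line(pts))
```

Each point votes independently, so the accumulation step parallelizes well; a GPU version would need atomic updates or a per-thread partition of the accumulator, which this sketch does not attempt.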

Oct, 15

Massively Parallel Lossless Compression of Medical Images Using Least-Squares Prediction and Arithmetic Coding

Medical imaging in hospitals requires fast and efficient image compression to support the clinical workflow and to save costs. Least-squares autoregressive pixel prediction methods combined with arithmetic coding constitute the state of the art in lossless image compression. However, the high computational complexity of both prevents the application of the respective CPU implementations in practice. […]

CUDA

Oct, 15

High-Performance GPGPU Programming with OCaml

We present an OCaml GPGPU library with a DSL embedded into OCaml to express GPGPU kernels. The level of performance achieved is measured through different examples. We also discuss the use of GPGPU programming to increase the performance of multicore CPU software written in OCaml.

CUDA • OpenCL

Oct, 15

Uses of GPU Powered Interval Optimization for Parameter Identification in the Context of SO Fuel Cells

In this paper, we discuss parameter identification for models based on ordinary differential equations in the context of solid oxide fuel cells. In this case, verified methods (e.g. interval analysis), which provide a guarantee of correctness for the computed result, can be of great help for dealing with the uncertainty that arises and for devising accurate […]

CUDA

Oct, 13

GPU-Specific Kalman Filtering and Retrodiction for Large-Scale Target Tracking

In the field of Tracking and Data Fusion most, if not all, computations executed by a computer are carried out serially. The sole part of the process that is not entirely serial is the collection of data from multiple sensors, which can be executed in parallel. However, once the data is to be filtered the […]

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
