high performance computing on graphics processing units: hgpu.org

Posts

Jul, 21

Image reconstruction in digital holographic microscopy on GPU

The aim of the thesis is to implement and optimize chosen image processing algorithms used in digital holographic microscopy on the GPU. The algorithms are 2-D phase unwrapping and polynomial surface fitting. They are described and certain used optimizations are pointed out. The results chapter shows the performance and precision of the GPU implementation compared […]

CUDA

Jul, 20

Bone Structure Analysis with GPGPUs

Osteoporosis is a disease that affects a growing number of people by increasing the fragility of their bones. To improve the understanding of the bone, large scaled computer simulations are applied. A fast, scalable and memory efficient solver for such problems is ParOSol. It uses the preconditioned conjugate gradient algorithm with a multigrid preconditioner. A […]

Jul, 20

Lattice QCD on Intel Xeon Phi

The Intel Xeon Phi architecture from Intel Corporation features parallelism at the level of many x86-based cores, multiple threads per core, and vector processing units. Lattice Quantum Chromodynamics (LQCD) is currently the only known model independent, non perturbative computational method for calculations in theory of the strong interactions, and is of importance in studies of […]

CUDA

Jul, 20

GPU Computing in Economics

This paper discusses issues related to GPU for Economic problems. It highlights new methodologies and resources that are available for solving and estimating economic models and emphasizes situations when they are useful and others where they are impractical. Two examples illustrate the different ways these GPU parallel methods can be employed to speed computation.

CUDA

Jul, 20

Real Time Pixel Art Remasterization on GPUs

Several methods have been proposed to overcome the pixel art scaling problem through the years. In this article we describe a novel approach to be applied through a massively parallel architecture that can address this issue in real time. To achieve this we design a local and context independent algorithm that enables an efficient parallel […]

CUDA

Jul, 20

An Efficient Deterministic Parallel Algorithm for Adaptive Multidimensional Numerical Integration on GPUs

Recent development in Graphics Processing Units (GPUs) has enabled a new possibility for highly efficient parallel computing in science and engineering. Their massively parallel architecture makes GPUs very effective for algorithms where processing of large blocks of data can be executed in parallel. Multidimensional integration has important applications in areas like computational physics, plasma physics, […]

CUDA

Jul, 19

OpenCL API Extensions to achieve Multi-level Parallelism for Efficient Implementation of Strassen’s Matrix Multiplication on GPUs

Strassen’s matrix multiplication algorithm is an efficient and widely used practical algorithm for matrix multiplication. In its basic form, the algorithm is a series of recursive steps to decompose the matrices, multiply intermediate matrices and another set of recursive steps to recompose the product matrix. Implementing the algorithm on a GPU requires it to be […]

OpenCL

Jul, 19

HAccRG: Hardware-Accelerated Data Race Detection in GPUs

Modern Graphics Processing Units (GPUs) are capable of supporting thousands of concurrent threads. However, they provide relatively little guarantee with respect to the coherence and consistency of the memory system. Thus, GPUs are prone to multitude of concurrency bugs related to inconsistent memory states. Many such bugs manifest as some form of data races at […]

CUDA

Jul, 19

Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures

Heterogeneous multicore architectures with CPU and add-on GPUs or streaming processors are now widely used in computer systems. These GPUs provide substantially more computation capability and memory bandwidth compared to traditional multi-cores. Also, because they are highly programmable, they provide the computational performance needed for realistic graphics rendering. Applications with general computations can also be […]

OpenCL

Jul, 19

Parallel Image Segmentation Using Reduction-Sweeps On Multicore Processors and GPUs

In this paper we introduce the Reduction Sweep algorithm, a novel graph-based image segmentation algorithm that is designed for easy parallelization. It is based on a clustering approach focusing on local image characteristics. Each pixel is compared with its neighbors in an implicitly independent manner, and those deemed sufficiently similar according to a color criterion […]

CUDA

Jul, 19

On Benchmarking the Matrix Multiplication Algorithm using OpenMP, MPI and CUDA Programming Languages

Parallel programming languages represent a common theme in the evolution of high performance computing (HPC) systems. There are several parallel programming languages that are directly associated with different HPC systems. In this paper, we compare the performance of three commonly used parallel programming languages, namely: OpenMP, MPI and CUDA. Our performance evaluation of these languages […]

CUDA

Jul, 17

A Software-Based Self Test of CUDA Fermi GPUs

Nowadays, Graphical Processing Units (GPUs) have become increasingly popular due to their high computational power and low prices. This makes them particularly suitable for high-performance computing applications, like data elaboration and financial computation. In these fields, high efficient test methodologies are mandatory. One of the most effective ways to detect and localize hardware faults in […]

CUDA

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

* * *

high performance computing on graphics processing units: hgpu.org

Posts

Image reconstruction in digital holographic microscopy on GPU

Bone Structure Analysis with GPGPUs

Lattice QCD on Intel Xeon Phi

GPU Computing in Economics

Real Time Pixel Art Remasterization on GPUs

An Efficient Deterministic Parallel Algorithm for Adaptive Multidimensional Numerical Integration on GPUs

OpenCL API Extensions to achieve Multi-level Parallelism for Efficient Implementation of Strassen’s Matrix Multiplication on GPUs

HAccRG: Hardware-Accelerated Data Race Detection in GPUs

Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures

Parallel Image Segmentation Using Reduction-Sweeps On Multicore Processors and GPUs

On Benchmarking the Matrix Multiplication Algorithm using OpenMP, MPI and CUDA Programming Languages

A Software-Based Self Test of CUDA Fermi GPUs

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)