high performance computing on graphics processing units: hgpu.org

Posts

Dec, 8

OpenMP Programming on Intel R Xeon Phi TM Coprocessors: An Early Performance Comparison

The demand for more and more compute power is growing rapidly in many fields of research. Accelerators, like GPUs, are one way to fulfill these requirements, but they often require a laborious rewrite of the application using special programming paradigms like CUDA or OpenCL. The Intel(R) Xeon Phi(TM) coprocessor is based on the Intel(R) Many […]

Dec, 8

Towards Building Error Resilient GPGPU Applications

GPUs (Graphics Processing Units) have gained wide adoption as accelerators for general purpose computing. They are widely used in error-sensitive applications, i.e. General Purpose GPU (GPGPU) applications However, the reliability implications of using GPUs are unclear. This paper presents a fault injection study to investigate the end-to-end reliability characteristics of GPGPU applications. The investigation showed […]

CUDA

Dec, 8

Data Compression using CUDA programming in GPU

The aim of this project is to explore and test the new potential performance improvement that can be achieved by the use of GPU processing architecture for several different types of compression algorithms. The compression algorithms are the choice of focus on the data parallelism on the GPU device. The specific algorithms ported to the […]

CUDA

Dec, 8

Smoothed Particle Hydrodynamics Simulation for Continuous Casting

This thesis proposes a way of simulating the continuous casting process of steel using Smoothed Particle Hydrodynamics (SPH). It deals with the SPH modeling of mass, momentum and the energy equations. The interpolation kernel functions required for the SPH modeling of these equations are calculated. Solidification is modeled by some particles are used to represent […]

OpenCL

Dec, 8

A Computationally Efficient Parallel Kernel Regression for Image Reconstruction

Image reconstruction is a method by which the underlying images, hidden in blurry and noisy data, can be retrieved. This is used in applications such as computer tomography (CT), magnetic resonance and radio astronomy. In recent times, a non-parametric adaptive regression method called steering kernel regression was proposed and proved to be effective. This method […]

CUDA

Dec, 8

A computationally efficient and scalable approach for privacy preserving kNN classification

In the modern age, there is a great desire to mine users’ personal data from varied sources, to discover their behaviours. However, due to the growing awareness among the organizations regarding the privacy of user data and the strict privacy regulations of government, there is a growing resistance to share data directly with others. Encryption […]

CUDA

Dec, 8

GPGPU-Aided 3D Staggered-grid Finite-difference Seismic Wave Modeling

Finite difference is a simple, fast and effective numerical method for seismic wave modeling, and has been widely used in forward waveform inversion and reverse time migration. However, intensive calculation of three-dimensional seismic forward modeling has been restricting the industrial application of 3D pre-stack reverse time migration and inversion. Aiming at this problem, in this […]

CUDA

Dec, 8

GPU accelerating the FEniCS Project

In the recent years, the graphics processing unit (GPU) has emerged as a popular platform for performing general purpose calculations. The high computational power of the GPU has made it an attractive accelerator for achieving increased performance of computer programs. Although GPU programming has become more tangible over the years, it is still challenging to […]

CUDA

Dec, 8

Lightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming

Programs expressed in a high-level programming language need to be translated to a low-level machine dialect for execution. This translation is usually accomplished by a compiler, which is able to translate any legal program to equivalent low-level code. But for individual source programs, automatic translation does not always deliver good results: Software engineering practice demands […]

CUDA

Dec, 5

A Data-Parallel Graphics Pipeline Implemented in OpenCL

This report documents implementation details, results, benchmarks and technical discussions for the work carried out within a master’s thesis at Linkoping University. Within the master’s thesis, the field of software rendering is explored in the age of parallel computing. Using the Open Computing Language, a complete graphics pipeline was implemented for use on general processing […]

OpenCL

Dec, 5

Mapping Streaming Applications to OpenCL

Graphic processing units (GPUs) have been gaining popularity in general purpose and high performance computing. A GPU is made up of a number of streaming multiprocessors (SM), each of which consists of many processing cores. A large number of general-purpose applications have been mapped onto GPUs efficiently. Stream processing applications, however, exhibit properties such as […]

OpenCL

Dec, 5

Parallel Cosegmentation via Submodular Optimization on Anisotropic Diffusion

With large number of related images being used for applications such as MR spectroscopy imaging, Object of interest 3D modelling and photo collages, the need of the hour is to accelerate image cosegmentation algorithms. Cosegmentation refers to the process of segmenting common regions from multiple related images. A novel distributed algorithm, CoSand [1], for cosegmentation […]

CUDA

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

* * *

high performance computing on graphics processing units: hgpu.org

Posts

OpenMP Programming on Intel R Xeon Phi TM Coprocessors: An Early Performance Comparison

Towards Building Error Resilient GPGPU Applications

Data Compression using CUDA programming in GPU

Smoothed Particle Hydrodynamics Simulation for Continuous Casting

A Computationally Efficient Parallel Kernel Regression for Image Reconstruction

A computationally efficient and scalable approach for privacy preserving kNN classification

GPGPU-Aided 3D Staggered-grid Finite-difference Seismic Wave Modeling

GPU accelerating the FEniCS Project

Lightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming

A Data-Parallel Graphics Pipeline Implemented in OpenCL

Mapping Streaming Applications to OpenCL

Parallel Cosegmentation via Submodular Optimization on Anisotropic Diffusion

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)