high performance computing on graphics processing units: hgpu.org

Posts

Jan, 25

Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography

The computational effort of 3D image reconstruction in Computed Tomography (CT) has required special-purpose hardware for a long time. Systems such as custom-built FPGA systems and GPUs are still widely used today, in particular in interventional settings, where radiologists require reconstruction within a hard time constraint. However, it has recently been shown that today even commodity […]

CUDA

Jan, 25

Improvement of the fused CUDA kernels performance prediction

In this thesis, a tool is presented for improving the performance prediction of a source-to-source compiler of mapped functions developed at the Faculty of Informatics. The tool combines modifications to the original compiler with static and dynamic data gathering to provide as much data about the fusions as possible for analysis. […]

CUDA

Jan, 25

Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations

We present a finite-difference numerical algorithm for solving the 2D spatially homogeneous Boltzmann transport equation for semiconductor superlattices (SL) subject to a time-dependent electric field along the SL axis and a constant perpendicular magnetic field. The algorithm is implemented in C, targeting the CPU, and in CUDA C, targeting commodity NVIDIA GPUs. We compare performance and […]

CUDA

Jan, 23

OpenSSL acceleration using Graphics Processing Units

Cryptography: the study of techniques focused on security. Implementations of cryptography are typically computationally heavy, leading to performance issues on general-purpose systems. Extending a widespread, open-source cryptographic library such as OpenSSL with the ability to offload cryptographic operations to a Graphics Processing Unit (GPU) would be extremely useful in lightening the CPU load […]

CUDA • OpenCL

Jan, 23

On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms

The proliferation of heterogeneous computing systems presents the parallel computing community with the challenge of porting legacy and emerging applications to multiple processors with diverse programming abstractions. OpenCL is a vendor-agnostic and industry-supported programming model that offers code portability on heterogeneous platforms, allowing applications to be developed once and deployed "anywhere". In this paper, we […]

OpenCL

Jan, 23

clpeak – peak performance of your opencl device

clpeak is a benchmarking tool intended to help developers fine-tune OpenCL kernels for a particular device or class of devices. It measures bandwidth and compute performance for different vector widths of a data type, e.g. float and float4. Traditionally, the recommendation is to write scalar code and expect the OpenCL compiler to auto-vectorize it. But most of the time, the compiler […]
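The figures such a tool reports come down to simple arithmetic on kernel timings. The sketch below is a hedged illustration, not clpeak's actual code: the function names, the convention of counting a multiply-add as 2 FLOPs per vector lane, and the timings in the usage loop are all assumptions chosen for the example.

```python
def bandwidth_gbps(num_vectors: int, vector_width: int,
                   bytes_per_scalar: int, seconds: float) -> float:
    """Effective bandwidth in GB/s (1e9 bytes/s) for a kernel that reads
    `num_vectors` vectors of `vector_width` scalars once each."""
    total_bytes = num_vectors * vector_width * bytes_per_scalar
    return total_bytes / seconds / 1e9


def gflops(work_items: int, mads_per_item: int, vector_width: int,
           seconds: float) -> float:
    """Compute throughput in GFLOPS, counting each multiply-add (mad) as
    2 floating-point operations per vector lane."""
    total_flops = work_items * mads_per_item * vector_width * 2
    return total_flops / seconds / 1e9


# Hypothetical timings for the same compute kernel written with float and float4.
for name, width, secs in [("float", 1, 0.020), ("float4", 4, 0.050)]:
    print(f"{name:7s} {gflops(1 << 20, 1024, width, secs):8.1f} GFLOPS")
```

Comparing the rates across vector widths on one device is what makes such a benchmark useful: if the float4 variant delivers markedly more throughput than the scalar one, the scalar code was likely not auto-vectorized effectively.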

OpenCL

Jan, 23

A comparison between parallelization approaches in molecular dynamics simulations on GPUs

We test the performance of two approaches to computing forces in molecular dynamics simulations on Graphics Processing Units. A "vertex-based" approach, in which one computing thread is started per particle, is compared with a newly proposed "edge-based" approach, in which one thread is started per potentially non-zero interaction. We find that the former […]
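The difference between the two decompositions can be emulated serially. In this hedged sketch, each loop iteration stands in for one GPU thread; the soft repulsive pair force, the cutoff, and all function names are hypothetical, and the paper's actual kernels are not shown in this excerpt.

```python
import math
from itertools import combinations

Vec = tuple[float, float]


def pair_force(pi: Vec, pj: Vec, cutoff: float = 1.0, k: float = 1.0) -> Vec:
    """Force on particle i from particle j for a hypothetical soft repulsion
    that vanishes beyond `cutoff`."""
    dx, dy = pi[0] - pj[0], pi[1] - pj[1]
    r = math.hypot(dx, dy)
    if r == 0.0 or r >= cutoff:
        return (0.0, 0.0)
    mag = k * (cutoff - r) / r
    return (mag * dx, mag * dy)


def forces_vertex_based(pos: list[Vec]) -> list[Vec]:
    """One 'thread' per particle: particle i gathers contributions from every j.
    Each pair is evaluated twice, but each thread writes only its own output."""
    out = []
    for i, pi in enumerate(pos):
        fx = fy = 0.0
        for j, pj in enumerate(pos):
            if i != j:
                f = pair_force(pi, pj)
                fx += f[0]
                fy += f[1]
        out.append((fx, fy))
    return out


def forces_edge_based(pos: list[Vec]) -> list[Vec]:
    """One 'thread' per candidate pair (i, j) with i < j: each pair is evaluated
    once and scattered to both particles via Newton's third law. In practice the
    pairs would come from a neighbour list; here all pairs are enumerated."""
    fx = [0.0] * len(pos)
    fy = [0.0] * len(pos)
    for i, j in combinations(range(len(pos)), 2):
        f = pair_force(pos[i], pos[j])
        # On a GPU these scattered updates need atomic adds or a reduction.
        fx[i] += f[0]
        fy[i] += f[1]
        fx[j] -= f[0]
        fy[j] -= f[1]
    return list(zip(fx, fy))
```

Both functions produce the same forces; the trade-off is redundant pair evaluations (vertex-based) versus write conflicts on shared particles (edge-based).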

CUDA

Jan, 23

Multi-GPU parallel memetic algorithm for capacitated vehicle routing problem

The goal of this paper is to propose and test a new memetic algorithm for the capacitated vehicle routing problem in a parallel computing environment. In this paper, we consider a simple variation of the vehicle routing problem in which the only parameter is the capacity of the vehicle and each client needs only one package. We present […]
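Because each client needs exactly one package, the capacity constraint reduces to a limit on the number of clients per route. A memetic algorithm pairs evolutionary search with local improvement of individual solutions; the hedged sketch below shows only an evaluation step and a simple swap-based local search. The giant-tour (permutation) encoding, the fixed-size split, the swap move, and all names are assumptions, not the paper's method.

```python
import math

Point = tuple[float, float]


def split_giant_tour(tour: list[int], capacity: int) -> list[list[int]]:
    """Cut a permutation of clients into consecutive routes of at most
    `capacity` clients (each client needs exactly one package)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    return [tour[k:k + capacity] for k in range(0, len(tour), capacity)]


def route_length(route: list[int], depot: Point, clients: list[Point]) -> float:
    """Euclidean length of depot -> clients in order -> depot."""
    stops = [depot] + [clients[c] for c in route] + [depot]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))


def fitness(tour: list[int], capacity: int, depot: Point,
            clients: list[Point]) -> float:
    """Total distance of all routes obtained from the giant tour (lower is better)."""
    return sum(route_length(r, depot, clients)
               for r in split_giant_tour(tour, capacity))


def swap_local_search(tour: list[int], capacity: int, depot: Point,
                      clients: list[Point]) -> tuple[list[int], float]:
    """Repeatedly apply any position swap that lowers the cost until none does."""
    best = list(tour)
    best_cost = fitness(best, capacity, depot, clients)
    improved = True
    while improved:
        improved = False
        for a in range(len(best)):
            for b in range(a + 1, len(best)):
                cand = best[:]
                cand[a], cand[b] = cand[b], cand[a]
                cost = fitness(cand, capacity, depot, clients)
                if cost < best_cost - 1e-12:
                    best, best_cost = cand, cost
                    improved = True
    return best, best_cost
```

In a full memetic algorithm, crossover and mutation would generate new tours and a step like `swap_local_search` would refine each offspring before selection.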

CUDA

Jan, 19

A Lattice Boltzmann Method Simulator for Microfluidics on GPU Cluster

A simulator for microfluidic systems, based on the lattice Boltzmann method (LBM), was developed to run on a Graphics Processing Unit (GPU) cluster. It was written in CUDA C, implements single-component, single-phase fluids, and includes periodic, velocity, bounce-back, and pressure boundary conditions. The program was run on a cluster with four nodes, where […]
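The simulator's source is not part of this excerpt. As a hedged sketch, assuming a 2D D2Q9 lattice in lattice units (the lattice used is not stated here), the equilibrium distribution and the full-way bounce-back rule applied at solid nodes can be written as follows; the names `equilibrium`, `moments`, and `bounce_back` are hypothetical.

```python
# D2Q9 velocity set: rest, four axis directions, four diagonals.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
# OPP[i] is the index of the direction opposite to E[i].
OPP = [E.index((-ex, -ey)) for ex, ey in E]


def equilibrium(rho: float, u: tuple[float, float]) -> list[float]:
    """Second-order equilibrium f_i = w_i rho (1 + 3 e.u + 4.5 (e.u)^2 - 1.5 u.u)."""
    ux, uy = u
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def moments(f: list[float]) -> tuple[float, float, float]:
    """Density and velocity components recovered from the distributions."""
    rho = sum(f)
    ux = sum(fi * e[0] for fi, e in zip(f, E)) / rho
    uy = sum(fi * e[1] for fi, e in zip(f, E)) / rho
    return rho, ux, uy


def bounce_back(f_node: list[float]) -> list[float]:
    """Full-way bounce-back: every population is reflected into the opposite
    direction, which conserves mass and reverses momentum at the wall node."""
    return [f_node[OPP[i]] for i in range(9)]
```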

CUDA

Jan, 19

GPU based Implementation of Film Flicker Reduction Algorithms

In this work, we propose an algorithm for film restoration aimed at reducing the flicker effect while preserving the original overall illumination of the film. We also present a comparative study of the performance of this algorithm implemented sequentially on a CPU and in parallel on a GPU using OpenCL.
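The authors' algorithm is not described in this excerpt. As a rough, hedged illustration of the goal only, a common baseline for flicker reduction (swapped in here, not the proposed method) scales each frame so that its mean intensity follows a temporally smoothed mean, removing frame-to-frame brightness jumps while keeping the overall illumination level. Frames are assumed to be flat lists of pixel intensities, and `deflicker` and `radius` are hypothetical names.

```python
def deflicker(frames: list[list[float]], radius: int = 1) -> list[list[float]]:
    """Scale each frame by a global gain so its mean brightness matches the
    average of the frame means in a window of +/- `radius` frames."""
    means = [sum(f) / len(f) for f in frames]
    out = []
    for t, frame in enumerate(frames):
        lo, hi = max(0, t - radius), min(len(frames), t + radius + 1)
        target = sum(means[lo:hi]) / (hi - lo)
        gain = target / means[t] if means[t] > 0 else 1.0
        out.append([gain * px for px in frame])
    return out
```

Each output pixel depends only on its own frame's gain, so after the per-frame means are reduced, the scaling is embarrassingly parallel, which is the kind of structure that maps well to an OpenCL kernel.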

OpenCL

Jan, 19

FlowTour: An Automatic Guide for Exploring Internal Flow Features

We present FlowTour, a novel framework that provides an automatic guide for exploring internal flow features. Our algorithm first identifies critical regions and extracts their skeletons for feature characterization and streamline placement. We then create candidate viewpoints based on the construction of a simplified mesh enclosing each critical region and select the best viewpoints based on […]

CUDA

Jan, 19

Finite-difference time-domain solver for room acoustics using graphics processing units

Several acoustic simulation methods have been introduced over the past decades. Wave-based simulation methods have been one of the alternatives, but their applicability to wideband acoustic simulation has been limited by the computing power of available hardware. In recent years, the processing power and programmability of graphics processing units have improved, and therefore several wave-based […]
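Wave-based FDTD methods advance the pressure field on a grid with a local stencil, which is why they map well to GPUs. A hedged 1D sketch of the standard leapfrog update for the wave equation follows; the grid size, the pressure-release (p = 0) end conditions, and the function name are assumptions, and a room-acoustics solver would work in 3D. With the Courant number C = cΔt/Δx, this 1D scheme is stable for C ≤ 1; the corresponding limit for the standard 3D rectilinear scheme is 1/√3.

```python
def fdtd_1d(p_prev: list[float], p_curr: list[float],
            courant: float, steps: int) -> list[float]:
    """Advance the 1D acoustic wave equation with the leapfrog FDTD update
    p[n+1][i] = 2 p[n][i] - p[n-1][i] + C^2 (p[n][i+1] - 2 p[n][i] + p[n][i-1]).
    The end points are held at p = 0 (pressure-release boundaries)."""
    if not 0.0 < courant <= 1.0:
        raise ValueError("1D leapfrog scheme requires 0 < C <= 1")
    c2 = courant * courant
    n = len(p_curr)
    for _ in range(steps):
        p_next = [0.0] * n  # boundary cells stay at zero
        for i in range(1, n - 1):
            p_next[i] = (2.0 * p_curr[i] - p_prev[i]
                         + c2 * (p_curr[i + 1] - 2.0 * p_curr[i] + p_curr[i - 1]))
        p_prev, p_curr = p_curr, p_next
    return p_curr


# A right-moving unit pulse: at C = 1 the 1D scheme shifts it one cell per step.
grid = 21
prev = [1.0 if i == 4 else 0.0 for i in range(grid)]
curr = [1.0 if i == 5 else 0.0 for i in range(grid)]
result = fdtd_1d(prev, curr, courant=1.0, steps=10)
```

Every grid point's update reads only its immediate neighbours from the previous two time levels, so on a GPU one thread per grid point can compute a whole time step independently.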

CUDA
