high performance computing on graphics processing units: hgpu.org

Posts

Nov, 17

Characterization and Transformation of Unstructured Control Flow in GPU Applications

Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA and OpenCL. Although this technology is widely used, commodity GPUs use different schemes to implement it, and the performance limitations of these different schemes under real workloads […]

CUDA • OpenCL

Nov, 17

Massive Image Editing on the Cloud

Processing massive imagery in a distributed environment currently requires the effort of a skilled team to efficiently handle communication, synchronization, faults, and data/process distribution. Moreover, these implementations are highly optimized for a specific system or cluster, therefore portability or improved performance due to system improvements is rarely considered. Much like early GPU computing, cluster computing […]

Nov, 17

Adaboost GPU-based Classifier for Direct Volume Rendering

In volume visualization, voxel visibility and materials are determined through interactive editing of the transfer function. In this paper, we present a two-level GPU-based labeling method that computes, at rendering time, a set of labeled structures using the Adaboost machine learning classifier. In a pre-processing step, Adaboost trains a binary classifier from […]

CUDA • OpenCL • OpenGL

Nov, 16

Simulations of Large Particle Systems in Real Time

Simulation of interacting particle systems has been a well-established method for many years. Such systems can span different scales, including microscopic (where particles represent atoms, as in Molecular Dynamics simulations) as well as macroscopic. In the latter case, there is growing interest in the Smoothed Particle Hydrodynamics approach. Traditionally, over many years, simulation of […]

CUDA • OpenCL

Nov, 16

Object Space Based Collision Detection for Cloth Simulation on the GPU

This paper presents an approach for cloth-body collision detection in computer graphics simulations of clothing. It is an object-space based algorithm implemented in OpenCL on the GPU. The underlying idea behind this work is to speed up the solution of the collision detection problem by utilizing the excessive computational capacity of contemporary GPUs. Results of […]

OpenCL • OpenGL

Nov, 16

Parallel Approach for Longest Common Subsequence problem on GPU

Recent developments in genomic and molecular technologies have produced a tremendous amount of information related to molecular biology. The management and analysis of these biological data require intensive computing power. Sequence alignment is one of the algorithmic tools in bioinformatics used to look for resemblance among sequences of amino acids. The longest common subsequence (LCS) of biological […]

CUDA • OpenCL

Nov, 12

Creating HW/SW co-designed MPSoPC’s from high level programming models

FPGA densities have continued to follow Moore’s law and can now support a complete multiprocessor system on programmable chip. The benefits of the FPGA include the ability to build a customized MPSoC system consisting of heterogeneous processing resources, interconnects and memory hierarchies that best match the requirements of each application. In this paper we outline […]

OpenCL

Nov, 12

Safe Asynchronous Multicore Memory Operations

Asynchronous memory operations provide a means for coping with the memory wall problem in multicore processors, and are available in many platforms and languages, e.g., the Cell Broadband Engine, CUDA and OpenCL. Reasoning about the correct usage of such operations involves complex analysis of memory accesses to check for races. We present a method and […]

Nov, 11

Synthetic Aperture Beamformation using the GPU

A synthetic aperture ultrasound beamformer is implemented for a GPU using the OpenCL framework. The implementation supports beamformation of either RF signals or complex baseband signals. Transmit and receive apodization can be either parametric or dynamic using a fixed F-number, a reference, and a direction. Images can be formed using an arbitrary number of emissions […]

OpenCL

Nov, 10

GPU Acceleration of Matrix-based Methods in Computational Electromagnetics

This work considers the acceleration of matrix-based computational electromagnetic (CEM) techniques using graphics processing units (GPUs). These massively parallel processors have gained much support since late 2006, with software tools such as CUDA and OpenCL greatly simplifying the process of harnessing the computational power of these devices. As with any advances in computation, the use […]

CUDA • OpenCL

Nov, 10

A CPU-GPU Hybrid Runtime for the Aeminium Language

Given that CPU clock speeds are stagnating, programmers are resorting to parallelism to improve the performance of their applications. Although such parallelism has usually been attained using either multicore architectures, multiple CPUs and/or clusters of machines, the GPU has since been used as an alternative. GPUs are an interesting resource because they can provide much […]

OpenCL

Nov, 10

Bit-Parallel Multiple Pattern Matching

Text matching with errors is a regular task in computational biology. We present an extension of the bit-parallel Wu-Manber algorithm to combine several searches for a pattern into a collection of fixed-length words. We further present an OpenCL parallelization of a redundant index on massively parallel multicore processors, within a framework of searching for similarities […]

OpenCL
