high performance computing on graphics processing units: hgpu.org

Posts

Nov, 16

Object Space Based Collision Detection for Cloth Simulation on the GPU

This paper presents an approach for cloth-body collision detection in computer graphics simulations of clothing. It is an object-space based algorithm implemented in OpenCL on the GPU. The underlying idea is to speed up collision detection by exploiting the large computational capacity of contemporary GPUs. Results of […]

OpenCL • OpenGL

Nov, 16

Parallel Approach for Longest Common Subsequence problem on GPU

Recent developments in genomic and molecular technologies have produced a tremendous amount of information related to molecular biology. Managing and analyzing these biological data requires intensive computing power. Sequence alignment is one of the algorithmic tools used in bioinformatics to look for similarities among sequences of amino acids. The longest common subsequence (LCS) of biological […]

CUDA • OpenCL
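The LCS problem is classically solved by dynamic programming. Cell (i, j) of the table holds the LCS length of the first i characters of one sequence and the first j characters of the other. One common way to expose GPU parallelism is to fill the table by anti-diagonals, since each cell on an anti-diagonal depends only on the two previous anti-diagonals. This is not necessarily the strategy used in the paper. The following is a minimal sketch of that wavefront order in plain Python, and the function name is hypothetical.

```python
def lcs_length_wavefront(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Hypothetical helper: fills the classic DP table in anti-diagonal
    (wavefront) order instead of row by row.
    """
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    # Anti-diagonal d holds all cells with i + j == d.
    for d in range(2, m + n + 1):
        # Cells on one anti-diagonal read only anti-diagonals d-1 and d-2,
        # so this inner loop has no internal dependencies and could be
        # mapped to one GPU thread per cell.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


print(lcs_length_wavefront("ABCBDAB", "BDCABA"))  # 4
```

Row-by-row filling gives the same table. The wavefront order only changes which cells can be computed at the same time.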

Nov, 12

Creating HW/SW co-designed MPSoPC’s from high level programming models

FPGA densities have continued to follow Moore’s law and can now support a complete multiprocessor system on programmable chip. The benefits of the FPGA include the ability to build a customized MPSoC system consisting of heterogeneous processing resources, interconnects and memory hierarchies that best match the requirements of each application. In this paper we outline […]

OpenCL

Nov, 12

Safe Asynchronous Multicore Memory Operations

Asynchronous memory operations provide a means for coping with the memory wall problem in multicore processors, and are available in many platforms and languages, e.g., the Cell Broadband Engine, CUDA and OpenCL. Reasoning about the correct usage of such operations involves complex analysis of memory accesses to check for races. We present a method and […]

Nov, 11

Synthetic Aperture Beamformation using the GPU

A synthetic aperture ultrasound beamformer is implemented for a GPU using the OpenCL framework. The implementation supports beamformation of either RF signals or complex baseband signals. Transmit and receive apodization can be either parametric or dynamic using a fixed F-number, a reference, and a direction. Images can be formed using an arbitrary number of emissions […]

OpenCL
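Below is a minimal delay-and-sum sketch of synthetic aperture beamformation for a single image point. It rests on several assumptions:

- a linear array lying at depth zero;
- one element firing per emission;
- RF data stored as `rf[emission][receiver]`;
- nearest-sample delays;
- a rectangular receive apodization whose width follows a fixed F-number (aperture width = depth / F-number).

The names and the simplified apodization are illustrative. The paper's reference-and-direction apodization and its baseband mode are not reproduced.

```python
import math


def das_pixel(rf, elem_x, fs, c, px, pz, f_number):
    """Hypothetical delay-and-sum value at image point (px, pz).

    rf[e][r] : RF samples on element r after emission from element e (assumed layout)
    elem_x   : lateral element positions in metres, all at depth 0
    fs       : sampling frequency in Hz; c: speed of sound in m/s
    """
    # Fixed F-number: the active receive aperture widens with depth.
    half_aperture = pz / (2.0 * f_number)
    total = 0.0
    for e, xe in enumerate(elem_x):
        # Transmit path: from the firing element to the image point.
        t_tx = math.hypot(px - xe, pz) / c
        for r, xr in enumerate(elem_x):
            if abs(xr - px) > half_aperture:
                continue  # outside the dynamic (rectangular) receive aperture
            # Receive path: from the image point back to element r.
            t_rx = math.hypot(px - xr, pz) / c
            k = round((t_tx + t_rx) * fs)  # nearest-sample round-trip delay
            if 0 <= k < len(rf[e][r]):
                total += rf[e][r][k]
    return total
```

Each image point is computed independently, which is what makes this loop a natural fit for one GPU work-item per pixel.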

Nov, 10

GPU Acceleration of Matrix-based Methods in Computational Electromagnetics

This work considers the acceleration of matrix-based computational electromagnetic (CEM) techniques using graphics processing units (GPUs). These massively parallel processors have gained much support since late 2006, with software tools such as CUDA and OpenCL greatly simplifying the process of harnessing the computational power of these devices. As with any advances in computation, the use […]

CUDA • OpenCL

Nov, 10

A CPU-GPU Hybrid Runtime for the Aeminium Language

Given that CPU clock speeds are stagnating, programmers are resorting to parallelism to improve the performance of their applications. Although such parallelism has usually come from multicore architectures, multiple CPUs, or clusters of machines, the GPU has more recently emerged as an alternative. GPUs are an interesting resource because they can provide much […]

OpenCL

Nov, 10

Bit-Parallel Multiple Pattern Matching

Approximate text matching (matching with errors) is a routine task in computational biology. We present an extension of the bit-parallel Wu-Manber algorithm to combine several searches for a pattern into a collection of fixed-length words. We further present an OpenCL parallelization of a redundant index on massively parallel multicore processors, within a framework of searching for similarities […]

OpenCL
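For reference, here is a minimal sketch of the classic single-pattern, bit-parallel Wu-Manber search in Shift-And form, allowing up to k edits (substitutions, insertions, deletions). Bit j of `R[d]` is set when the first j+1 pattern characters match a suffix of the text read so far with at most d errors. The paper's extension, which packs several searches into fixed-length words and runs them under OpenCL, is not shown. The function name is hypothetical.

```python
def wu_manber_search(text: str, pattern: str, k: int) -> list[int]:
    """Hypothetical helper: 0-based end positions of matches with <= k edits."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    mask = (1 << m) - 1
    accept = 1 << (m - 1)  # bit set when the whole pattern has matched
    # B[ch] has bit j set iff pattern[j] == ch.
    B: dict[str, int] = {}
    for j, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << j)
    # Initially the first d pattern characters match the empty string
    # with d deletions, so R[d] starts with its low d bits set.
    R = [((1 << d) - 1) & mask for d in range(k + 1)]
    hits = []
    for pos, ch in enumerate(text):
        b = B.get(ch, 0)
        new = [((R[0] << 1) | 1) & b & mask]  # exact Shift-And step
        for d in range(1, k + 1):
            new.append((
                (((R[d] << 1) | 1) & b)      # match the current character
                | ((R[d - 1] << 1) | 1)      # substitution
                | R[d - 1]                   # insertion (extra text character)
                | (new[d - 1] << 1)          # deletion (skipped pattern character)
            ) & mask)
        R = new
        if R[k] & accept:
            hits.append(pos)
    return hits


print(wu_manber_search("xxabxdxx", "abcd", 1))  # [5]
```

Each error level costs a few word-wide bit operations per text character. This is why the method maps well onto wide registers and GPU lanes.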

Nov, 8

20th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, PDP 2012

The Special Session on GPU Computing and Hybrid Computing aims at providing a forum for scientific researchers and engineers on hot topics related to GPU computing and hybrid computing with special emphasis on applications, performance analysis, programming models and mechanisms for mapping codes. Topics: GPU computing, multi GPU processing, hybrid computing; Programming models, programming frameworks, […]

Nov, 7

Flocking Implementation for the Blender Game Engine

In this thesis, we discuss the development of a new Boids system that simulates flocking behavior inside the Blender Game Engine and within the framework of the Real-Time Particles System (RTPS) library developed by Ian Johnson. The collective behavior of Boids is an emergent behavior that arises from three steering behaviors: separation, alignment, […]

OpenCL
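Reynolds' classic boids model combines three steering rules: separation, alignment and cohesion. Below is a minimal synchronous update step, assuming 2D boids, a fixed neighbourhood radius and hand-picked weights. All parameter names and values are hypothetical and are not taken from RTPS or the thesis. Because every boid reads only the previous state, each one could be updated by its own GPU work-item.

```python
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids: list[Boid], radius: float = 5.0, sep_dist: float = 1.0,
         w_sep: float = 1.5, w_ali: float = 0.05, w_coh: float = 0.01,
         dt: float = 1.0) -> list[Boid]:
    """One synchronous update; weights and radii are illustrative assumptions."""
    new = []
    for b in boids:
        neighbors = [o for o in boids if o is not b
                     and (o.x - b.x) ** 2 + (o.y - b.y) ** 2 <= radius ** 2]
        ax = ay = 0.0
        if neighbors:
            n = len(neighbors)
            # Cohesion: steer toward the neighbours' centre of mass.
            ax += w_coh * (sum(o.x for o in neighbors) / n - b.x)
            ay += w_coh * (sum(o.y for o in neighbors) / n - b.y)
            # Alignment: steer toward the neighbours' average velocity.
            ax += w_ali * (sum(o.vx for o in neighbors) / n - b.vx)
            ay += w_ali * (sum(o.vy for o in neighbors) / n - b.vy)
            # Separation: push away from neighbours that are too close.
            for o in neighbors:
                dx, dy = b.x - o.x, b.y - o.y
                if dx * dx + dy * dy < sep_dist ** 2:
                    ax += w_sep * dx
                    ay += w_sep * dy
        vx, vy = b.vx + ax * dt, b.vy + ay * dt
        new.append(Boid(b.x + vx * dt, b.y + vy * dt, vx, vy))
    return new


flock = [Boid(0.0, 0.0, 0.0, 0.0), Boid(0.5, 0.0, 0.0, 0.0)]
flock = step(flock)  # the two overlapping boids are pushed apart
```

Writing results into a fresh list, rather than updating boids in place, keeps the step order-independent. The same double-buffering is typically needed on a GPU.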

Nov, 7

High-Level Design for FPGA-based Multiprocessor Accelerators

Field programmable gate arrays (FPGAs) have the potential to accelerate scientific computing applications due to their highly parallel architecture. However, programming these architectures efficiently requires hardware description languages (HDLs) such as Verilog or VHDL. Many application developers are not familiar with HDLs, because they traditionally develop their applications using […]

Nov, 7

Functional Programming for High-Performance Computing on Heterogeneous Architectures

Heterogeneous architectures are becoming dominant in high-performance computing platforms, but programming them remains difficult, especially because high-performance programs are usually written using low-level languages (C, Fortran, OpenMP…) and frameworks (CUDA, OpenCL…). Mid-level frameworks have been introduced to automatically perform management of distributed memory and scheduling on different devices, allowing applications to only submit tasks and […]

CUDA • OpenCL

