Posts

Jun, 18

Sorting On A Graphics Processing Unit (GPU)

One of the very first GPU sorting algorithms, an adaptation of bitonic sort, was developed by Govindaraju et al. [12]. Since this algorithm was developed before the advent of CUDA, the algorithm was implemented using GPU pixel shaders. Zachmann et al. [13] improved on this sort algorithm by using Bitonic Trees to reduce the number […]
Jun, 18

Delaunay Triangulation in R3 on the GPU

The Delaunay triangulation of points in R3 is a fundamental computational geometry structure that is useful for representing and studying objects from the physical world. The 3D Delaunay triangulation has desirable qualities that make it useful in many applications like FEM, surface reconstruction and tessellating solids. Algorithms for 3D Delaunay have been devised that utilize […]
Jun, 18

A GPU Parallelized Spectral Method for Elliptic Equations

We design and implement the first polynomial-based spectral method on graphic processing units (GPUs). The key to success lies in the seamless integration of the matrix diagonalization technique and new generation CUDA tools. The method is applicable to elliptic equations with general boundary conditions in both 2-D and 3-D cases. We show remarkable speedups of […]
Jun, 18

Accelerating GPU Programs by Reducing Irregular Control Flow and Memory Access

The graphics processing unit (GPU) is recently used as a massively parallel processor to speed up general computation. However, the GPU can decrease the performance of irregular computation, because the GPU is based on the single instruction, multiple data (SIMD) architecture. The irregular computations here are conditional branches and memory accesses, which vary the behavior […]
Jun, 17

Auto-Tunning of Data Communication on Heterogeneous Systems

Heterogeneous systems formed by traditional CPUs and compute accelerators, such as GPUs, are becoming widely used to build modern supercomputers. However, many different system topologies, i.e., how CPUs, accelerators, and I/O devices are interconnected, are being deployed. Each system organization presents different trade-offs when transferring data between CPUs, accelerators, and nodes within a cluster, requiring […]
Jun, 17

Parallelizing General Histogram Application for CUDA Architectures

Histogramming is a tool commonly used in data analysis. Although its serial version is simple to implement, providing an efficient and scalable way to parallelize it can be challenging. This especially holds in case of platforms that contain one or several massively parallel devices like CUDA-capable GPUs due to issues with domain decomposition, use of […]
Jun, 17

Space Charge Dominated Envelope Dynamics Using GPUs

High power accelerator facilities lead to necessity to consider space charge forces. It is therefore important to study the space charge dynamics in the corresponding channels. To represent the space charge forces of the beam we have developed special software based on some analytical models for space charge distributions. Because calculations for space charge dynamics […]
Jun, 17

Bayesian State-Space Modelling on High-Performance Hardware Using LibBi

LibBi is a software package for state-space modelling and Bayesian inference on modern computer hardware, including multi-core central processing units (CPUs), many-core graphics processing units (GPUs) and distributed-memory clusters of such devices. The software parses a domain-specific language for model specification, then optimises, generates, compiles and runs code for the given model, inference method and […]
Jun, 17

Investigation of GPU-based Pattern Matching

Graphics Processing Units (GPUs) have become the focus of much interest with the scientific community lately due to their highly parallel computing capabilities, and cost effectiveness. They have evolved from simple graphic rendering devices to extremely complex parallel processors, used in a plethora of scientific areas. This paper outlines experimental results of a comparison between […]
Jun, 17

GPU Programming in Rust: Implementing High Level Abstractions in a Systems Level Language

Graphics processing units (GPUs) have the potential to greatly accelerate many applications, and yet programming models still remain too low level. Many language-based solutions to date have addressed this problem by creating embedded domain-specific languages that compile to CUDA or OpenCL. These targets are meant for human programmers and thus are less than ideal compilation […]
Jun, 16

Performance Analysis on Energy Efficient High-Performance Architectures

With the shift in high-performance computing (HPC) towards energy efficient hardware architectures such as accelerators (NVIDIA GPUs) and embedded systems (ARM processors), arose the need to adapt existing performance analysis tools to these new systems. We present EZTrace – a performance analysis framework for parallel applications. EZTrace relies on several core components, in particular on […]
Jun, 16

GPU-Optimized Hybrid Neighbor/Cell List Algorithm for Coarse-Grained Molecular Dynamics Simulations

Molecular Dynamics (MD) simulations provide a molecular-resolution picture of the folding and assembly processes of biomolecules; however, the size and timescales of MD simulations are limited by the computational demands of the underlying numerical algorithms. Recently, Graphics Processing Units (GPUs), specialized devices that were originally designed for rendering images, have been repurposed for high performance computing […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors