high performance computing on graphics processing units: hgpu.org

Posts

Jul, 7

Comparative study of parallel programming models for multicore computing

Shared-memory multicore processor technology has developed rapidly, with faster processors and an increasing number of cores per chip. This new architecture challenges programmers to write code that scales across these many cores in order to exploit the full computational power of these machines. Shared-memory parallel programming paradigms such as OpenMP and Intel Threading Building Blocks (TBB) […]

OpenCL

Jul, 5

Triangular mesh simplification on the GPU

We present a simplification algorithm for triangular meshes, implemented on the GPU. The algorithm performs edge collapses driven by a quadric error metric. It uses data parallelism as provided by OpenCL and has no sequential segments in its main iterative structure in order to fully exploit the processing power of the GPU. Our implementation produces […]

OpenCL
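The quadric error metric named above is usually formulated as in Garland and Heckbert's method: each triangle's plane contributes a 4×4 quadric, a vertex accumulates the quadrics of its incident triangles, and the cost of collapsing an edge is the summed quadric evaluated at the merged vertex position. Below is a minimal sequential Python sketch assuming that formulation. It is not the paper's OpenCL implementation, and the helper names (`plane_quadric`, `add_quadrics`, `quadric_error`, `edge_collapse_cost`) are hypothetical. For brevity it places the merged vertex at the edge midpoint rather than solving for the error-minimizing position.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]
Quadric = List[List[float]]  # symmetric 4x4 matrix K = p p^T


def plane_quadric(p0: Vec3, p1: Vec3, p2: Vec3) -> Quadric:
    """Fundamental error quadric of the plane through triangle (p0, p1, p2)."""
    # Unnormalized normal n = (p1 - p0) x (p2 - p0)
    ux, uy, uz = p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2]
    vx, vy, vz = p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        # Degenerate (zero-area) triangle: contributes no error
        return [[0.0] * 4 for _ in range(4)]
    a, b, c = nx / length, ny / length, nz / length
    d = -(a * p0[0] + b * p0[1] + c * p0[2])  # plane: ax + by + cz + d = 0
    p = (a, b, c, d)
    return [[p[i] * p[j] for j in range(4)] for i in range(4)]


def add_quadrics(q1: Quadric, q2: Quadric) -> Quadric:
    """Element-wise sum; a vertex quadric is the sum over its incident planes."""
    return [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]


def quadric_error(q: Quadric, v: Vec3) -> float:
    """h^T Q h for h = (x, y, z, 1): sum of squared distances to the planes in Q."""
    h = (v[0], v[1], v[2], 1.0)
    return sum(h[i] * q[i][j] * h[j] for i in range(4) for j in range(4))


def edge_collapse_cost(q1: Quadric, q2: Quadric, v1: Vec3, v2: Vec3
                       ) -> Tuple[float, Tuple[float, float, float]]:
    """Cost of collapsing edge (v1, v2) into its midpoint (simplified placement)."""
    mid = ((v1[0] + v2[0]) / 2, (v1[1] + v2[1]) / 2, (v1[2] + v2[2]) / 2)
    return quadric_error(add_quadrics(q1, q2), mid), mid
```

Because each candidate edge's cost depends only on the quadrics of its two endpoints, the costs of all edges can be evaluated independently, which is the kind of data parallelism a GPU version can exploit.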

Jul, 2

CFD Simulation of Jet Cooling and Implementation of Flow Solvers in GPU

When steel is rolled into thin sheets, the final step is cooling the finished product on the runout table. In this thesis, heat transfer into a water jet impinging on a hot, flat steel plate was studied as the key cooling process on the runout table. The temperature of the plate was […]

OpenCL

Jul, 1

Towards Performance-Portable, Scalable, and Convenient Linear Algebra

The rise of multi- and many-core architectures also gave birth to a plethora of new parallel programming models. Among these, the open industry standard OpenCL addresses this heterogeneity of programming environments by providing a unified programming framework. The price to pay, however, is that OpenCL requires additional low-level boilerplate code compared to vendor-specific solutions, […]

OpenCL

Jun, 29

Adaptation of algorithms for underwater sonar data processing to GPU-based systems

In this master's thesis, algorithms for acoustic simulations in underwater environments are ported to the GPU. The GPU parallel computing platforms used are CUDA, OpenCL and SkePU. The purpose of this master's thesis is to adapt the ported algorithms and evaluate their performance on two modern NVIDIA GPUs, the Tesla K20 and the Quadro K5000. Several optimizations, described […]

CUDA

•

OpenCL

Jun, 29

Efficient computation of constrained parameterizations on parallel platforms

Constrained isometric planar parameterizations are central to a broad spectrum of applications. In this work, we present a nonlinear solver developed in OpenCL that parallelizes efficiently on modern massively parallel architectures. We establish how parameterization relates to mesh smoothing and show how to efficiently and robustly solve the planar mesh parameterization problem with […]

OpenCL

Jun, 24

An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches

With progressive generations and the ever-increasing promise of computing power, GPGPUs have been quickly growing in size, and at the same time, energy consumption has become a major bottleneck for them. The first level data cache and the scratchpad memory are critical to the performance of a GPGPU, but they are extremely energy inefficient due […]

CUDA

•

OpenCL

Jun, 21

Parallel Language Programming In Different Platforms

The need to speed up computing has driven interest in exploring parallelism in algorithms and in parallel programming. Technology is evolving fast, but sequential computing power is no longer increasing as quickly as it used to; instead, CPUs contain more and more parallel computing resources. However, parallel algorithms may not be able to exploit all the parallelism […]

CUDA

•

OpenCL

Jun, 17

GPU Programming in Rust: Implementing High Level Abstractions in a Systems Level Language

Graphics processing units (GPUs) have the potential to greatly accelerate many applications, and yet programming models still remain too low level. Many language-based solutions to date have addressed this problem by creating embedded domain-specific languages that compile to CUDA or OpenCL. These targets are meant for human programmers and thus are less than ideal compilation […]

CUDA

•

OpenCL

Jun, 12

Performance of a GPU-based Direct Summation Algorithm for Computation of Small Angle Scattering Profile

Small Angle Scattering (SAS) of X-rays or neutrons is an experimental technique that provides valuable structural information for biological macromolecules under physiological conditions and with no limitation on the molecular size. In order to refine molecular structure against experimental SAS data, ab initio prediction of the scattering profile must be recomputed hundreds of thousands of […]

OpenCL
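Direct summation for small-angle scattering commonly refers to evaluating the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij) / (q r_ij), over all atom pairs, where r_ij is the distance between atoms i and j. Assuming that is the summation meant here, a minimal sequential Python sketch follows. It treats the form factors f_i as q-independent constants (an assumption made for brevity; real atomic form factors depend on q), and `debye_intensity` is a hypothetical name, not code from the paper.

```python
import math
from typing import List, Sequence


def debye_intensity(coords: Sequence[Sequence[float]],
                    f: Sequence[float],
                    qs: Sequence[float]) -> List[float]:
    """Scattering intensity I(q) by direct summation over all atom pairs (Debye formula).

    Assumes q-independent form factors f[i]; real atomic form factors vary with q.
    """
    n = len(coords)
    # Pairwise distances do not depend on q, so compute them once up front
    dists = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    intensities = []
    for q in qs:
        total = 0.0
        for i in range(n):
            total += f[i] * f[i]  # i == j term: sin(x)/x -> 1 as x -> 0
            for j in range(i + 1, n):
                x = q * dists[i][j]
                sinc = math.sin(x) / x if x != 0.0 else 1.0
                total += 2.0 * f[i] * f[j] * sinc  # (i, j) and (j, i) terms are equal
        intensities.append(total)
    return intensities
```

The pair loop costs O(N²) per q value, which is why recomputing the profile many times is expensive. Every pair term is independent of the others, so a GPU version can assign pairs to threads and combine them with a parallel reduction.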

Jun, 10

Processing XPath Structural Constraints on GPU

Technologies such as CUDA and OpenCL have popularized the use of graphics cards (GPUs) for general-purpose programming, often with impressive performance gains. However, using such cards to speed up XML database processing has yet to be fully explored. XML databases offer much flexibility for Web-oriented systems. Nonetheless, such flexibility comes at a considerable computational […]

CUDA

•

OpenCL

Jun, 8

Accelerated Dynamic Programming on GPU: A Study of Speed Up and Programming Approach

Graphics processing units (GPUs) can be used for general-purpose parallel computation. Developers can write parallel programs for GPUs using computing platforms such as CUDA or OpenCL. The Optimal Matrix Chain Multiplication problem is an optimization problem that seeks the optimal order for multiplying a chain of matrices. The optimal order of multiplication depends […]

CUDA
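The Optimal Matrix Chain Multiplication problem described above has a classic dynamic-programming solution: if matrix A_i has dimensions p_(i-1) × p_i, the cheapest cost m[i, j] of computing A_i through A_j is the minimum over split points k of m[i, k] + m[k+1, j] + p_(i-1)·p_k·p_j. Below is a minimal sequential Python sketch of that textbook recurrence, not the paper's GPU code; the function names `matrix_chain_order` and `parenthesize` are hypothetical.

```python
import math
from typing import List, Tuple


def matrix_chain_order(dims: List[int]) -> Tuple[int, List[List[int]]]:
    """Minimum scalar multiplications for A1..An, where Ai is dims[i-1] x dims[i].

    Returns the optimal cost and the table of best split points.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]   # cost[i][j]: best cost for chain i..j (0-based)
    split = [[0] * n for _ in range(n)]  # split[i][j]: k where the chain is split
    for length in range(2, n + 1):       # fill by increasing chain length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = math.inf
            for k in range(i, j):
                # (Ai..Ak) is dims[i] x dims[k+1]; (Ak+1..Aj) is dims[k+1] x dims[j+1]
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k
    return cost[0][n - 1], split


def parenthesize(split: List[List[int]], i: int, j: int) -> str:
    """Rebuild the optimal parenthesization of chain i..j from the split table."""
    if i == j:
        return f"A{i + 1}"
    k = split[i][j]
    return "(" + parenthesize(split, i, k) + parenthesize(split, k + 1, j) + ")"
```

For example, with dims [10, 30, 5, 60] the sketch picks ((A1A2)A3) at a cost of 4500 multiplications instead of 27000 for (A1(A2A3)). Cells with the same chain length depend only on shorter chains, so each diagonal of the table could be filled in parallel; that is one natural source of GPU parallelism, though the paper's own approach is not described in this excerpt.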

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Recent source codes

QArray

Celerity: High-level C++ for Accelerator Clusters

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Optical flow algorithms for SYCL

OpenMP5-Offload-OpenMC-Intel-PVC
