high performance computing on graphics processing units: hgpu.org

Posts

Dec, 12

GPU Based Dose Calculation

The goal of this dissertation was to parallelize a dose calculation code for radiotherapy cancer treatment and explore the suitability of the new Intel Xeon Phi technology for such a task. The source code proved to have many bugs, so it took a long time to produce consistent results. Thus, the […]

CUDA

Dec, 12

Development of Bayesian analysis program for extraction of polarisation observables at CLAS

At the mass of a proton, the strong force is not well understood. Various quark models exist, but it is important to determine which quark model(s) are most accurate. Experimentally, finding resonances predicted by some models and not others would give valuable insight into this fundamental interaction. Several labs around the world use photoproduction experiments […]

OpenCL

Dec, 12

Inter-block synchronization on a GPGPU

With the invention of multi-core processing unit technology, the graphics processing unit has evolved from a single-core graphics processor into a multi-core programmable graphics processing unit. Because of the GPU’s architecture, people found that it is not only good at processing graphics-related data, but also suitable for performing general-purpose parallel computations. However, since […]

OpenCL

Dec, 12

Lessons learned from contrasting a BLAS kernel implementations

This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast the implementation effort and complexity of an optimized BLAS routine on CPU versus GPU, without considering performance. This work contributes a sample procedure for comparing BLAS kernel implementations, how to start […]

CUDA
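For readers unfamiliar with the SSPR operation named in the entry above, here is a minimal reference sketch in Python. It is not taken from the paper: it assumes the standard BLAS definition A := alpha * x * x^T + A with the symmetric matrix A held in upper-triangular packed, column-major storage, and the function name `sspr_upper` is hypothetical.

```python
def sspr_upper(n, alpha, x, ap):
    """Symmetric packed rank-one update, in place: A := alpha * x * x^T + A.

    Assumption: `ap` holds the upper triangle of the n-by-n symmetric matrix A
    in column-major packed order, so A[i][j] (i <= j) is ap[i + j * (j + 1) // 2].
    """
    k = 0  # running index into the packed array
    for j in range(n):          # walk the columns of the upper triangle
        t = alpha * x[j]
        for i in range(j + 1):  # rows 0..j of column j
            ap[k] += x[i] * t
            k += 1
    return ap


# Usage: start from A = 0 and add the outer product of x = [1, 2] with itself.
packed = sspr_upper(2, 1.0, [1.0, 2.0], [0.0, 0.0, 0.0])
```

The packed layout stores only n * (n + 1) / 2 entries, which is why both loop bounds and the running index `k` follow the triangle rather than the full matrix.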

Dec, 12

Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems

The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph algorithms also entail […]

CUDA

Dec, 11

Multilayered Abstractions for Partial Differential Equations

How do we build maintainable, robust, and performance-portable scientific applications? This thesis argues that the answer to this software engineering question in the context of the finite element method is through the use of layers of Domain-Specific Languages (DSLs) to separate the various concerns in the engineering of such codes. Performance-portable software achieves high performance […]

CUDA, OpenCL

Dec, 11

Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory

Graphical Processing Units (GPUs) are usually programmed to provide data-parallel acceleration to a host processor. Modern GPUs typically have an internal multi-processor (MP) structure that can be exploited in an unusual way to offer semi-independent task parallelism, provided the MPs can operate within their own localised memory and apply data-parallelism to their own problem subset. We […]

CUDA

Dec, 11

Runtime Support toward Transparent Memory Access in GPU-accelerated Heterogeneous Systems

The GPU has become a popular parallel accelerator in modern heterogeneous systems for its great parallelism and superior energy efficiency. However, it also greatly complicates programming the memory system in such heterogeneous systems, due to the non-continuous memory spaces on CPU and GPU, and a two-level memory hierarchy on the GPU itself. The complexity of this […]

Dec, 11

A New Software Based GPU Framework

A software-based GPU design, where most of the 3D pipeline is executed in software on shaders with minimal support from custom hardware blocks, provides three benefits: it (1) simplifies the GPU design, (2) turns 3D graphics into a general-purpose application, and (3) opens the door for applying compiler optimization to the whole 3D […]

CUDA

Dec, 11

A GPU-Accelerated Framework for Image Processing and Computer Vision

This paper presents and briefly describes the state of the art of accelerating image processing with graphics hardware (GPU) and discusses some of its caveats. Then it describes GpuCV, an open source multiplatform library for GPU-accelerated image processing and Computer Vision operators and applications. It is meant for computer vision scientists not familiar with GPU […]

CUDA

Dec, 11

High Performance Poisson Equation Solver for Hybrid CPU/GPU Systems

We investigated a possible way to treat electrostatic interactions by numerically solving Poisson’s equation using the Conjugate Gradient method and the Stabilized BiConjugate Gradient method. The aim of the research was to test the execution time of prototype programs running on Blue Gene/P and CPU/GPU systems. The results show that the tested methods are applicable for electrostatics […]
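As an illustration of the Conjugate Gradient method named in this entry, the sketch below solves a small model problem in plain Python. It is not the authors' code: it assumes a 1D Poisson equation -u'' = f on (0, 1) with zero Dirichlet boundary values, discretized by second-order finite differences, and the function names `apply_laplacian_1d`, `dot`, and `conjugate_gradient` are hypothetical.

```python
def apply_laplacian_1d(u, h):
    """Apply the finite-difference operator for -u'' with zero Dirichlet boundaries."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = (2.0 * u[i] - left - right) / (h * h)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator given as a function."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: f = 2 everywhere, so the exact solution is u(x) = x * (1 - x).
n = 9
h = 1.0 / (n + 1)
u = conjugate_gradient(lambda v: apply_laplacian_1d(v, h), [2.0] * n)
```

The operator is passed as a function rather than a stored matrix, since on a grid the product A p is usually computed as a stencil; the same loop structure carries over to 3D grids.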

Dec, 11

GPU Accelerated Parallel Iris Localization

Iris recognition is a computation-intensive task involving large amounts of pixel processing. After image acquisition of the eye, iris recognition is divided into iris localization, feature extraction, and matching steps. Each of these tasks involves a lot of processing. It thus becomes essential to improve the performance of each step to […]

CUDA
