high performance computing on graphics processing units: hgpu.org

Posts

Nov, 4

Semi-Global Matching - Motivation, Developments and Applications

Since its original publication, the Semi-Global Matching (SGM) technique has been re-implemented by many researchers and companies. The method offers a very good trade-off between runtime and accuracy, especially at object borders and fine structures. It is also robust against radiometric differences and not sensitive to the choice of parameters. Therefore, it is well […]

OpenCL • OpenGL

Nov, 4

Inter-cluster communication on clustered SIMD architectures

This work envisions that in the near future, GPU-like architectures will find their way to embedded systems. Accompanied by a small RISC control core, they will not merely be a hardware accelerator, but the heart of the system itself. Taking a state-of-the-art GPU, a baseline architecture is constructed with the embedded context in mind. Next, […]

Nov, 4

VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron

General-purpose graphics processing units (GPUs) offer high processing speeds for certain classes of highly parallelizable computations, such as matrix operations and Fourier transforms, that lie at the heart of first-principles electronic structure calculations. Inclusion of exact exchange increases the cost of density functional theory by orders of magnitude, motivating the use of GPUs. Porting the […]

CUDA

Nov, 4

Computing Optimal Cycle Mean in Parallel on CUDA

Computation of optimal cycle mean in a directed weighted graph has many applications in program analysis, performance verification in particular. In this paper we propose a data-parallel algorithmic solution to the problem and show how the computation of optimal cycle mean can be efficiently accelerated by means of CUDA technology. We show how the problem […]

CUDA

Nov, 3

A Mutable Hardware Abstraction to Replace Threads

Ever since the first digital images appeared, computer scientists all over the world have been trying to computationally estimate their similarity. So far, no solution as good as the human brain has been found. This paper presents another technique that tackles this issue, using singular value decomposition – a matrix factorization method which extracts main features of […]

Nov, 3

Parallelization of the Generalized Hough Transform on GPU

Programs developed under the Compute Unified Device Architecture (CUDA) obtain the highest performance when the exploitation of hardware resources on a Graphics Processing Unit (GPU) is maximized. To achieve this, load balancing among threads and a high value of processor occupancy, i.e. the ratio of active threads, are indispensable. However, in […]

CUDA

Nov, 3

True 4D Image Denoising on the GPU

Image denoising is an important part of many medical imaging applications. One common application is to improve the image quality of low-dose (noisy) computed tomography (CT) data. While 3D image denoising has previously been applied to several volumes independently, there has not been much work done on true 4D image denoising, […]

CUDA

Nov, 3

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework

Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with nonuniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present the […]

Nov, 3

GrIP: A Framework for Experiments with Screen Space Algorithms

We present the extensible post processing framework GrIP, usable for experimenting with screen space-based graphics algorithms in arbitrary applications. The user can easily implement new ideas as well as add known operators as components to existing ones. Through a well-defined interface, operators are realized as plugins that are loaded at run-time. Operators can be combined […]

CUDA

Nov, 3

Applicability of GPU Computing for Efficient Merge in In-Memory Databases

Column oriented in-memory databases typically use dictionary compression to reduce the overall storage space and allow fast lookup and comparison. However, there is a high performance cost for updates since the dictionary, used for compression, has to be recreated each time records are created, updated or deleted. This has to be taken into account for […]

CUDA

Nov, 3

Accelerating Multi-Scale Flows for LDDKBM Diffeomorphic Registration

Registrations in medical imaging and computational anatomy can be obtained using the Large Deformation Diffeomorphic Kernel Bundle Mapping (LDDKBM) framework. This provides a registration algorithm with a solid mathematical foundation while incorporating regularization of deformation at multiple scales. Because the variational formulation of LDDKBM implies a heavy computational burden in the search for optimal registrations, […]

CUDA • OpenCL

Nov, 3

Topology Optimization with Unstructured Meshes on Graphics Processing Units (GPUs)

The present work investigates the feasibility of finite element methods and topology optimization for unstructured meshes in massively parallel computer architectures, more specifically on Graphics Processing Units or GPUs. Algorithms for every step in these methods are proposed and benchmarked with varied results. The ultimate goal of this work is to speed up the […]

CUDA