Posts
Jul, 30
Image Processing with CUDA
This thesis tests the power of parallel computing on the GPU against the massive computations needed to process large images. The GPU has long been used to accelerate 3D applications. With the advent of high-level programmable interfaces, programming the GPU has become simpler and is being used to accelerate […]
Jul, 30
Domain Specific Languages for High Performance Computing
High Performance Computing (HPC) relies completely on complex parallel, heterogeneous architectures and distributed systems, which are hard and error-prone to exploit, even for HPC specialists. Ever-deeper knowledge of runtime systems, dependency tracking, memory transaction optimization and many other techniques is required to produce high-quality software capable of exploiting every single […]
Jul, 30
Counting and Occurrence Sort for GPUs using an Embedded Language
This paper investigates two sorting algorithms: counting sort and a variation, occurrence sort, which also removes duplicate elements, and examines their suitability for running on the GPU. The duplicate-removing variation turns out to have a natural functional, data-parallel implementation, which makes it particularly interesting for GPUs. The algorithms are implemented in Obsidian, a high-level […]
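As a rough illustration of the two algorithms named in this excerpt, here is a minimal sequential sketch in Python. It is not the paper's Obsidian implementation and makes no attempt at GPU parallelism; the function names and the `max_value` parameter are hypothetical, and the keys are assumed to be non-negative integers no larger than `max_value`.

```python
def counting_sort(values, max_value):
    """Sort non-negative integer keys by counting how often each key occurs.

    Assumption: every key lies in the range 0..max_value.
    """
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1  # histogram of key frequencies
    out = []
    for key, count in enumerate(counts):
        out.extend([key] * count)  # emit each key as often as it occurred
    return out


def occurrence_sort(values, max_value):
    """Sort and remove duplicates by recording only whether each key occurs."""
    seen = [False] * (max_value + 1)
    for v in values:
        seen[v] = True  # a flag instead of a counter, so duplicates collapse
    return [key for key, flag in enumerate(seen) if flag]


if __name__ == "__main__":
    data = [3, 1, 2, 3, 0, 1]
    print(counting_sort(data, 3))
    print(occurrence_sort(data, 3))
```

In the occurrence variant every write to `seen` stores the same value, so the order of writes does not matter; that is one plausible reading of why the duplicate-removing version maps naturally onto data-parallel hardware.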
Jul, 30
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi Coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library that incorporates the developments presented, and in general […]
Jul, 30
Scalable Dense Linear Algebra on Heterogeneous Hardware
The design of systems exceeding 1 Pflop/s and the push toward 1 Eflop/s forced a dramatic shift in hardware design. Various physical and engineering constraints resulted in the introduction of massive parallelism and functional hybridization with the use of accelerator units. This paradigm change brings about a serious challenge for application developers, as the management of multicore […]
Jul, 29
LU Factorization with Partial Pivoting for a Multicore System with Accelerators
LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This paper presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The difficulty of implementing the algorithm for such a system lies in the disproportion […]
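For readers unfamiliar with the procedure this excerpt refers to, the following is a minimal textbook sketch of LU factorization with partial pivoting in plain Python. It is a sequential illustration only, not the hybrid CPU/GPU implementation the paper describes; the function name and the packed return format are assumptions made for this example.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that P*A = L*U, using row (partial) pivoting.

    Returns (lu, perm): `lu` packs the strictly lower part of L (unit diagonal
    implied) and the upper triangle U; `perm[i]` is the index of the original
    row that ended up in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy of the input
    perm = list(range(n))
    for k in range(n):
        # Pivot: pick the row with the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate entries below the pivot and store the multipliers in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


if __name__ == "__main__":
    matrix = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    packed, order = lu_partial_pivot(matrix)
    print(order)
```

The pivot search and row swap are inherently sequential per column, while the trailing update in the inner loops is the bulk of the arithmetic; this split is the usual reason the algorithm is hard to balance across different kinds of processors.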
Jul, 29
Progressive High-Quality Response Surfaces for Visually Guided Sensitivity Analysis
In this paper we present a technique which allows us to perform high quality and progressive response surface prediction from multidimensional input samples in an efficient manner. We utilize kriging interpolation to estimate a response surface which minimizes the expectation value and variance of the prediction error. High computational efficiency is achieved by employing parallel […]
Jul, 29
Real-time 3D semi-local surface patch extraction using GPGPU
Feature vectors can be anything from simple surface normals to more complex feature descriptors. Feature extraction is important in order to solve various computer vision problems: e.g. registration, object recognition and scene understanding. Most of these techniques cannot be computed online due to their complexity and the context where they are applied. Therefore computing these […]
Jul, 29
Harnessing the GPU for Real-Time Haptic Tissue Simulation
Virtual surgery simulators are emerging as a training method for medical specialists and are expected to provide a virtual environment that is realistic and responsive enough to be able to physically simulate a wide variety of medical scenarios. Haptic interaction with the environment requires an underlying physical model that is dynamic, deformable, and computable in […]
Jul, 29
Point Based Approximate Color Bleeding With CUDA
Simulating light is a very computationally expensive proposition. There are a wide variety of global illumination algorithms that are implemented and used by major motion picture companies to render interesting and believable scenes. Every algorithm strives to find a balance between speed and accuracy. The Point Based Approximate Color Bleeding algorithm is one of the […]
Jul, 27
Graph-Based Substructure Pattern Mining Using CUDA Dynamic Parallelism
CUDA is an advanced massively parallel computing platform that can provide high performance computing power at a much more affordable cost. In this paper, we present a parallel graph-based substructure pattern mining algorithm using CUDA Dynamic Parallelism. The key contribution is a parallel solution to traversing the DFS (Depth First Search) code tree. Furthermore, we implement […]
Jul, 27
Systematic Performance Optimization of Cone-Beam Back-Projection on the Kepler Architecture
Filtered back-projection algorithms are widely used for the reconstruction of volumetric data from cone-beam projections in interventional C-arm computed tomography. Furthermore, general-purpose GPUs have become a popular tool for accelerating the reconstruction during time-critical clinical procedures. In this work, we focus on the systematic performance optimization of cone-beam back-projection on the latest architecture of CUDA-enabled […]