Posts
Nov, 4
VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron
General purpose graphical processing units (GPUs) offer high processing speeds for certain classes of highly parallelizable computations, such as matrix operations and Fourier transforms, that lie at the heart of first-principles electronic structure calculations. Inclusion of exact exchange increases the cost of density functional theory by orders of magnitude, motivating the use of GPUs. Porting the […]
Nov, 4
Computing Optimal Cycle Mean in Parallel on CUDA
Computation of optimal cycle mean in a directed weighted graph has many applications in program analysis, performance verification in particular. In this paper we propose a data-parallel algorithmic solution to the problem and show how the computation of optimal cycle mean can be efficiently accelerated by means of CUDA technology. We show how the problem […]
Nov, 3
A Mutable Hardware Abstraction to Replace Threads
Ever since the first digital images appeared, computer scientists all over the world have been trying to computationally estimate their similarity. So far, no solution as good as the human brain has been found. This paper presents another technique that tackles this issue, using singular value decomposition – a matrix factorization method which extracts main features of […]
Nov, 3
Parallelization of the Generalized Hough Transform on GPU
Programs developed under the Compute Unified Device Architecture (CUDA) obtain the highest performance rate when the exploitation of hardware resources on a Graphics Processing Unit (GPU) is maximized. In order to achieve this purpose, load balancing among threads and a high value of processor occupancy, i.e. the ratio of active threads, are indispensable. However, in […]
Nov, 3
True 4D Image Denoising on the GPU
The use of image denoising techniques is an important part of many medical imaging applications. One common application is to improve the image quality of low-dose (noisy) computed tomography (CT) data. While 3D image denoising previously has been applied to several volumes independently, there has not been much work done on true 4D image denoising, […]
Nov, 3
Performance Portability of a GPU Enabled Factorization with the DAGuE Framework
Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with nonuniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present the […]
Nov, 3
GrIP: A Framework for Experiments with Screen Space Algorithms
We present the extensible post processing framework GrIP, usable for experimenting with screen space-based graphics algorithms in arbitrary applications. The user can easily implement new ideas as well as add known operators as components to existing ones. Through a well-defined interface, operators are realized as plugins that are loaded at run-time. Operators can be combined […]
Nov, 3
Applicability of GPU Computing for Efficient Merge in In-Memory Databases
Column oriented in-memory databases typically use dictionary compression to reduce the overall storage space and allow fast lookup and comparison. However, there is a high performance cost for updates since the dictionary, used for compression, has to be recreated each time records are created, updated or deleted. This has to be taken into account for […]
Nov, 3
Accelerating Multi-Scale Flows for LDDKBM Diffeomorphic Registration
Registrations in medical imaging and computational anatomy can be obtained using the Large Deformation Diffeomorphic Kernel Bundle Mapping (LDDKBM) framework. This provides a registration algorithm with a solid mathematical foundation while incorporating regularization of deformation at multiple scales. Because the variational formulation of LDDKBM implies a heavy computational burden in the search for optimal registrations, […]
Nov, 3
Topology Optimization with Unstructured Meshes on Graphics Processing Units (GPUs)
The present work investigates the feasibility of finite element methods and topology optimization for unstructured meshes in massively parallel computer architectures, more specifically on Graphics Processing Units or GPUs. Algorithms for every step in these methods are proposed and benchmarked with varied results. The ultimate goal of this work is to speed up the […]
Nov, 3
Efficient Quicksort and 2D Convex Hull for CUDA, and MSIMD as a Realistic Model of Massively Parallel Computations
In recent years CUDA has become a major architecture for multithreaded computations. Unfortunately, its potential is not yet being commonly utilized because many fundamental problems have no practical solutions for such machines. Our goal is to establish a hybrid multicore/parallel theoretical model that represents well architectures like NVIDIA CUDA, Intel Larrabee, and OpenCL as well […]
Nov, 3
Colour flux-tubes in static Pentaquark and Tetraquark systems
The colour fields created by the static tetraquark and pentaquark systems are computed in quenched SU(3) lattice QCD, with gauge invariant lattice operators, in a 24^3 x 48 lattice at beta=6.2. We generate our quenched configurations with GPUs, and detail the respective benchmarks in different SU(N) groups. While at smaller distances the Coulomb potential is […]

