Posts
Nov, 3
A Mutable Hardware Abstraction to Replace Threads
Ever since the first digital images appeared, computer scientists all over the world have been trying to estimate their similarity computationally. So far, no solution as good as the human brain has been found. This paper presents another technique that tackles this issue, using singular value decomposition, a matrix factorization method which extracts main features of […]
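The abstract stops before describing how the singular values are used. As a minimal sketch, assuming grayscale images given as 2D arrays and assuming similarity is taken as the cosine between unit-normalized singular-value spectra (an illustrative choice, not necessarily the paper's method), a comparison could look like this. `singular_value_signature` and `svd_similarity` are hypothetical names.

```python
import numpy as np

def singular_value_signature(image, k=None):
    """Hypothetical helper: singular values of a 2D grayscale image,
    optionally truncated to the k largest, scaled to unit Euclidean norm
    so that a uniform brightness scaling cancels out."""
    a = np.asarray(image, dtype=float)
    s = np.linalg.svd(a, compute_uv=False)  # returned in descending order
    if k is not None:
        s = s[:k]
    norm = np.linalg.norm(s)
    return s / norm if norm > 0 else s

def svd_similarity(img_a, img_b, k=None):
    """Hypothetical similarity: cosine of the two signatures.
    Singular values are non-negative, so the result lies in [0, 1].
    Signatures of different lengths are zero-padded to a common length."""
    sa = singular_value_signature(img_a, k)
    sb = singular_value_signature(img_b, k)
    n = max(len(sa), len(sb))
    sa = np.pad(sa, (0, n - len(sa)))
    sb = np.pad(sb, (0, n - len(sb)))
    return float(np.dot(sa, sb))
```

Because singular values are unchanged by transposition and, after normalization, by uniform scaling, `svd_similarity(img, img.T)` returns 1.0 in this sketch. That insensitivity to spatial layout is one reason a practical method would likely also make use of the singular vectors.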
Nov, 3
Parallelization of the Generalized Hough Transform on GPU
Programs developed under the Compute Unified Device Architecture (CUDA) obtain the highest performance rate when the exploitation of hardware resources on a Graphics Processing Unit (GPU) is maximized. To achieve this, load balancing among threads and high processor occupancy, i.e. the ratio of active threads, are indispensable. However, in […]
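Occupancy, the ratio of resident threads to the maximum a multiprocessor can hold, is set by whichever per-block resource runs out first. The following is a simplified sketch under stated assumptions: the default hardware limits are illustrative and not those of any specific GPU, and allocation granularity is ignored, so real figures can be lower. `theoretical_occupancy` is a hypothetical name.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_threads_per_sm=2048, max_blocks_per_sm=32,
                          regs_per_sm=65536, smem_per_sm=65536):
    """Simplified theoretical occupancy: resident threads / max threads per SM.

    Assumptions: illustrative default limits, and no warp or register
    allocation granularity, so this is an upper-bound style estimate."""
    # How many blocks fit under each independent resource limit.
    limits = [max_threads_per_sm // threads_per_block, max_blocks_per_sm]
    if regs_per_thread > 0:
        limits.append(regs_per_sm // (regs_per_thread * threads_per_block))
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)
    resident_blocks = min(limits)  # the tightest resource decides
    return resident_blocks * threads_per_block / max_threads_per_sm
```

With these assumed limits, 256-thread blocks using 32 registers per thread reach full occupancy (1.0), while 64 registers per thread halve it (0.5), because register pressure caps the number of resident blocks.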
Nov, 3
True 4D Image Denoising on the GPU
The use of image denoising techniques is an important part of many medical imaging applications. One common application is to improve the image quality of low-dose (noisy) computed tomography (CT) data. While 3D image denoising has previously been applied to several volumes independently, there has not been much work done on true 4D image denoising, […]
Nov, 3
Performance Portability of a GPU Enabled Factorization with the DAGuE Framework
Performance portability is a major challenge faced today by developers on heterogeneous high-performance computers consisting of an interconnect, memory with nonuniform access, many-core processors, and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present the […]
Nov, 3
GrIP: A Framework for Experiments with Screen Space Algorithms
We present the extensible post-processing framework GrIP, usable for experimenting with screen-space graphics algorithms in arbitrary applications. The user can easily implement new ideas as well as add known operators as components to existing ones. Through a well-defined interface, operators are realized as plugins that are loaded at run-time. Operators can be combined […]
Nov, 3
Applicability of GPU Computing for Efficient Merge in In-Memory Databases
Column-oriented in-memory databases typically use dictionary compression to reduce the overall storage space and allow fast lookup and comparison. However, updates carry a high performance cost, since the dictionary used for compression has to be recreated each time records are created, updated or deleted. This has to be taken into account for […]
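To make the update cost concrete, here is a minimal sketch of sorted dictionary encoding for a single column. It assumes a sorted dictionary (so code order matches value order and comparisons can run on the integer codes) and shows only the naive per-insert path, not the merge procedure the paper goes on to study. `encode` and `append_value` are hypothetical names.

```python
from bisect import bisect_left

def encode(values):
    """Dictionary-encode a column: sorted distinct values plus integer codes.
    Because the dictionary is sorted, code order equals value order."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]

def append_value(dictionary, codes, value):
    """Append one value to the column (naive, unbatched path)."""
    pos = bisect_left(dictionary, value)
    if pos < len(dictionary) and dictionary[pos] == value:
        # Cheap path: the value is already in the dictionary.
        return dictionary, codes + [pos]
    # Expensive path: the sorted dictionary is rebuilt, and every existing
    # code at or after the insertion point must be shifted by one.
    new_dictionary = dictionary[:pos] + [value] + dictionary[pos:]
    new_codes = [c + 1 if c >= pos else c for c in codes]
    return new_dictionary, new_codes + [pos]
```

For example, `encode(["banana", "apple", "cherry", "apple"])` yields the dictionary `["apple", "banana", "cherry"]` with codes `[1, 0, 2, 0]`. Appending the new value `"avocado"` forces every code from position 1 onward to be rewritten, which is the per-update cost the abstract refers to.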
Nov, 3
Accelerating Multi-Scale Flows for LDDKBM Diffeomorphic Registration
Registrations in medical imaging and computational anatomy can be obtained using the Large Deformation Diffeomorphic Kernel Bundle Mapping (LDDKBM) framework. This provides a registration algorithm with a solid mathematical foundation while incorporating regularization of deformation at multiple scales. Because the variational formulation of LDDKBM implies a heavy computational burden in the search for optimal registrations, […]
Nov, 3
Topology Optimization with Unstructured Meshes on Graphics Processing Units (GPUs)
The present work investigates the feasibility of finite element methods and topology optimization for unstructured meshes in massively parallel computer architectures, more specifically on Graphics Processing Units or GPUs. Algorithms for every step in these methods are proposed and benchmarked with varied results. The ultimate goal of this work is to speed up the […]
Nov, 3
Efficient Quicksort and 2D Convex Hull for CUDA, and MSIMD as a Realistic Model of Massively Parallel Computations
In recent years CUDA has become a major architecture for multithreaded computations. Unfortunately, its potential is not yet commonly utilized because many fundamental problems have no practical solutions for such machines. Our goal is to establish a hybrid multicore/parallel theoretical model that accurately represents architectures like NVIDIA CUDA, Intel Larrabee, and OpenCL as well […]
Nov, 3
Colour flux-tubes in static Pentaquark and Tetraquark systems
The colour fields created by the static tetraquark and pentaquark systems are computed in quenched SU(3) lattice QCD, with gauge invariant lattice operators, in a 24^3 x 48 lattice at beta=6.2. We generate our quenched configurations with GPUs, and detail the respective benchmarks in different SU(N) groups. While at smaller distances the Coulomb potential is […]
Nov, 2
A Comparison of Many-threaded Differential Evolution and Genetic Algorithms on CUDA
Recent years have seen the rise of consumer-grade massively parallel environments. Powerful GPUs and multi-core processors have become widely available, and easy-to-use programming APIs such as NVIDIA CUDA, OpenCL, and DirectCompute simplify the development of applications that can utilize them. In this environment, nature-inspired metaheuristics can be in suitable cases […]
Nov, 2
Multi-view Rendering Approach for Cloud-based Gaming Services
To render hundreds or thousands of views for multi-user games on a cloud-based gaming service at interactive rates, we need a solution that is both scalable and efficient. We present a new cloud-based gaming service system which supports multiple-viewpoint rendering for visualizing a 3D game scene dataset at the same time for the multi-user […]