Posts
Nov, 29
Applications Performance on GPGPUs with the Fermi Architecture
The latest GPU architecture released by NVIDIA, code-named "Fermi", is the most advanced computing GPU architecture ever built. The GPU computing architecture changed radically compared to Fermi’s predecessors, such as the GT200 and G80 series. In this dissertation the Fermi architecture is analysed, addressing the most prominent upgrades, by running extensive […]
Nov, 29
Dynamic Task Parallelism with a GPU Work-Stealing Runtime System
NVIDIA’s Compute Unified Device Architecture (CUDA) and its attached C/C++ based API went a long way towards making GPUs more accessible to mainstream programming. So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason. The high number of computational cores and high memory […]
Nov, 29
Directives Based Programming of GPU Accelerated Systems
Graphics Processing Units (GPUs) are commodity chips primarily used as coprocessors for processing high-definition graphics on a computer system. They offer greater processing power than CPUs and handle single- and double-precision floating-point numbers efficiently, with lower power consumption. Realising their potential in general-purpose computing, manufacturers of these chips have […]
Nov, 29
Enabling Efficient Online Profiling of Homogeneous and Heterogeneous Multicore Systems
Using profiling tools is a common way to understand computer systems and software and to achieve the best performance. Profiling becomes more important as computing technology advances and makes it more difficult to intuitively reason about system characteristics. However, the recent shift in computing technology to multicore systems and heterogeneous systems requires new profiling methods […]
Nov, 29
GPGPU Volume Classification using SimpleOpenCL
In volume visualization, the definition of the regions of interest is inherently an iterative trial-and-error process of finding the best parameters to classify and render the final image. In this work, we present a general framework for training multi-class classifiers using Error-Correcting Output Codes. Moreover, we propose a GPGPU parallelization system using SimpleOpenCL, an open-source […]
Nov, 29
Solving Dense Generalized Eigenproblems on Multi-threaded Architectures
We compare two approaches to compute a portion of the spectrum of dense symmetric definite generalized eigenproblems: one is based on the reduction to tridiagonal form, and the other on the Krylov-subspace iteration. Two large-scale applications, arising in molecular dynamics and material science, are employed to investigate the contributions of the application, architecture, and parallelism […]
Nov, 29
Electric polarizability of hadrons with overlap fermions on multi-GPUs
Electric polarizability is an important parameter for the internal structure of hadrons. Previous studies of polarizabilities have been done at relatively heavy pion masses, leaving the chiral region largely unexplored. In this report, we use overlap fermions, which are known to be computationally demanding, to properly capture the chiral dynamics. We present an implementation strategy […]
Nov, 29
A GPU-based survey for millisecond radio transients using ARTEMIS
Astrophysical radio transients are excellent probes of extreme physical processes originating from compact sources within our Galaxy and beyond. Radio frequency signals emitted from these objects provide a means to study the intervening medium through which they travel. Next generation radio telescopes are designed to explore the vast unexplored parameter space of high time resolution […]
Nov, 28
On the numerical sensitivity of computer simulations on hybrid and parallel computing systems
The numerical accuracy of simulation results depends not only on the precision of the floating-point arithmetic. The results are also sensitive to differences between the floating-point arithmetic implementations of different hybrid and parallel computing systems, such as CPUs, GPUs, dedicated processors like the Cell processor or the GRAPE special-purpose computer […]
Nov, 28
Accelerating the Hough Transform with CUDA on Graphics Processing Units
Circle detection has been widely applied in image processing applications. The Hough transform, the most popular method of shape detection, normally takes a long time to achieve reasonable results, especially for large images. Such performance makes it almost impossible to conduct real-time image processing with sequential algorithms on commodity computers. Recently, NVIDIA developed the CUDA programming paradigm […]
Nov, 28
Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards
We describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple graphical processing units (GPUs) using the compute unified device architecture computing engine. The implemented block-matching algorithm uses the summed absolute difference error criterion and a full grid search (FS) to find the optimal block displacement. In this evaluation, we compared the execution […]
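For readers unfamiliar with the two terms in this excerpt, the following is a minimal sequential sketch of full-search block matching with a summed absolute difference (SAD) criterion. It is not the paper's multi-GPU implementation. The function names `sad` and `full_search`, the list-of-rows frame layout, and the `radius` parameter are all assumptions made for illustration.

```python
def sad(ref, cur, bx, by, dx, dy, n):
    """Summed absolute difference between the n-by-n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(ref, cur, bx, by, n, radius):
    """Evaluate every displacement within +/-radius (hypothetical search
    window) and return (best_sad, dx, dy) for the lowest SAD found."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or bx + dx + n > width:
                continue
            if by + dy < 0 or by + dy + n > height:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, n)
            # Keep the first displacement that achieves the lowest cost.
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Each candidate displacement is scored independently of the others, so the SAD evaluations can in principle run in parallel.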
Nov, 28
Anytime Algorithms for GPU Architectures
Most algorithms are run-to-completion: they provide one answer upon completion and no answer if interrupted before completion. Anytime algorithms, on the other hand, have a utility that increases monotonically with the length of execution time. Our investigation focuses on the development of time-bounded anytime algorithms on Graphics Processing Units (GPUs) to trade off the quality of output […]