Posts
Sep, 23
Task Performance with List-Mode Data
This dissertation investigates the application of list-mode data to detection, estimation, and image reconstruction problems, with an emphasis on emission tomography in medical imaging. We begin by introducing a theoretical framework for list-mode data and we use it to define two observers that operate on list-mode data. These observers are applied to the problem of […]
Sep, 23
Computer Vision Application in Graphic Processors
Largely driven by the gaming industry, research and development of hardware for image generation, such as graphics cards (GPUs, or Graphics Processing Units), has experienced tremendous growth in recent years. The increased power and flexibility and the low price of these GPUs have resulted in unexpected use in areas other than graphics. […]
Sep, 23
A Quantitative Study of Irregular Programs on GPUs
GPUs have been used to accelerate many regular applications and, more recently, irregular applications in which the control flow and memory access patterns are data-dependent and statically unpredictable. This paper defines two measures of irregularity called control-flow irregularity and memory-access irregularity, and investigates, using performance-counter measurements, how irregular GPU kernels differ from regular kernels with […]
Sep, 22
Computing of high breakdown regression estimators without sorting on graphics processing units
We present an approach to computing high-breakdown regression estimators in parallel on graphics processing units (GPUs). We show that sorting the residuals is not necessary and can be replaced by calculating the median. We present and compare various methods to calculate the median and order statistics on GPUs. We introduce an alternative method based […]
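To illustrate the idea of replacing a full sort with a median computation, here is a minimal CPU-side sketch in Python. It is not the paper's GPU method: the function names `quickselect` and `lms_objective`, the straight-line model, and the choice of quickselect as the selection routine are all assumptions made for illustration.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-indexed) without fully sorting."""
    items = list(values)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    while True:
        pivot = rng.choice(items)
        lower = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        if k < len(lower):
            # The answer lies among the elements smaller than the pivot.
            items = lower
        elif k < len(lower) + len(equal):
            # The pivot itself is the k-th smallest element.
            return pivot
        else:
            # Discard everything up to the pivot and adjust the rank.
            k -= len(lower) + len(equal)
            items = [v for v in items if v > pivot]


def lms_objective(xs, ys, slope, intercept):
    """Middle order statistic of the squared residuals for a candidate line.

    Hypothetical helper: a high-breakdown criterion of this kind depends on
    one order statistic of the residuals, so a selection is enough.
    """
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return quickselect(squared, len(squared) // 2)
```

With `xs = [0, 1, 2, 3, 4]` and `ys = [0, 1, 2, 3, 100]`, the candidate line with slope 1 and intercept 0 has squared residuals `[0, 0, 0, 0, 9216]`, so the single outlier does not affect the middle order statistic.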
Sep, 22
SpMV: A Memory-Bound Application on the GPU Stuck Between a Rock and a Hard Place
In this paper, we investigate the relative merits of GPGPUs and multicores in the context of sparse matrix-vector multiplication (SpMV). While GPGPUs possess impressive capabilities in terms of raw compute throughput and memory bandwidth, their performance varies significantly with application tuning as well as sparse input and format characteristics. Furthermore, several emerging technological and workload […]
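For readers unfamiliar with SpMV, the following is a minimal sequential sketch in Python using the compressed sparse row (CSR) layout. The abstract does not say which storage formats the paper studies, so CSR and the names `spmv_csr`, `row_ptr`, `col_idx`, and `vals` are assumptions chosen for illustration only.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx and vals hold their column indices and values.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for j in range(row_ptr[row], row_ptr[row + 1]):
            # One multiply-add per nonzero, but three memory reads
            # (vals[j], col_idx[j], and an indirect read of x).
            acc += vals[j] * x[col_idx[j]]
        y[row] = acc
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
result = spmv_csr(row_ptr, col_idx, vals, [1.0, 2.0, 3.0])
```

The inner loop does little arithmetic per byte read, and the access to `x` depends on the sparsity pattern, which is one common reason SpMV is described as memory-bound.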
Sep, 22
Overlapping computation and communication of three-dimensional FDTD on a GPU cluster
Large-scale electromagnetic field simulations using the FDTD (finite-difference time-domain) method require the use of GPU (graphics processing unit) clusters. However, the communication overhead caused by slow interconnections becomes a major performance bottleneck. In this paper, as a way to remove the bottleneck, we propose the "kernel-split method" and the "host-buffer method" which overlap computation and […]
Sep, 22
Exploration of Parallelization Frameworks for Computational Finance
This paper presents a comparison of parallelization frameworks for efficient execution of computational finance workloads. We use a Value-at-Risk (VaR) workload to evaluate OpenCL and OpenMP parallelization frameworks on multi-core CPUs as opposed to GPUs. In addition, we study the impact of SMT on performance using GCC (4.4) and IBM XLC (11.01) compilers for both […]
Sep, 22
Modification of self-organizing migration algorithm for OpenCL framework
This paper deals with a modification of the self-organizing migration algorithm using the OpenCL framework. This modification allows the algorithm to exploit modern parallel devices, such as central processing units and graphics processing units. The main aim was to create an algorithm that shows significant speedup compared to the sequential variant. The second aim was to create the algorithm robust […]
Sep, 21
Large-Scale Motion Modelling using a Graphical Processing Unit
The increased availability of Graphical Processing Units (GPUs) in personal computers has made parallel programming worthwhile and more accessible, but not necessarily easier. This thesis will take advantage of the power of a GPU, in conjunction with the Central Processing Unit (CPU), in order to simulate target trajectories for large-scale scenarios, such as wide-area maritime […]
Sep, 21
Some examples of instant computations of fluid dynamics on GPU
This paper summarizes our experience with GPU and GPGPU computing for two-dimensional computational fluid dynamics using fine grids and three-dimensional kinetic transport problems. The choice of the computational approach is clearly critical for both performance speedup and efficiency. In our numerical experiments, we used a Lattice Boltzmann approach (LBM) for the […]
Sep, 21
Parallelization of Hierarchical Text Clustering on Multi-core CUDA Architecture
Text clustering is the problem of dividing text documents into groups, such that documents in the same group are similar to one another and different from documents in other groups. Because texts generally tend to form hierarchies, text clustering is best performed using a hierarchical clustering method. An important aspect while clustering large […]
Sep, 21
Fast and Efficient Automatic Memory Management for GPUs using Compiler-Assisted Runtime Coherence Scheme
Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to […]