Posts
Dec, 15
Real time Multi-GPU-based Event Detection in High Definition Videos
Video processing algorithms are an important tool for many computer vision applications, such as motion tracking, video indexing, robot navigation and event detection. However, with the new video standards, especially high-definition ones, current implementations no longer meet the needs of real-time processing, even when running on modern hardware. In […]
Dec, 15
OpenCL-Accelerated Computation of a 3D SPECT Projection Operator for the Content Adaptive Mesh Model
In this manuscript, we present a preliminary evaluation of a fully 3D projection operator calculation aimed at emission tomography on a non-circular orbit. The proposed methodology uses the content-adaptive mesh model (CAMM) for volumetric data representation. The CAMM is an efficient data representation based on adaptive non-uniform sampling and linear interpolation. The presented projection operator […]
Dec, 13
Data Transfer Matters for GPU Computing
Graphics processing units (GPUs) embrace manycore compute devices where massively parallel compute threads are offloaded from CPUs. This heterogeneous nature of GPU computing raises non-trivial data transfer problems, especially for latency-critical real-time systems. However, even the basic characteristics of data transfers associated with GPU computing are not well studied in the literature. In this paper, […]
Dec, 13
GPU hardware acceleration for industrial applications: using computation to push beyond physical limitations
This thesis explores the possibility of utilizing Graphics Processing Units (GPUs) to address the computational demand of algorithms used to mitigate the inherent physical limitations in devices such as microscopes and 3D-scanners. We investigate the outcome and test our methodology for the following case studies: – the narrow field of view found in microscopes. – […]
Dec, 13
All-pairs Shortest Path Algorithm based on MPI+CUDA Distributed Parallel Programming Model
Computing shortest paths in a graph is a complex and time-consuming process, and traditional algorithms that rely solely on the CPU as the computing unit cannot meet the demands of real-time processing. In this paper, we present an all-pairs shortest paths algorithm using the MPI+CUDA hybrid programming model, which can […]
Dec, 13
TuCCompi: A Multi-Layer Programing Model for Heterogeneous Systems with Auto-Tuning Capabilities
During the last decade, parallel processor architectures have become a powerful tool to deal with massively parallel problems that require High Performance Computing (HPC). The latest trend in HPC is the use of heterogeneous environments that combine different computational units, such as CPU cores and GPUs. Performance maximization of any GPU parallel implementation of an algorithm […]
Dec, 13
Augur: a Modeling Language for Data-Parallel Probabilistic Inference
It is time-consuming and error-prone to implement inference procedures for each new probabilistic model. Probabilistic programming addresses this problem by allowing a user to specify the model and having a compiler automatically generate an inference procedure for it. For this approach to be practical, it is important to generate inference code that has reasonable performance. […]
Dec, 12
GPU Based Dose Calculation
The goal of this dissertation was to parallelize a dose calculation code for radiotherapy cancer treatment and explore the suitability of the new Intel Xeon Phi technology for such a task. The source code proved to have many bugs, so it took a long time to produce consistent results. Thus, the […]
Dec, 12
Development of Bayesian analysis program for extraction of polarisation observables at CLAS
At the mass of a proton, the strong force is not well understood. Various quark models exist, but it is important to determine which quark model(s) are most accurate. Experimentally, finding resonances predicted by some models and not others would give valuable insight into this fundamental interaction. Several labs around the world use photoproduction experiments […]
Dec, 12
Inter-block synchronization on a GPGPU
With the invention of multi-core processing unit technology, the graphics processing unit has evolved from a single-core graphics processor into a multi-core programmable graphics processing unit. Because of the GPU's architecture, people found that it is not only good at processing graphics-related data, but also suitable for performing general-purpose parallel computations. However, since […]
Dec, 12
Lessons learned from contrasting a BLAS kernel implementations
This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast the effort and complexity of implementing an optimized BLAS routine on the CPU versus the GPU, not considering performance. This work contributes a sample procedure for comparing BLAS kernel implementations, how to start […]
Dec, 12
Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems
The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but also most graph algorithms entail […]