Posts
Jan, 26
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
The Active Appearance Model (AAM) is one of the most powerful model-based object detecting and tracking methods that has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern Graphics Processing Units (GPUs) that feature a […]
Jan, 26
GPU acceleration of Newton’s method for large systems of polynomial equations in double double and quad double arithmetic
In order to compensate for the higher cost of double double and quad double arithmetic when solving large polynomial systems, we investigate the application of NVIDIA Tesla C2050, K20C, and K40 general purpose graphics processing units. As the dimension equals several thousands, the cost to compute one QR decomposition is sufficiently large so that the […]
Jan, 26
GPU Monte Carlo scatter calculations for Cone Beam Computed Tomography
A GPU Monte Carlo code for x-ray photon transport has been implemented and extensively tested. The code is intended for scatter compensation of cone beam computed tomography images. The code was tested to agree with other well known codes within 5% for a set of simple scenarios. The scatter compensation was also tested using an […]
Jan, 25
A High-productivity Framework for Multi-GPU computation of Mesh-based applications
The paper proposes a high-productivity framework for multi-GPU computation of mesh-based applications. In order to achieve high performance on these applications, we have to introduce complicated optimized techniques for GPU computing, which requires relatively-high cost of implementation. Our framework automatically translates user-written functions that update a grid point and generates both GPU and CPU code. […]
Jan, 25
Accelerating a Bayesian Phylogenetic Inference Application with OpenACC
The need for faster computing has been around ever since the birth of the first computers. Faster hardware will almost always guarantee faster computing but occasionally the rate of hardware development is not enough for some programs to deal with the vast information they need. When these programs need to be accelerated, algorithmic optimizations have […]
Jan, 25
Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography
The computational effort of 3D image reconstruction in Computed Tomography (CT) has required special purpose hardware for a long time. Systems such as custom-built FPGA-systems and GPUs are still widely-used today, in particular in interventional settings, where radiologists require a hard time constraint for reconstruction. However, recently is has been shown that today even commodity […]
Jan, 25
Improvement of the fused CUDA kernels performance prediction
In this thesis a tool for improving the performance prediction of a source-to-source compiler of mapped functions developed on the Faculty of Informatics is presented. This tool integrates the modification of the original compiler and static and dynamic data gathering to provide as much data about the fusions as possible in order to analyze them. […]
Jan, 25
Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations
We present finite differences numerical algorithm for solving 2D spatially homogeneous Boltzmann transport equation for semiconductor superlattices (SL) subject to time dependant electric field along SL axis and constant perpendicular magnetic field. Algorithm is implemented in C language targeted to CPU and in CUDA C language targeted to commodity NVidia GPUs. We compare performance and […]
Jan, 23
OpenSSL acceleration using Graphics Processing Units
Cryptography: The study of techniques focused on security. Typically, an implementation of cryptography is computationally heavy, leading to performance issues on general purpose systems. Adding the possibility of offloading cryptographic operations to a Graphics Processing Unit (GPU) onto a widespread, open-source cryptographic library such as OpenSSL would be extremely useful in lightening the CPU load […]
Jan, 23
On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms
The proliferation of heterogeneous computing systems presents the parallel computing community with the challenge of porting legacy and emerging applications to multiple processors with diverse programming abstractions. OpenCL is a vendor-agnostic and industry-supported programming model that offers code portability on heterogeneous platforms, allowing applications to be developed once and deployed "anywhere". In this paper, we […]
Jan, 23
clpeak – peak performance of your opencl device
clpeak is a benchmarking tool intended toward developers to fine-tune opencl kernels for a particular device/class of device. It calculates bandwidth & compute performance for different vector-widths of a datatype, say float, float4. Traditionally it is recommended to use scalar code and we expect opencl compiler to auto-vectorize it. But, most of the times compiler […]
Jan, 23
A comparison between parallelization approaches in molecular dynamics simulations on GPUs
We test the performances of two different approaches to the computation of forces for molecular dynamics simulations on Graphics Processing Units. A "vertex-based" approach, where a computing thread is started per particle, is compared to a newly proposed "edge-based" approach, where a thread is started per each potentially non-zero interaction. We find that the former […]