Posts
Jan, 26
Computing Best Possible Pseudo-Solutions to Interval Linear Systems of Equations
In this paper, we consider interval linear algebraic systems of equations Ax = b, with an interval matrix A and an interval right-hand side vector b, as a model of imprecise systems of linear algebraic equations of the same form. We propose a new regularization procedure that reduces the solution of the imprecise linear system to […]
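One standard way to make "solution" precise for such a system is the united solution set: every x that solves Ax = b for some matrix in A and some vector in b. The Oettli–Prager theorem characterizes this set componentwise as |A_c x - b_c| ≤ Δ|x| + δ, where A_c and b_c are the midpoints and Δ and δ the radii of the interval data. The Python sketch below checks that condition for a given point x. It illustrates the problem setting only and is not the regularization procedure proposed in the paper. Intervals are passed as separate lower and upper bounds, and the function name is hypothetical.

```python
def in_united_solution_set(A_lo, A_hi, b_lo, b_hi, x):
    """Oettli-Prager test: does x solve A x = b for some A in [A_lo, A_hi]
    and some b in [b_lo, b_hi]?  (Hypothetical helper, for illustration.)

    With midpoints Ac, bc and radii dA, db, x belongs to the united
    solution set iff |Ac x - bc| <= dA |x| + db holds in every row.
    """
    n = len(x)
    for i in range(len(b_lo)):
        mid_residual = 0.0   # row i of Ac x
        radius = 0.0         # row i of dA |x|
        for j in range(n):
            ac = 0.5 * (A_lo[i][j] + A_hi[i][j])
            da = 0.5 * (A_hi[i][j] - A_lo[i][j])
            mid_residual += ac * x[j]
            radius += da * abs(x[j])
        bc = 0.5 * (b_lo[i] + b_hi[i])
        db = 0.5 * (b_hi[i] - b_lo[i])
        if abs(mid_residual - bc) > radius + db:
            return False
    return True


# 1x1 example: [1, 2] x = [2, 4] has the solution set x in [1, 4].
print(in_united_solution_set([[1.0]], [[2.0]], [2.0], [4.0], [3.0]))
print(in_united_solution_set([[1.0]], [[2.0]], [2.0], [4.0], [5.0]))
```

For the 1×1 example the exact solution set is {b/a} = [1, 4], so x = 3 passes the test and x = 5 does not.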
Jan, 26
Low-latency Image Recognition with GPU-accelerated Convolutional Networks for Web-based Services
In this work, we describe an application of convolutional networks to object classification and detection in images. The task of image-based object recognition is surveyed in the first chapter. Its application in internet advertising is one of the main motivations for this work. The architecture of the convolutional networks is described in detail in […]
Jan, 26
Optimizing Stencil Computations for NVIDIA Kepler GPUs
We present a series of optimization techniques for stencil computations on NVIDIA Kepler GPUs. Stencil computations with regular grids have been ported to older generations of NVIDIA GPUs with significant performance improvements, thanks to their higher memory bandwidth compared with conventional CPU-only systems. However, because of the architectural changes introduced with the latest generation of […]
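For readers new to the term, the sketch below shows what a stencil computation on a regular grid looks like. It is a plain Python reference, not the Kepler-specific GPU code the paper describes: one Jacobi sweep of a 5-point stencil replaces each interior point with the average of its four neighbors. The function name `jacobi_sweep` and the fixed-boundary choice are illustrative assumptions.

```python
def jacobi_sweep(u):
    """One Jacobi sweep of a 2D 5-point stencil on a grid stored as a list of rows.

    Boundary values are copied unchanged; each interior point becomes the
    average of its north, south, west and east neighbors in the input grid.
    """
    ny, nx = len(u), len(u[0])
    out = [row[:] for row in u]  # copy, so the input grid is left untouched
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = 0.25 * (u[j - 1][i] + u[j + 1][i]
                                + u[j][i - 1] + u[j][i + 1])
    return out


# A 5x5 grid whose top boundary row is held at 1.0, all other points at 0.0.
grid = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
step = jacobi_sweep(grid)
print(step[1][2])
```

Because every update reads only the previous grid, a GPU version would typically assign each interior point to its own thread; the optimizations in the paper concern how such accesses map onto the memory hierarchy.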
Jan, 26
Hybrid strategy for stencil computations on the APU
Stencil computations are very regular and well suited to GPU execution. However, the PCI-E bus that connects a discrete GPU to system memory has relatively low bandwidth compared with the GPU's compute power. The AMD APU architecture places the CPU and GPU on the same chip, with memory shared between them, which […]
Jan, 26
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
The Active Appearance Model (AAM) is one of the most powerful model-based object detection and tracking methods and has been widely used in various situations. However, its high-dimensional texture representation makes the computation very time-consuming, which makes the AAM difficult to apply in real-time systems. The emergence of modern Graphics Processing Units (GPUs) that feature a […]
Jan, 26
GPU acceleration of Newton’s method for large systems of polynomial equations in double double and quad double arithmetic
To compensate for the higher cost of double double and quad double arithmetic when solving large polynomial systems, we investigate the use of NVIDIA Tesla C2050, K20C, and K40 general-purpose graphics processing units. When the dimension reaches several thousand, the cost of computing one QR decomposition is sufficiently large that the […]
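Double double arithmetic represents a number as the unevaluated sum of two doubles, a high part and a low part. As a hedged sketch of its basic building blocks (standard error-free transformations written in Python, not the paper's GPU code), the following shows Knuth's two-sum and a double double addition built on it. The function names are illustrative.

```python
def two_sum(a, b):
    """Error-free transformation (Knuth): returns (s, e) with a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Cheaper variant of two_sum, intended for |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Add two double double numbers given as (hi, lo) pairs; returns a normalized (hi, lo)."""
    s1, s2 = two_sum(x[0], y[0])    # sum of the high parts and its rounding error
    t1, t2 = two_sum(x[1], y[1])    # sum of the low parts and its rounding error
    s2 += t1
    s1, s2 = quick_two_sum(s1, s2)  # renormalize so that |lo| is small relative to hi
    s2 += t2
    return quick_two_sum(s1, s2)


# In plain double precision, 1e-20 is lost when added to 1.0 ...
print((1.0 + 1e-20) - 1.0)  # 0.0
# ... but a double double keeps it in the low part.
print(dd_add(dd_add((1.0, 0.0), (1e-20, 0.0)), (-1.0, 0.0)))  # (1e-20, 0.0)
```

Each double double addition costs roughly a dozen or more floating-point operations instead of one, and quad double extends the idea to four components, which is the extra cost the GPUs are meant to offset.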
Jan, 26
GPU Monte Carlo scatter calculations for Cone Beam Computed Tomography
A GPU Monte Carlo code for x-ray photon transport has been implemented and extensively tested. The code is intended for scatter compensation of cone beam computed tomography images. Tests showed that it agrees with other well-known codes to within 5% for a set of simple scenarios. The scatter compensation was also tested using an […]
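Monte Carlo photon transport codes commonly sample the distance to a photon's next interaction from an exponential distribution. The minimal Python sketch below assumes a homogeneous medium with a single attenuation coefficient `mu`, which is an illustrative simplification and not the paper's x-ray model. Inverting the cumulative distribution gives s = -ln(ξ)/μ, and counting photons whose first sampled path exceeds a slab's thickness estimates the uncollided transmission. The function names are hypothetical.

```python
import math
import random


def sample_free_path(mu, rng):
    """Distance to the next interaction in a homogeneous medium.

    Inverts the CDF 1 - exp(-mu * s): with xi uniform on (0, 1],
    s = -ln(xi) / mu is exponentially distributed with mean 1 / mu.
    """
    xi = 1.0 - rng.random()  # rng.random() is in [0, 1), so xi is in (0, 1]
    return -math.log(xi) / mu


def transmitted_fraction(mu, thickness, n, seed=0):
    """Estimate the fraction of n photons that cross a slab without interacting."""
    rng = random.Random(seed)
    passed = sum(1 for _ in range(n) if sample_free_path(mu, rng) > thickness)
    return passed / n


# mu = 0.2 per cm, 5 cm slab: the estimate should approach exp(-1).
print(transmitted_fraction(0.2, 5.0, 100_000), math.exp(-1.0))
```

A full transport code would go on to sample the interaction type and scattering angle at each collision; on a GPU, each photon history would typically run in its own thread with its own random number stream.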
Jan, 25
A High-productivity Framework for Multi-GPU computation of Mesh-based applications
The paper proposes a high-productivity framework for multi-GPU computation of mesh-based applications. Achieving high performance on these applications requires complicated GPU optimization techniques, which carry a relatively high implementation cost. Our framework automatically translates user-written functions that update a grid point and generates both GPU and CPU code. […]
Jan, 25
Accelerating a Bayesian Phylogenetic Inference Application with OpenACC
The need for faster computing has existed ever since the first computers were built. Faster hardware almost always yields faster computing, but occasionally hardware does not improve quickly enough for some programs to handle the vast amounts of information they need to process. When these programs need to be accelerated, algorithmic optimizations have […]
Jan, 25
Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography
The computational effort of 3D image reconstruction in Computed Tomography (CT) has required special-purpose hardware for a long time. Systems such as custom-built FPGA systems and GPUs are still widely used today, in particular in interventional settings, where radiologists require reconstruction within a hard time constraint. However, it has recently been shown that today even commodity […]
Jan, 25
Improvement of the fused CUDA kernels performance prediction
In this thesis, we present a tool for improving the performance prediction of a source-to-source compiler of mapped functions developed at the Faculty of Informatics. The tool combines modifications to the original compiler with static and dynamic data gathering to provide as much data about the fusions as possible for analysis. […]
Jan, 25
Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations
We present a finite-difference numerical algorithm for solving the 2D spatially homogeneous Boltzmann transport equation for semiconductor superlattices (SL) subject to a time-dependent electric field along the SL axis and a constant perpendicular magnetic field. The algorithm is implemented in C targeting the CPU and in CUDA C targeting commodity NVIDIA GPUs. We compare performance and […]