Posts
Oct, 4
Optimization of the Gaussian Mixture Model Evaluation on GPU
In this paper we present a highly optimized implementation of the Gaussian mixture acoustic model evaluation algorithm. Evaluating these likelihoods is one of the most computationally intensive parts of automatic speech recognizers, but it parallelizes well and can be offloaded to GPU devices. Our approach offers a significant speed-up compared with recently published approaches, since it […]
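The abstract does not give the model form. The sketch below assumes the common case of diagonal-covariance GMMs. It scores each feature frame against each mixture in the log domain and combines the components with log-sum-exp. The function names are hypothetical. Every (frame, GMM) cell of the score matrix is independent, and that independence is what makes the computation a good fit for a GPU.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance `var` at feature vector x."""
    log_det = sum(math.log(v) for v in var)
    mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (len(x) * math.log(2 * math.pi) + log_det + mahal)

def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mean_k, diag(var_k)), combined with log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def score_matrix(frames, gmms):
    """Score every frame against every GMM; each (frame, GMM) cell is independent work."""
    return [[gmm_log_likelihood(f, *g) for g in gmms] for f in frames]
```

Here each GMM is a `(weights, means, variances)` tuple. On a GPU, one would typically map frames and mixtures onto a 2-D grid of threads, where this sketch uses the nested list comprehension.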
Oct, 4
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance raytracing program. The algorithm was optimized to reduce both redundant data input/output operations and floating-point operations. To further accelerate the simulation, the matrix multiplications were implemented in parallel on a graphics processing unit (GPU). We […]
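The abstract does not name the matrices involved. Assume, for illustration, a chain of the kind used in Radiance's three-phase method: a view matrix V, a fenestration transmission matrix T, a daylight matrix D, and a sky matrix S with one column per hour of the year. The sketch below shows that evaluation order alone changes the number of floating-point multiplications. The matrix sizes are hypothetical.

```python
def matmul(a, b):
    """Triple-loop product; each output cell is independent, which is what a GPU kernel exploits."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must agree")
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]

def chain_cost_left(shapes):
    """Scalar multiplications for ((M1 M2) M3)...; shapes is a list of (rows, cols)."""
    rows, cols = shapes[0]
    total = 0
    for r, c in shapes[1:]:
        if r != cols:
            raise ValueError("inner dimensions must agree")
        total += rows * cols * c
        cols = c
    return total

def chain_cost_right(shapes):
    """Scalar multiplications for M1 (M2 (... Mn))."""
    rows, cols = shapes[-1]
    total = 0
    for r, c in reversed(shapes[:-1]):
        if c != rows:
            raise ValueError("inner dimensions must agree")
        total += r * c * cols
        rows = r
    return total

# Hypothetical sizes: V is 100x145, T is 145x145, D is 145x146, S is 146x8760 (hourly, one year).
shapes = [(100, 145), (145, 145), (145, 146), (146, 8760)]
print(chain_cost_left(shapes), chain_cost_right(shapes))  # 132115500 496648200
```

With few sensor rows, multiplying from the left is roughly 3.8 times cheaper. This happens because the small V·T·D product is formed once before it meets the wide sky matrix.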
Oct, 3
Transformation of CPU-based Applications To Leverage on Graphics Processors using CUDA
Scientific computation requires a great deal of computing power, especially for floating-point operations, but high-end multi-core processors are currently limited in floating-point performance and parallelism. Recent technological advances have made parallel computing technically and financially feasible using the Compute Unified Device Architecture (CUDA) developed by NVIDIA. This research focuses on measuring […]
Oct, 3
Parallel Game Tree Search Using GPU
The parallel performance of graphics cards in desktop computers generally exceeds that of conventional processors. The purpose of this paper is to identify opportunities for task parallelization when searching and evaluating game trees, and to propose algorithms that perform better on the SIMD processors of graphics cards than on regular desktop processors. On the basis of the proposed algorithms […]
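The paper's algorithms are not described in the abstract. One simple way to expose SIMD-style parallelism in game tree evaluation is to evaluate all leaves of a fixed-depth tree in one batch and then run minimax bottom-up, one whole level at a time. Every parent on a level reduces its own block of children independently. The function name and the complete b-ary tree layout are assumptions of this sketch.

```python
def minimax_levelwise(leaf_values, branching, maximizing_root=True):
    """Bottom-up minimax over a complete tree whose leaves are given left to right.

    len(leaf_values) must be branching ** depth. Each level's reduction is
    independent per parent, i.e. one SIMD lane (or GPU thread) per parent.
    """
    if branching < 2:
        raise ValueError("branching factor must be at least 2")
    if not leaf_values:
        raise ValueError("need at least one leaf")
    n, depth = len(leaf_values), 0
    while n > 1:
        if n % branching:
            raise ValueError("leaf count must be a power of the branching factor")
        n //= branching
        depth += 1
    level = list(leaf_values)
    for d in range(depth - 1, -1, -1):  # d = depth of the parents computed in this pass
        maximize = (d % 2 == 0) == maximizing_root
        pick = max if maximize else min
        level = [pick(level[i:i + branching]) for i in range(0, len(level), branching)]
    return level[0]
```

For example, with leaves `[3, 5, 2, 9]` and branching 2, the minimizing layer yields `[3, 2]` and the maximizing root picks 3. Unlike alpha-beta, this exhaustive form does no pruning. It trades extra work for regular, lockstep memory access.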
Oct, 3
Implementation of the optimization algorithms on GPGPU architecture and multi-cores
This bibliographic study mainly synthesizes the key ideas of parallel architectures and neural network models, and discusses the algorithm design methods that will be used to implement the optimizations on GPGPUs and multicores. Since neural network computational models are regarded as valuable tools for solving many scientific and practical problems, and it […]
Oct, 3
GPU-Accelerated DNA Distance Matrix Computation
Distance matrix calculation, used in phylogeny analysis, is computationally intensive, and growing sequence data sets necessitate fast computation methods. This paper accelerates Felsenstein's DNADIST program by using OpenCL to exploit the computational power of graphics cards. The GPU-accelerated DNADIST program achieves more than a 12-fold speedup over the serial CPU program on a personal workstation […]
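The sketch below computes a pairwise distance matrix under the Jukes-Cantor correction, one of the substitution models DNADIST offers. It is not necessarily the model used in the paper. Gaps and ambiguity codes are simply skipped, which is an assumption of this sketch. Each (i, j) pair is independent, so the n(n-1)/2 pairs are the natural unit of GPU parallelism.

```python
import math

VALID = set("ACGT")

def jc69_distance(a, b):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:  # skip gaps and ambiguity codes
            sites += 1
            diffs += x != y
    if sites == 0:
        raise ValueError("no comparable sites")
    p = diffs / sites
    if p >= 0.75:
        return math.inf  # correction undefined: sequences are saturated
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances; each (i, j) pair is independent work."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jc69_distance(seqs[i], seqs[j])
    return d
```

For example, "AAAA" against "AAAC" gives p = 0.25 and d = -0.75 ln(2/3), which is about 0.304.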
Oct, 3
Parallel SAT-Solving with OpenCL
In the last few decades there have been substantial improvements in approaches to solving the Boolean satisfiability problem. Many of these improvements consisted of elaborating on existing algorithms. For complete solvers, this led to more efficient branching heuristics and the use of watched literals for unit propagation; incomplete solvers on the […]
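Two-watched-literal unit propagation, mentioned above, can be sketched sequentially as follows. Each clause watches two of its literals. When a watched literal becomes false, the solver looks for a replacement watch. If none exists, the other watch is forced (unit) or the clause is in conflict. This is a generic illustration with hypothetical function names, not the paper's OpenCL solver. It assumes DIMACS-style integer literals, no duplicate literals within a clause, and no empty clauses.

```python
from collections import defaultdict

def lit_value(lit, assign):
    """True/False if the literal's variable is assigned, else None."""
    v = assign.get(abs(lit))
    return None if v is None else (v if lit > 0 else not v)

def unit_propagate(clauses, decisions):
    """Propagate `decisions`; return (assignment, conflict) using two watched literals."""
    clauses = [list(c) for c in clauses]  # literals are reordered in place
    assign, queue = {}, []

    def enqueue(lit):
        val = lit_value(lit, assign)
        if val is False:
            return False  # conflict
        if val is None:
            assign[abs(lit)] = lit > 0
            queue.append(lit)
        return True

    watches = defaultdict(list)  # literal -> indices of clauses watching it
    for ci, c in enumerate(clauses):
        if len(c) >= 2:
            watches[c[0]].append(ci)
            watches[c[1]].append(ci)
    for lit in list(decisions) + [c[0] for c in clauses if len(c) == 1]:
        if not enqueue(lit):
            return assign, True

    head = 0
    while head < len(queue):
        false_lit = -queue[head]
        head += 1
        watchers, keep = watches[false_lit], []
        for idx, ci in enumerate(watchers):
            c = clauses[ci]
            if c[0] == false_lit:  # keep the falsified watch in slot 1
                c[0], c[1] = c[1], c[0]
            if lit_value(c[0], assign) is True:  # clause already satisfied
                keep.append(ci)
                continue
            for k in range(2, len(c)):  # try to move the watch elsewhere
                if lit_value(c[k], assign) is not False:
                    c[1], c[k] = c[k], c[1]
                    watches[c[1]].append(ci)
                    break
            else:  # no replacement: c[0] is forced
                keep.append(ci)
                if not enqueue(c[0]):
                    watches[false_lit] = keep + watchers[idx + 1:]
                    return assign, True
        watches[false_lit] = keep
    return assign, False
```

The benefit of watching only two literals is that assigning a variable touches only the clauses watching its negation, not every clause that contains it.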
Oct, 3
Heterogeneous Computing with OpenCL
Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous […]
Oct, 3
An OpenCL Fast Fourier Transformation
This paper describes a strategy developed in preparation for implementing an FFT in OpenCL. The two factors most crucial to obtaining high FFT performance on a GPU, memory bandwidth and locality, are highlighted. Theoretical upper bounds on performance in terms of the locality factor are derived. An implementation […]
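A textbook iterative radix-2 Cooley-Tukey FFT illustrates why bandwidth and locality dominate. The bit-reversal permutation reads elements in a scattered order. Each of the log2(n) butterfly passes also streams the whole array through memory, so a direct port tends to be memory-bound unless several passes are kept in fast local memory. This is a generic sketch, not the paper's OpenCL kernel.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    # Bit-reversal permutation: scattered, non-contiguous accesses (poor locality).
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly passes; each one reads and writes the entire array.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a
```

Fusing several butterfly passes per load of a block into on-chip memory is one common way to cut the number of full-array passes. The paper's derived bounds are not reproduced here.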
Oct, 3
Realtime Computation of a VST Audio Effect Plugin on the Graphics Processor
A plugin system for real-time GPGPU computation of audio effects on the computer's graphics processing unit is presented. The prototype application renders mono audio material with head-related transfer functions (HRTFs) to create the impression of a sound source located in a certain direction relative to the listener's head. The […]
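HRTF rendering amounts to filtering the mono signal with a left-ear and a right-ear head-related impulse response (HRIR) for the chosen direction. The sketch below uses direct FIR convolution for clarity. It wraps the convolution in overlap-add so that it can process the fixed-size buffers an audio plugin receives. Real-time implementations often use FFT-based (partitioned) convolution instead. The names are hypothetical. Each output sample of a block is independent of the others, which is the parallelism a GPU can exploit.

```python
def convolve(signal, impulse):
    """Direct FIR convolution; output length is len(signal) + len(impulse) - 1."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse):
            out[n + k] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Offline rendering: one convolution per ear for a single source direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

class BlockConvolver:
    """Overlap-add FIR filter for successive audio buffers (impulse must be non-empty)."""

    def __init__(self, impulse):
        self.h = list(impulse)
        self.tail = [0.0] * (len(self.h) - 1)  # samples spilling into later blocks

    def process(self, block):
        full = convolve(block, self.h)
        for i, t in enumerate(self.tail):  # add the previous blocks' overhang
            full[i] += t
        out, self.tail = full[:len(block)], full[len(block):]
        return out
```

A stereo plugin would hold one `BlockConvolver` per ear and feed both the same mono buffer. Concatenating the outputs reproduces the offline convolution, truncated to the input length.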
Oct, 3
Towards robust automatic detection of vulnerable road users: monocular pedestrian tracking from a moving vehicle
In this paper we present steps towards the automatic detection of vulnerable road users in video. Such a system can be used, for example, as an automatic blind spot camera for trucks. The aim of the system is to automatically warn the driver when the algorithm detects vulnerable road users in the camera images. Such an […]
Oct, 3
An Auto-tuning Solution to Data Streams Clustering in OpenCL
Due to its applicability to numerous types of data, including telephone records, web documents, and click streams, the data stream model has recently attracted attention. For analysis of such data, it is crucial to process the data in a single pass, or a small number of passes, using little memory. This paper provides an OpenCL […]
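The abstract does not state which clustering algorithm the paper tunes. As a stand-in illustrating the single-pass, small-memory constraint, the sketch below uses sequential (MacQueen-style) k-means. This is a swapped-in technique, not the paper's method, and the class name is hypothetical. Each center is kept as the running mean of the points assigned to it. Memory is O(k · dim) regardless of stream length, and the nearest-center search is the part one would offload to a GPU when k and the dimensionality are large.

```python
class OnlineKMeans:
    """Single-pass k-means: each center is the running mean of its assigned points."""

    def __init__(self, k):
        self.k = k
        self.centers = []
        self.counts = []

    def update(self, point):
        """Consume one stream element and return the index of its cluster."""
        if len(self.centers) < self.k:  # seed centers with the first k points
            self.centers.append(list(point))
            self.counts.append(1)
            return len(self.centers) - 1
        # Nearest-center search (squared Euclidean distance).
        best = min(range(self.k),
                   key=lambda c: sum((p - q) ** 2 for p, q in zip(point, self.centers[c])))
        self.counts[best] += 1
        eta = 1.0 / self.counts[best]  # step size that keeps an exact running mean
        self.centers[best] = [c + eta * (p - c) for c, p in zip(self.centers[best], point)]
        return best
```

Each point is seen once and then discarded, which matches the one-pass model described above. The price is that the result depends on arrival order and on the first k points used as seeds.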

