Posts
Feb, 10
Real-Time SAH BVH Construction for Ray Tracing Dynamic Scenes
This work aims to develop effective algorithms for building full SAH BVH trees on the GPU in real time. It assumes that all scene objects are represented by a set of triangles (the so-called "triangle soup"), while arbitrary changes in the geometry are allowed […]
Feb, 9
Accelerating H.264 Advanced Video Coding with GPU/CUDA Technology
With the rise of streaming media on the Internet and the YouTube revolution, the demand for online videos is costing companies a significant amount of bandwidth. To alleviate the resources needed for streaming media, video compression removes redundant information and minimizes the loss in quality experienced by a human audience. In response to the need […]
Feb, 9
Parallel Semi-Implicit Time Integrators
In this paper, we further develop a family of parallel time integrators known as Revisionist Integral Deferred Correction methods (RIDC) to allow for the semi-implicit solution of time-dependent PDEs. Additionally, we show that our semi-implicit RIDC algorithm can harness the computational potential of multiple general-purpose graphics processing units (GPGPUs) by utilizing existing CUBLAS […]
Feb, 9
The Boat Hull Model: Adapting the Roofline Model to Enable Performance Prediction for Parallel Computing
Multi-core and many-core computing have been major trends for the past six years and are expected to remain so for the coming decades. With these trends in parallel computing, it becomes increasingly difficult to decide on which architecture to run a given application. In this work, we use an algorithm classification to predict performance prior to algorithm […]
Feb, 9
CudaRF: A CUDA-based Implementation of Random Forests
Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the compute unified […]
Feb, 9
Real-time simulation of a spiking neural network model of the basal ganglia circuitry using general purpose computing on graphics processing units
Real-time simulation of a biologically realistic spiking neural network is necessary for evaluation of its capacity to interact with real environments. However, the real-time simulation of such a neural network is difficult due to its high computational costs that arise from two factors: (1) vast network size and (2) the complicated dynamics of biologically realistic […]
Feb, 8
Auto-Generation and Auto-Tuning of 3D Stencil Codes on GPU Clusters
This paper develops and evaluates search and optimization techniques for auto-tuning 3D stencil (nearest-neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with respect to a search space. Our proposed framework takes a most concise specification of stencil behavior from the user as a single formula, […]
Feb, 8
The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures
The European FP7 project PEPPHER is addressing programmability and performance portability for current and emerging heterogeneous many-core architectures. As its main idea, the project proposes a multi-level parallel execution model comprised of potentially parallelized components existing in variants suitable for different types of cores, memory configurations, input characteristics, optimization criteria, and couples this with […]
Feb, 8
Acceleration of a Locally Tuned Sine Non Linear Video Enhancement Algorithm on GPGPU
Computer-vision-based applications support various domains such as medical, manufacturing, military intelligence, and surveillance systems. These applications can be divided into image acquisition, pre-processing, feature extraction, detection or segmentation, and high-level processing. However, these tasks are time-intensive due to the compute-bound nature of the algorithm. In this thesis, an algorithm, based on […]
Feb, 8
Symbolic Testing of OpenCL Code
We present an effective technique for crosschecking a C or C++ program against an accelerated OpenCL version, as well as a technique for detecting data races in OpenCL programs. Our techniques are implemented in KLEE-CL, a symbolic execution engine based on KLEE and KLEE-FP that supports symbolic reasoning on the equivalence between symbolic values. Our […]
Feb, 8
Verifiable Computation with Massively Parallel Interactive Proofs
As the cloud computing paradigm has gained prominence, the need for verifiable computation has grown increasingly urgent. The concept of verifiable computation enables a weak client to outsource difficult computations to a powerful, but untrusted, server. Protocols for verifiable computation aim to provide the client with a guarantee that the server performed the requested computations […]
Feb, 7
GMP implementation on CUDA – A Backward Compatible Design With Performance Tuning
The goal of this project is to implement the GMP library in CUDA and evaluate its performance. GMP (GNU Multiple Precision) is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. There is no practical limit to the precision except the ones implied by the available memory […]