Posts
Oct, 1
Optimizing Real Time GPU Kernels Using Fuzzy Inference System
CPU technology is slowly reaching its threshold; however, Moore’s Law still holds true for GPUs. With the increasing scope for GPGPU computing, more and more applications are being ported to the GPU framework. One of the best-suited application areas for GPGPU computing is image processing and computer vision. The high performance given by GPUs […]
Oct, 1
Template Library for Multi-GPU Pseudorandom Number Recursion-based Generators
The aim of the paper is to show how to design and implement fast parallel algorithms for Linear Congruential, Lagged Fibonacci and Wichmann-Hill pseudorandom number generators. The new algorithms employ the divide-and-conquer approach for solving linear recurrence systems. They are implemented on multi-GPU accelerated systems using CUDA. Numerical experiments performed on a computer system with […]
Sep, 30
Exploring Programming Multi-GPUs using OpenMP & OpenACC-based Hybrid Model
Heterogeneous computing comes with tremendous potential and is a leading candidate for scientific applications that are becoming more and more complex. Accelerators such as GPUs, whose computing momentum is growing faster than ever, offer application performance when compute-intensive portions of an application are offloaded to them. It is quite evident that future computing architectures […]
Sep, 30
Compiling a High-level Directive-Based Programming Model for GPGPUs
OpenACC is an emerging directive-based programming model for programming accelerators that typically enables non-expert programmers to achieve portable and productive performance of their applications. In this paper, we present the research and development challenges, and our solutions, to create an open-source OpenACC compiler in a mainstream compiler framework (OpenUH, a branch of Open64). […]
Sep, 30
Data-parallel Acceleration of PARSEC Black-Scholes Benchmark
The way programmers have been relying on processor improvements to gain speedup in their applications is no longer applicable in the same fashion. Programmers usually have to parallelize their code to utilize the CPU cores in the system to gain a significant speedup. To accelerate parallel applications further, there are a couple of techniques available. […]
Sep, 30
Approximate dynamic programming with post-decision states as a solution method for dynamic economic models
I introduce and evaluate a new stochastic simulation method for dynamic economic models. It is based on recent work in the operations research and engineering literatures (Van Roy et al., 1997; Powell, 2007; Bertsekas, 2011). The baseline method involves rewriting the household’s dynamic program in terms of post-decision states. This makes it possible to choose […]
Sep, 30
A GPU cluster optimized multigrid scheme for computing unsteady incompressible fluid flow
A multigrid scheme has been proposed that allows efficient implementation on modern CPUs, many integrated core devices (MICs), and graphics processing units (GPUs). It is shown that wide single instruction multiple data (SIMD) processing engines are used efficiently when a deep, 2h grid hierarchy is replaced with a two-level scheme using 16h-32h restriction. The […]
Sep, 29
Adapting data processing methods to modern GPU architecture
Wavelet transforms have a wide range of applications in many scientific areas, for example signal processing, image compression [6] or data mining [4] [5]. Present requirements demand performing a large amount of calculations in minimal time. For that reason, the goal of this paper is to present an approach that will fulfill the mentioned requirements, by […]
Sep, 29
Separate Compilation in a Language-Integrated Heterogeneous Environment
Heterogeneous computing platforms have become more common in recent years. Effective programming languages and tools will play a key role in unlocking the performance potential of these systems. In this paper, we present the design and implementation of separate compilation and linking support for the CUDA programming platform. CUDA provides a language-integrated environment for writing […]
Sep, 29
Evaluation of disconnected quark loops for hadron structure using GPUs
A number of stochastic methods developed for the calculation of fermion loops are investigated and compared, in particular with respect to their efficiency when implemented on Graphics Processing Units (GPUs). We assess the performance of the various methods by studying the convergence and statistical accuracy obtained for observables that require a large number of stochastic […]
Sep, 29
The Complete Rank Transform: A Tool for Accurate and Morphologically Invariant Matching of Structures
Most researchers agree that invariances are desirable in computer vision systems. However, one always has to keep in mind that this is at the expense of accuracy: By construction, all invariances inevitably discard information. The concept of morphological invariance is a good example of this trade-off and will be the focus of this paper. […]
Sep, 29
Optimizing Urban Environmental Simulations using BOINC
Cities are usually densely populated and have massive infrastructure. They consume a lot of energy and generate pollution. Urban form and structure interact with the environment in a complex way. There is a transfer of energy between buildings and the ground layer. Winds flow through the urban street canyons, affecting evaporation, temperature and pollution dispersion. […]