Posts
Oct, 4
Architecture-Aware Optimization on a 1600-core Graphics Processor
The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated […]
Oct, 4
Fine-grained Parallel ILU Preconditioners with Fill-ins for Multi-core CPUs and GPUs
Numerical simulation and its huge computational demands require a close coupling between efficient mathematical methods and their hardware-aware implementation on emerging and highly parallel computing platforms. The paradigm shift towards manycore parallelism not only offers a high potential of computing capabilities but also poses urgent challenges in designing scalable, portable, and flexible software […]
Oct, 4
GPU Algorithms for Diamond-based Multiresolution Terrain Processing
We present parallel algorithms for processing, extracting and rendering adaptively sampled regular terrain datasets represented as a multiresolution model defined by a super-square-based diamond hierarchy. This model represents a terrain as a nested triangle mesh generated through a series of longest edge bisections and encoded in an implicit hierarchical structure, which clusters triangles into diamonds […]
Oct, 4
Finite element assembly strategies on multi- and many-core architectures
We demonstrate that radically differing implementations of finite element methods are needed on multicore (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our experimental investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and diverse algorithmic […]
Oct, 4
Berkeley Dwarfs on CUDA
Graphics processing units (GPUs) have greatly improved their performance over the last ten years. The first graphics cards were developed in the late 1990s and were targeted at the mass market. These first cards were special-purpose hardware, designed to accelerate the graphics processing required in computer games. As the interest in computer games continued, GPU […]
Oct, 4
Comparing Parallel Simulation of Social Agents using Cilk and OpenCL
Recent advances in wireless/mobile communication and body-worn sensors, together with ambient intelligence and seamlessly integrated pervasive technology, have paved the way for applications that operate on social signals, i.e., the sensing and processing of group behavior, interpersonal relationships, or emotions. Thinking at a larger scale, it should be apparent that modeling social systems allowing to study […]
Oct, 4
Optimization of the Gaussian Mixture Model Evaluation on GPU
In this paper we present a highly optimized implementation of the Gaussian mixture acoustic model evaluation algorithm. Evaluation of these likelihoods is one of the most computationally intensive parts of automatic speech recognizers, but it can be well parallelized and offloaded to GPU devices. Our approach offers significant speed-up compared to recently published approaches, since it […]
Oct, 4
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance raytracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We […]
Oct, 3
Tranformation of CPU-based Applications To Leverage on Graphics Processors using CUDA
Scientific computation requires a great deal of computing power, especially for floating-point operations, but high-end multi-core processors are currently limited in floating-point performance and parallelization. Recent technological advances have made parallel computing technically and financially feasible using the Compute Unified Device Architecture (CUDA) developed by NVIDIA. This research focuses on measuring […]
Oct, 3
Parallel Game Tree Search Using GPU
The parallel performance of graphics cards in desktop computers generally exceeds that of conventional processors. The purpose of this paper is to identify opportunities for task parallelization when searching and evaluating game trees, and to propose algorithms that perform better on the SIMD processors of graphics cards than on regular desktop processors. On proposed algorithms’ basis […]
Oct, 3
Implementation of the optimization algorithms on GPGPU architecture and multi-cores
This bibliographic study mainly synthesizes the key ideas of parallel architectures and neural network models, and discusses the algorithm design methods that will be used to implement the optimizations on GPGPUs and multi-core processors. Since the neural network computational models are regarded as valuable tools to solve many scientific and practical problems, and it […]
Oct, 3
GPU-Accelerated DNA Distance Matrix Computation
Distance matrix calculation, as used in phylogeny analysis, is computationally intensive. Growing sequence data sets necessitate fast computation methods. This paper accelerates Felsenstein’s DNADIST program by using OpenCL to exploit the great computational capability of graphics cards. The GPU-accelerated DNADIST program achieves more than 12-fold speedup over the serial CPU program on a personal workstation […]