Posts
Jul, 4
On Static Timing Analysis of GPU Kernels
We study static timing analysis of programs running on GPU accelerators. Such programs follow a data-parallel programming model that allows massive parallelism on manycore processors. Data-parallel programming and GPUs as accelerators have come into wide use in recent years. The timing analysis of programs running on single-core machines is well known and […]
Jul, 4
The Design and Implementation of a GPU-enabled Multi-objective Tabu-search Intended for Real World and High-dimensional Applications
Metaheuristics are a class of approximate methods based on heuristics that can effectively handle real-world (usually NP-hard) problems of high dimensionality with multiple objectives. An existing multi-objective Tabu-Search (MOTS2) has been redesigned and ported to the Compute Unified Device Architecture (CUDA) so as to effectively deal with a scalable multi-objective problem with a range of […]
Jul, 4
Parallel Implementation of Travelling Salesman Problem using Ant Colony Optimization
In this paper, we propose a parallel implementation of the Ant System algorithm for Ant Colony Optimization (ACO) on a GPU using OpenCL. We compare different ACO parameters that directly or indirectly affect the result. The speedups of the CPU and GPU implementations are compared, with a speedup of 3.11x in CPU […]
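The Ant System rules that such an implementation distributes across the GPU (probabilistic tour construction, evaporation, pheromone deposit) can be sketched sequentially. This is a minimal Python sketch, not the paper's OpenCL code: the function names and the parameters `n_ants`, `alpha`, `beta`, evaporation rate `rho`, deposit constant `q` and `seed` are illustrative assumptions. The per-ant loop is the part a GPU version would typically run in parallel.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def ant_system(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
               rho=0.5, q=1.0, seed=0):
    """Sequential Ant System for the symmetric TSP (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone trails
    eta = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)]
           for i in range(n)]  # heuristic visibility = 1 / distance
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        # Each ant builds its tour independently of the others.
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Transition rule: p(i -> j) is proportional to tau^alpha * eta^beta.
                weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append(tour)
        # Evaporation on every edge.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        # Each ant deposits q / L on the edges of its own tour.
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len


if __name__ == "__main__":
    # Five cities on a regular pentagon: the optimal tour is the perimeter.
    pts = [(math.cos(2 * math.pi * k / 5), math.sin(2 * math.pi * k / 5))
           for k in range(5)]
    dist = [[math.dist(p, r) for r in pts] for p in pts]
    tour, length = ant_system(dist)
    print(tour, round(length, 6))
```

Tours of different ants only interact through the shared pheromone matrix, which is updated once per iteration; that separation is what makes the construction phase a natural fit for data-parallel hardware.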
Jul, 4
SIMD Implementation of a Multiplicative Schwarz Smoother for a Multigrid Poisson Solver on an Intel Xeon Phi Coprocessor
In this paper, we discuss an efficient implementation of a three-dimensional multigrid Poisson solver on the Intel Xeon Phi many-core coprocessor. We have used the modified block red-black (mBRB) Gauss-Seidel (GS) smoother to achieve a sufficient degree of parallelism and a high cache hit ratio. We have vectorized (SIMDized) the GS steps in the smoother by introducing […]
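For reference, a plain (non-blocked) red-black Gauss-Seidel sweep for a 3D Poisson problem can be sketched as follows. It illustrates why red-black ordering exposes parallelism and is not the paper's mBRB smoother or its SIMD code: each point of one colour depends only on points of the other colour, so all points of a colour can be updated at once. The function names, the 7-point stencil, the fixed Dirichlet boundary values and the manufactured solution in the usage example are assumptions chosen for illustration.

```python
def make_grid(n, fn):
    """n x n x n nested list with entries fn(i, j, k)."""
    return [[[fn(i, j, k) for k in range(n)] for j in range(n)] for i in range(n)]


def red_black_gauss_seidel(u, f, h, sweeps=1):
    """In-place red-black Gauss-Seidel sweeps for -lap(u) = f.

    Uses the 7-point stencil on a uniform n^3 grid with spacing h; the
    boundary entries of u are treated as fixed Dirichlet values.
    """
    n = len(u)
    h2 = h * h
    for _ in range(sweeps):
        for color in (0, 1):  # 0: "red" points, 1: "black" points
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    # First interior k with (i + j + k) % 2 == color.
                    k0 = 1 + (i + j + 1 - color) % 2
                    for k in range(k0, n - 1, 2):
                        # All six neighbours have the other colour, so the
                        # order of updates within one colour does not matter.
                        u[i][j][k] = (u[i - 1][j][k] + u[i + 1][j][k]
                                      + u[i][j - 1][k] + u[i][j + 1][k]
                                      + u[i][j][k - 1] + u[i][j][k + 1]
                                      + h2 * f[i][j][k]) / 6.0
    return u


if __name__ == "__main__":
    n, h = 9, 1.0 / 8

    def exact(i, j, k):
        # u = x^2 + y^2 + z^2 satisfies -lap(u) = -6 exactly on the grid.
        return (i * h) ** 2 + (j * h) ** 2 + (k * h) ** 2

    def initial(i, j, k):
        on_boundary = min(i, j, k) == 0 or max(i, j, k) == n - 1
        return exact(i, j, k) if on_boundary else 0.0

    f = make_grid(n, lambda i, j, k: -6.0)
    u = make_grid(n, initial)
    red_black_gauss_seidel(u, f, h, sweeps=150)
    err = max(abs(u[i][j][k] - exact(i, j, k))
              for i in range(n) for j in range(n) for k in range(n))
    print(f"max error after 150 sweeps: {err:.1e}")
```

Plain sweeps like this converge slowly on fine grids, which is why multigrid uses them as a smoother for high-frequency error rather than as a standalone solver.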
Jul, 4
GPUvm: Why Not Virtualizing GPUs at the Hypervisor?
Graphics processing units (GPUs) provide orders-of-magnitude speedup for compute-intensive data-parallel applications. However, enterprise and cloud computing domains, where resource isolation of multiple clients is required, have poor access to GPU technology. This is due to the lack of operating system (OS) support for virtualizing GPUs in a reliable manner. To make GPUs more mature system citizens, […]
Jul, 4
A Road Marking Extraction Method Using GPGPU
In a driving assistance system (DAS), road marking data can provide important assistance for driving safety. Since the input image usually contains unnecessary information, a lane detection system usually needs to remove most of it and retain only the lane markings. In this paper, a road marking extraction method is proposed to separate the painted lane lines using […]
Jul, 3
Exploiting parallel features of modern computer architectures in bioinformatics: applications to genetics, structure comparison and large graph analysis
The exponential growth in bioinformatics data generation and the stagnation of clock frequencies in modern processors stress the need for efficient implementations that fully exploit the parallel capabilities offered by modern computers. This thesis focuses on parallel algorithms and implementations for bioinformatics problems. Various types of parallelism are described and exploited. This thesis presents applications […]
Jul, 3
The Framework and Compilation Techniques for Directive-based GPU Cluster Programming
A GPU cluster is an important architecture for large scientific and engineering applications. However, developing GPU cluster applications manually is still very difficult. To alleviate this problem, we adopt the directive-based OpenACC standard and propose extensions to support GPU cluster programming. The extensions are constructs and clauses used to […]
Jul, 3
Historic Learning Approach for Auto-tuning OpenACC Accelerated Scientific Applications
The performance optimization of scientific applications usually requires in-depth knowledge of the hardware and software. A performance tuning mechanism is proposed to automatically tune OpenACC parameters to adapt to the execution environment of a given system. A methodology based on historic learning is proposed to prune the parameter search space for more efficient auto-tuning […]
Jul, 3
Reducing the Code Degree Of Parallelism to Increase GPUs Reliability
A higher degree of parallelism decreases code execution time. However, managing the increased number of parallel processes places a higher strain on scheduling and affects the utilization of caches, registers, and other resources. All these parallelism-management variations may have the side effect of increasing the GPU's neutron sensitivity. The results of an extensive […]
Jul, 3
Toward Auto-tuned Krylov Basis Computations with minimized Communication on Clusters of Accelerators
Krylov subspace methods (KSMs) are widely used for solving large-scale linear systems and eigenproblems. However, computing the Krylov subspace basis for KSMs suffers from intensive, blocking scalar-product computation and communication, especially on large clusters with accelerators such as GPUs. In this paper, a hypergraph-based communication optimization is applied to Arnoldi […]
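To see where the blocking scalar products come from, here is a minimal Arnoldi sketch with modified Gram-Schmidt in pure Python. It assumes a dense row-major matrix and sequential execution, and the helper names are hypothetical; the comments mark each scalar product that, on a cluster of accelerators, would become a global reduction. It does not implement the paper's hypergraph-based communication optimization.

```python
import math


def dot(x, y):
    """Scalar product. With distributed vectors, each call needs a
    global reduction (all-reduce), i.e. a blocking communication step."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense row-wise product. With distributed data this is local work
    plus neighbour exchange, not a global reduction."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def arnoldi(A, b, m):
    """m steps of Arnoldi with modified Gram-Schmidt.

    Returns Q (up to m + 1 orthonormal vectors spanning K_{m+1}(A, b)) and
    the (m + 1) x m upper Hessenberg H with A q_j = sum_i H[i][j] q_i.
    """
    beta = math.sqrt(dot(b, b))
    Q = [[bi / beta for bi in b]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, Q[j])
        for i in range(j + 1):
            H[i][j] = dot(Q[i], w)  # one reduction per existing basis vector
            w = [wk - H[i][j] * qk for wk, qk in zip(w, Q[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))  # one more reduction for the norm
        if H[j + 1][j] == 0.0:
            break  # "lucky" breakdown: the Krylov subspace is A-invariant
        Q.append([wk / H[j + 1][j] for wk in w])
    return Q, H


if __name__ == "__main__":
    n = 6
    # Non-symmetric tridiagonal test matrix.
    A = [[2.0 if i == j else -1.0 if j == i + 1 else -0.5 if j == i - 1 else 0.0
          for j in range(n)] for i in range(n)]
    Q, H = arnoldi(A, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], m=3)
    for row in H:
        print(["%.4f" % v for v in row])
```

Step `j` (counting from 0) performs `j + 2` scalar products, each of which must wait for the previous update of `w`; that serial chain of reductions is the communication cost that basis-computation optimizations on accelerator clusters aim to reduce.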
Jul, 1
Mixed-precision Orthogonalization Scheme and Adaptive Step Size for CA-GMRES on GPUs
We propose a mixed-precision orthogonalization scheme that takes the input matrix in standard 32- or 64-bit floating-point precision but accumulates its intermediate results in doubled precision. For a 64-bit input matrix, we use software emulation for the higher-precision arithmetic. Compared with the standard orthogonalization scheme, we require about 8.5× more computation but a much […]
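The idea of accumulating a 64-bit computation in roughly doubled precision through software emulation can be illustrated with error-free transformations. The sketch below uses Knuth's TwoSum and Dekker's TwoProduct to compute a dot product in the style of the Dot2 algorithm of Ogita, Rump and Oishi. It is a stand-in chosen for illustration, not the paper's orthogonalization kernel, and it assumes IEEE binary64 arithmetic with round-to-nearest and no overflow or underflow.

```python
def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and e with s + e == a + b exactly."""
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e


def split(a):
    """Veltkamp split of a binary64 float into two non-overlapping halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Dekker's TwoProduct: p = fl(a * b) and e with p + e == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def naive_dot(x, y):
    """Plain binary64 loop (the built-in sum() is compensated for floats
    since Python 3.12, so an explicit loop is used here)."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s


def dot2(x, y):
    """Dot product evaluated as if in about twice the working precision."""
    s, c = 0.0, 0.0
    for a, b in zip(x, y):
        p, ep = two_prod(a, b)
        s, es = two_sum(s, p)
        c += ep + es  # rounding errors are accumulated separately
    return s + c


if __name__ == "__main__":
    x = [1e16, 1.0, -1e16]
    y = [1.0, 1.0, 1.0]
    print(naive_dot(x, y), dot2(x, y))  # prints: 0.0 1.0
```

The naive loop loses the 1.0 to rounding against 1e16, while `dot2` recovers it from the TwoSum error term. The extra splitting and error-term operations per element are the kind of additional computation that emulated doubled-precision accumulation costs.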