Posts
Aug, 15
Optimizing the Computation of Eigenvalues Using Graphics Processing Units
In this paper, we first briefly describe some mathematical aspects regarding the computation of eigenvalues, followed by an original approach: a bisection algorithm useful in computing eigenvalues for a tridiagonal symmetric matrix of arbitrary size, using the computing capabilities of the latest graphics processing units that incorporate the Compute Unified Device Architecture. The novel approach […]
Aug, 14
Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms
A comparison of PGI OpenACC, FORTRAN CUDA, and Nvidia CUDA pseudospectral methods on a single GPU and GCC FORTRAN on single and multiple CPU cores is reported. The GPU implementations use CuFFT and the CPU implementations use FFTW. Porting pre-existing FORTRAN codes to utilize GPUs is efficient and easy to implement with OpenACC and […]
Aug, 14
Accelerating cellular automata simulations using AVX and CUDA
We investigated various methods of parallelization of the Frisch-Hasslacher-Pomeau (FHP) cellular automata algorithm for modeling fluid flow. These methods include SSE, AVX, and POSIX Threads for central processing units (CPUs) and CUDA for graphics processing units (GPUs). We present implementation details of the FHP algorithm based on AVX/SSE and CUDA technologies. We found that (a) […]
Aug, 14
Dynamic Warp Resizing in High-Performance SIMT
Modern GPUs synchronize threads grouped in a warp at every instruction. This improves SIMD efficiency and makes sharing fetch and decode resources possible. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead, and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with […]
Aug, 14
A Second-Order Distributed Trotter-Suzuki Solver with a Hybrid Kernel
The Trotter-Suzuki approximation leads to an efficient algorithm for solving the time-dependent Schroedinger equation. Using existing highly optimized CPU and GPU kernels, we developed a distributed version of the algorithm that runs efficiently on a cluster. Our implementation also improves single node performance, and is able to use multiple GPUs within a node. The scaling […]
Aug, 14
A GPU implementation of the Simulated Annealing Heuristic for the Quadratic Assignment Problem
The quadratic assignment problem (QAP) is one of the most difficult combinatorial optimization problems. An effective heuristic for obtaining approximate solutions to the QAP is simulated annealing (SA). Here we describe an SA implementation for the QAP which runs on a graphics processing unit (GPU). GPUs are composed of low cost commodity graphics chips which […]
Aug, 13
Orthorectification by Using GPGPU Method
Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power of more than a teraflop. Modern GPUs are not only powerful graphics engines but also high-level parallel programmable processors with very fast computing capabilities and high memory bandwidth […]
Aug, 13
Real-Time Exact Graph Matching with Application in Human Action Recognition
Graph matching is one of the principal methods to formulate the correspondence between two sets of points in computer vision and pattern recognition. Most formulations are based on the minimization of a difficult energy function which is known to be NP-hard. Traditional methods solve the minimization problem approximately. In this paper, we derive an exact […]
Aug, 13
Spiking Neural Networks for Real-Time Infrared Images Processing in Thermo Vision Systems
Thermo vision systems are used in military, police, customs, traffic control, industrial, and other specific applications for collecting and processing thermo visual information from infrared images. Problems arise when the methods and algorithms developed for infrared image processing are implemented in real-time practical applications of thermo vision systems. Here is […]
Aug, 13
Dense Matrix Computation on a Heterogenous Architecture: A Block Synchronous Approach
We present a strategy for efficient use of all components of a heterogenous compute node of a typical current generation cluster. Such nodes often comprise multiple sockets with a multicore processor per socket and one or more accelerators, possibly from different generations and/or types. Our strategy differs from schedulers such as Quark or SuperMatrix in […]
Aug, 13
Multi-GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional q-state Potts model
We present multiple-GPU computing with the Compute Unified Device Architecture (CUDA) for the Swendsen-Wang multi-cluster algorithm of the two-dimensional (2D) q-state Potts model. Extending our algorithm for single GPU computing [Comp. Phys. Comm. 183 (2012) 1155], we realize the GPU computation of the Swendsen-Wang multi-cluster algorithm for multiple GPUs. We implement our code on […]
Aug, 11
Real-Time Implementation of Remotely Sensed Hyperspectral Image Unmixing on GPUs
Spectral unmixing is one of the most popular techniques to analyze remotely sensed hyperspectral images. It generally comprises three stages: 1) reduction of the dimensionality of the original image to a proper subspace; 2) automatic identification of pure spectral signatures (called endmembers); and 3) estimation of the fractional abundance of each endmember in each pixel […]