Posts
Jan, 8
Makespan computation for GPU threads running on a single streaming multiprocessor
Graphics processors were originally developed for rendering graphics but have recently evolved into an architecture for general-purpose computation. They are also expected to become important parts of embedded systems hardware – not just for graphics. However, this requires appropriate timing analysis techniques, because techniques developed for CPU […]
Jan, 8
Hybrid Algorithms for List Ranking and Graph Connected Components
The advent of multicore and many-core architectures saw them deployed to speed up computations across several disciplines and application areas. Prominent examples include semi-numerical algorithms such as sorting, graph algorithms, image processing, scientific computations, and the like. In particular, using GPUs for general-purpose computations has attracted a lot of attention given that GPUs can […]
Jan, 8
Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems
This paper explores the need for asynchronous iteration algorithms as smoothers in multigrid methods. The hardware target for the new algorithms is top-of-the-line, highly parallel hybrid architectures – multicore-based systems enhanced with GPGPUs. These architectures are the most likely candidates for future high-end supercomputers. To pave the road for their efficient use, we must resolve […]
Jan, 8
Parameter Tuning of a Hybrid Treecode-FMM on GPUs
Treecodes are O(N log N) hierarchical N-body algorithms, which have traditionally been used for applications in astrophysics, in a low-accuracy regime. Fast multipole methods (FMM) are O(N) hierarchical N-body algorithms that have been used in a variety of applications, often in the high-accuracy regime. Both algorithms are known to perform well on massively parallel heterogeneous […]
Jan, 8
A Quasi-Parallel GPU-Based Algorithm for Delaunay Edge-Flips
The Delaunay edge-flip algorithm is a practical method for transforming any existing triangular mesh S into a mesh T(S) that satisfies the Delaunay condition. Although several implementations of this algorithm are known, to the best of our knowledge no parallel GPU-based implementation has been reported yet. In the present work, we propose a quadriphasic and […]
Jan, 8
Direct solution of the Boltzmann equation for a binary mixture on GPUs
We show how to accelerate the numerical solution of the Boltzmann equation for a binary gas mixture by using Graphics Processing Units (GPUs). In order to fully exploit the computational power of the GPU, we adopt a semi-regular method of solution which combines a finite difference discretization of the free-streaming term with a Monte Carlo […]
Jan, 8
Massively Parallel Sequential Monte Carlo for Bayesian Inference
This paper reconsiders sequential Monte Carlo approaches to Bayesian inference in the light of massively parallel desktop computing capabilities now well within the reach of individual academics. It first develops an algorithm that is well suited to parallel computing in general and for which convergence results have been established in the sequential Monte Carlo literature […]
Jan, 8
Some Graph Algorithms And Related Primitives For The GPU
General-purpose computing on graphics processing units (GPGPU) has attained widespread acceptance in the high-performance computing community. This has largely been attributed to the rise of programming models and the large peak-performance-to-cost ratio of the GPU. The peak throughput of modern GPUs is typically 5 TFLOPS at a cost of 600 US […]
Jan, 8
Implementation of Kd-Trees on the GPU to Achieve Real Time Graphics Processing
This paper examines the parallelization of ray tracing algorithms with the goal of running the whole process on the graphics processing unit (GPU) rather than the central processing unit (CPU). The motivation behind this endeavour is to utilize the massively parallel nature of the GPU. This parallelism allows the construction of 3-dimensional images to take […]
Jan, 8
A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization
In this paper, we try to accelerate sparse LU factorization on the GPU. We present a tiled storage format and a parallel algorithm to improve the memory access pattern, and a register blocking method to compress the on-chip working set. The OpenMP implementation of our algorithm gives more stable performance over different matrices, and outperforms SuperLU […]
Jan, 8
Cryptanalysis of the Full AES Using GPU-Like Special-Purpose Hardware
The block cipher Rijndael has undergone more than ten years of extensive cryptanalysis since its submission as a candidate for the Advanced Encryption Standard (AES) in April 1998. To date, most of the publicly known cryptanalytic results are based on reduced-round variants of the AES (respectively Rijndael) algorithm. Among the few exceptions that target the full […]
Jan, 7
Report on the Feasibility of Implementing PIC Codes on a GPU
GPUs have become a very attractive supplement to traditional high-performance computing. They offer significantly better performance relative to both cost and power consumption. However, GPUs introduce several additional levels of parallelism that must be contended with. New methods must be developed in order to take full advantage of the capabilities of this architecture. This paper explores […]