Posts
Jan, 9
Fast GPU-based Locality Sensitive Hashing for K-Nearest Neighbor Computation
We present an efficient GPU-based parallel LSH algorithm to perform approximate k-nearest neighbor computation in high-dimensional spaces. We use the Bi-level LSH algorithm, which can compute k-nearest neighbors with higher accuracy and is amenable to parallelization. During the first level, we use the parallel RP-tree algorithm to partition datasets into several groups so that items […]
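The first-level partitioning step can be pictured with a simplified, serial random-projection tree. This is a minimal sketch under stated assumptions: each node is cut at the median of one random unit projection, and the `leaf_size` stopping rule and the function name `rp_tree_partition` are hypothetical. The paper's parallel GPU RP-tree and its second-level LSH are not reproduced here.

```python
import numpy as np


def rp_tree_partition(points, leaf_size, rng):
    """Split `points` (n x d) into groups with a simplified RP-tree.

    Each node projects its points onto a random unit direction and cuts
    at the median projection; recursion stops once a node holds at most
    `leaf_size` points (a hypothetical stopping rule). Returns a list of
    index arrays, one per group.
    """
    def split(idx):
        if len(idx) <= leaf_size:
            return [idx]
        direction = rng.standard_normal(points.shape[1])
        direction /= np.linalg.norm(direction)
        proj = points[idx] @ direction
        median = np.median(proj)
        left, right = idx[proj <= median], idx[proj > median]
        if len(left) == 0 or len(right) == 0:
            return [idx]  # degenerate cut (all projections tied): stop here
        return split(left) + split(right)

    return split(np.arange(len(points)))


rng = np.random.default_rng(0)
pts = rng.standard_normal((1000, 16))
groups = rp_tree_partition(pts, leaf_size=64, rng=rng)
print(len(groups), max(len(g) for g in groups))
```

In a bi-level scheme, a separate LSH structure would then be built over each group; that second level is not shown.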
Jan, 8
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines
We study several solvers for the solution of general linear systems where the main objective is to reduce the communication overhead due to pivoting. We first describe two existing algorithms for the LU factorization on hybrid CPU/GPU architectures. The first one is based on partial pivoting and the second uses a random preconditioning of the […]
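For reference, here is textbook right-looking LU with partial pivoting, which makes visible where the pivoting cost arises: every step searches a whole column for its largest entry and then swaps rows. When that column is spread across devices, this search and swap is the kind of step that needs communication. This is a minimal serial sketch, not the hybrid CPU/GPU algorithms studied in the paper, and the name `lu_partial_pivoting` is illustrative.

```python
import numpy as np


def lu_partial_pivoting(A):
    """Factor a square matrix so that P @ A = L @ U (right-looking LU).

    At step k the pivot is the largest-magnitude entry in column k on or
    below the diagonal; the corresponding row is swapped into place.
    """
    A = np.array(A, dtype=float)  # work on a copy
    n = A.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(A[k:, k])))  # pivot search over column k
        if p != k:  # swap whole rows, including stored multipliers
            A[[k, p]] = A[[p, k]]
            perm[[k, p]] = perm[[p, k]]
        if A[k, k] == 0.0:
            continue  # column is already zero below the diagonal
        A[k + 1:, k] /= A[k, k]  # multipliers (column k of L)
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])  # Schur update
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    P = np.eye(n)[perm]
    return P, L, U


A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
P, L, U = lu_partial_pivoting(A)
print(np.allclose(P @ A, L @ U))  # True
```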
Jan, 8
Makespan computation for GPU threads running on a single streaming multiprocessor
Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware – not just for graphics. However, this necessitates the development of appropriate timing analysis techniques, because techniques developed for CPU […]
Jan, 8
Hybrid Algorithms for List Ranking and Graph Connected Components
The advent of multicore and many-core architectures saw them deployed to speed up computations across several disciplines and application areas. Prominent examples include semi-numerical algorithms such as sorting, graph algorithms, image processing, scientific computations, and the like. In particular, using GPUs for general purpose computations has attracted a lot of attention given that GPUs can […]
Jan, 8
Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems
This paper explores the need for asynchronous iteration algorithms as smoothers in multigrid methods. The hardware target for the new algorithms is top-of-the-line, highly parallel hybrid architectures – multicore-based systems enhanced with GPGPUs. These architectures are the most likely candidates for future high-end supercomputers. To pave the road for their efficient use, we must resolve […]
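The idea of a block-asynchronous relaxation can be emulated serially: the unknowns are cut into blocks, blocks are processed in an unpredictable order, and each block updates its own components from whatever global values are current. This is a minimal sketch under assumptions: one Jacobi update per block per sweep, a random visiting order standing in for GPU scheduling, and the hypothetical helper name `block_async_sweep`. It is not the paper's smoother and omits the multigrid hierarchy entirely.

```python
import numpy as np


def block_async_sweep(A, x, b, block_size, rng):
    """One sweep of block-asynchronous relaxation for A x = b, emulated serially.

    Blocks are visited in a random order, standing in for unpredictable
    scheduling of GPU thread blocks. Each block reads the current global
    x, applies one Jacobi update to its own components, and writes them
    back at once, so blocks processed later see the new values.
    """
    n = len(b)
    starts = rng.permutation(np.arange(0, n, block_size))
    d = np.diag(A)
    for s in starts:
        e = min(s + block_size, n)
        r = b[s:e] - A[s:e] @ x   # local residual using the latest x
        x[s:e] += r / d[s:e]      # Jacobi update of this block only
    return x


n = 32
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Poisson matrix
b = np.ones(n)
x = np.zeros(n)
rng = np.random.default_rng(0)
r0 = np.linalg.norm(b - A @ x)
for _ in range(1000):
    block_async_sweep(A, x, b, block_size=4, rng=rng)
print(np.linalg.norm(b - A @ x) / r0)
```

Because every block still sees some fresh values from blocks that ran earlier in the sweep, the emulation sits between Jacobi and Gauss-Seidel; on real hardware the order, and hence the iterates, would vary from run to run.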
Jan, 8
Parameter Tuning of a Hybrid Treecode-FMM on GPUs
Treecodes are O(N log N) hierarchical N-body algorithms, which have traditionally been used for applications in astrophysics, in a low-accuracy regime. Fast multipole methods (FMM) are O(N) hierarchical N-body algorithms that have been used in a variety of applications, often in the high-accuracy regime. Both algorithms are known to perform well on massively parallel heterogeneous […]
Jan, 8
A Quasi-Parallel GPU-Based Algorithm for Delaunay Edge-Flips
The Delaunay edge-flip algorithm is a practical method for transforming any existing triangular mesh S into a mesh T(S) that satisfies the Delaunay condition. Although several implementations of this algorithm are known, to the best of our knowledge no parallel GPU-based implementation has been reported yet. In the present work, we propose a quadriphasic and […]
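The test that drives every edge flip is the local Delaunay condition: an interior edge shared by two triangles is illegal when the vertex opposite it in one triangle lies strictly inside the circumcircle of the other. A minimal sketch of that predicate, using the standard in-circle determinant in plain floating point (no robust arithmetic); the function names are illustrative, and the paper's quadriphasic GPU scheduling is not shown.

```python
def orient(a, b, c):
    """Twice the signed area of triangle (a, b, c); > 0 if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of CCW triangle (a, b, c)."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    det = ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
           - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))
    return det > 0


def edge_needs_flip(a, b, c, d):
    """Edge (a, b) is shared by triangles (a, b, c) and (b, a, d).

    The edge violates the Delaunay condition, and should be flipped to
    (c, d), when d lies inside the circumcircle of (a, b, c).
    """
    if orient(a, b, c) < 0:
        a, b = b, a  # make (a, b, c) counter-clockwise
    return in_circle(a, b, c, d)


print(edge_needs_flip((0, 0), (1, 0), (0, 1), (0.5, -0.1)))  # True
print(edge_needs_flip((0, 0), (1, 0), (0, 1), (0.5, -1.0)))  # False
```

A flip-based method repeats this check and flips illegal edges until none remain; the parallel difficulty is that neighbouring flips touch the same triangles.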
Jan, 8
Direct solution of the Boltzmann equation for a binary mixture on GPUs
We show how to accelerate the numerical solution of the Boltzmann equation for a binary gas mixture by using Graphics Processing Units (GPUs). In order to fully exploit the computational power of the GPU, we adopt a semi-regular method of solution which combines a finite difference discretization of the free-streaming term with a Monte Carlo […]
Jan, 8
Massively Parallel Sequential Monte Carlo for Bayesian Inference
This paper reconsiders sequential Monte Carlo approaches to Bayesian inference in the light of massively parallel desktop computing capabilities now well within the reach of individual academics. It first develops an algorithm that is well suited to parallel computing in general and for which convergence results have been established in the sequential Monte Carlo literature […]
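A generic likelihood-tempered SMC sampler shows why these methods map well onto massively parallel hardware: reweighting and the Metropolis moves act on each particle independently, and only normalization and resampling need a global step. This is a minimal sketch, not the paper's algorithm. The temperature schedule, the number of moves, the `2.38 * std` proposal scale, and the toy Gaussian model are all assumptions chosen so that the exact posterior mean is known.

```python
import numpy as np


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against round-off
    return np.searchsorted(cumulative, positions)


def tempered_smc(log_prior, log_lik, init, n_temps, n_moves, rng):
    """Move particles from the prior (t=0) to the posterior (t=1)
    through the intermediate targets prior * lik**t."""
    theta = init.copy()
    temps = np.linspace(0.0, 1.0, n_temps + 1)
    for t_prev, t in zip(temps[:-1], temps[1:]):
        # Reweight: the incremental weight is lik**(t - t_prev).
        logw = (t - t_prev) * log_lik(theta)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        theta = theta[systematic_resample(w, rng)]
        # Move: random-walk Metropolis targeting prior * lik**t,
        # with the step size scaled to the current particle spread.
        scale = 2.38 * theta.std() + 1e-12
        for _ in range(n_moves):
            prop = theta + scale * rng.standard_normal(theta.shape)
            log_acc = (log_prior(prop) + t * log_lik(prop)
                       - log_prior(theta) - t * log_lik(theta))
            accept = np.log(rng.random(theta.shape)) < log_acc
            theta = np.where(accept, prop, theta)
    return theta


# Toy model: prior theta ~ N(0, 10^2), data y_i ~ N(theta, 1).
rng = np.random.default_rng(1)
y = 2.0 + rng.standard_normal(50)
log_prior = lambda th: -0.5 * (th / 10.0) ** 2
log_lik = lambda th: -0.5 * ((y[None, :] - th[:, None]) ** 2).sum(axis=1)
particles = tempered_smc(log_prior, log_lik, 10.0 * rng.standard_normal(2000), 20, 5, rng)
exact_mean = y.sum() / (len(y) + 1 / 100)  # conjugate posterior mean
print(particles.mean(), exact_mean)
```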
Jan, 8
Some Graph Algorithms And Related Primitives For The GPU
General purpose computing on graphics processor units (GPGPU) has attained widespread acceptance in the high-performance computing community. This has largely been attributed to the rise of programming models and the high peak-performance-to-cost ratio of the GPU. The peak throughput of modern GPUs is typically 5 TFLOPS at a cost of 600 US […]
Jan, 8
Implementation of Kd-Trees on the GPU to Achieve Real Time Graphics Processing
This paper examines the parallelization of ray tracing algorithms with the goal of running the whole process on the graphics processing unit (GPU) rather than the central processing unit (CPU). The motivation behind this endeavour is to utilize the massively parallel nature of the GPU. This parallelism allows the construction of 3-dimensional images to take […]
Jan, 8
A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization
In this paper, we try to accelerate sparse LU factorization on GPU. We present a tiled storage format and a parallel algorithm to improve the memory access pattern, and a register blocking method to compress the on-chip working set. The OpenMP implementation of our algorithm gives more stable performance over different matrices, and outperforms SuperLU […]
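The abstract does not describe the layout of its tiled storage format, so the following is only a generic illustration of the idea: nonzeros are grouped into small dense tiles keyed by block coordinates, and empty tiles are never stored, so work can proceed tile by tile over contiguous memory. The helper names `coo_to_tiles` and `tiles_to_dense` and the zero-padding of edge tiles are assumptions; the paper's register blocking is not shown.

```python
import numpy as np


def coo_to_tiles(rows, cols, vals, tile):
    """Group COO entries into dense tile x tile blocks keyed by (block_row, block_col).

    Only tiles that receive at least one entry are stored; duplicate
    (row, col) entries are summed, following the usual COO convention.
    Edge tiles are zero-padded to the full tile size for simplicity.
    """
    tiles = {}
    for r, c, v in zip(rows, cols, vals):
        key = (r // tile, c // tile)
        block = tiles.setdefault(key, np.zeros((tile, tile)))
        block[r % tile, c % tile] += v
    return tiles


def tiles_to_dense(tiles, n, tile):
    """Rebuild the n x n dense matrix from its tiles (for checking)."""
    nb = -(-n // tile)  # ceil(n / tile)
    padded = np.zeros((nb * tile, nb * tile))
    for (bi, bj), block in tiles.items():
        padded[bi * tile:(bi + 1) * tile, bj * tile:(bj + 1) * tile] = block
    return padded[:n, :n]


rows = [0, 1, 4, 4, 2]
cols = [0, 3, 4, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
tiles = coo_to_tiles(rows, cols, vals, tile=2)
print(sorted(tiles))  # [(0, 0), (0, 1), (1, 1), (2, 0), (2, 2)]
```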

