
Posts

Jan, 9

Designing Numerical Solvers for Next Generation High Performance Computing

High Performance Computing (HPC) is moving towards massive scales of parallelism. The changes in hardware towards large-scale on-chip parallelism require the rewriting of existing solvers for various Computational Fluid Dynamics (CFD) problems. The aim of the project is to write and optimise novel solvers for various common CFD numerical problems that can take […]
Jan, 9

LU Factorization for Accelerator-based Systems

Multicore architectures enhanced with multiple GPUs are likely to become mainstream High Performance Computing (HPC) platforms in the near future. In this paper, we present the design and implementation of an LU factorization using a tile algorithm that can fully exploit the potential of such platforms in spite of their complexity. We use a methodology derived […]
Jan, 9

Neural Network Simulation: The recognition application

This paper presents the GPU mapping of the recognition algorithm of a Convolutional Neural Network (CNN). This work is based on a C implementation of the application. The mapping to GPU was performed through different approaches, which are explained in detail. The improvements achieved by each approach are presented as well as the overall speedup […]
Jan, 9

Spatial Sorting Algorithms for Parallel Computing in Networks

Many basic techniques in computer science have been founded on the assumption that physical computing resources are scarce but orderly, and that the cost of effective direct communication between physically distant parts of a computer system is affordable. In large scale cluster computing installations, fine-grained parallel computing hardware, or wireless mesh networks, these familiar assumptions […]
Jan, 9

High Performance and Scalable GPU Graph Traversal

Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to […]
Jan, 9

Fast GPU-based Locality Sensitive Hashing for K-Nearest Neighbor Computation

We present an efficient GPU-based parallel LSH algorithm to perform approximate k-nearest neighbor computation in high-dimensional spaces. We use the Bi-level LSH algorithm, which can compute k-nearest neighbors with higher accuracy and is amenable to parallelization. During the first level, we use the parallel RP-tree algorithm to partition datasets into several groups so that items […]
Jan, 8

A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines

We study several solvers for the solution of general linear systems where the main objective is to reduce the communication overhead due to pivoting. We first describe two existing algorithms for the LU factorization on hybrid CPU/GPU architectures. The first one is based on partial pivoting and the second uses a random preconditioning of the […]
Jan, 8

Makespan computation for GPU threads running on a single streaming multiprocessor

Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware – not just for graphics. However, this necessitates the development of appropriate timing analysis techniques which would be required because techniques developed for CPU […]
Jan, 8

Hybrid Algorithms for List Ranking and Graph Connected Components

The advent of multicore and many-core architectures saw them being deployed to speed up computations across several disciplines and application areas. Prominent examples include semi-numerical algorithms such as sorting, graph algorithms, image processing, scientific computations, and the like. In particular, using GPUs for general-purpose computations has attracted a lot of attention given that GPUs can […]
Jan, 8

Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems

This paper explores the need for asynchronous iteration algorithms as smoothers in multigrid methods. The hardware target for the new algorithms is top-of-the-line, highly parallel hybrid architectures – multicore-based systems enhanced with GPGPUs. These architectures are the most likely candidates for future high-end supercomputers. To pave the road for their efficient use, we must resolve […]
Jan, 8

Parameter Tuning of a Hybrid Treecode-FMM on GPUs

Treecodes are O(N log N) hierarchical N-body algorithms, which have traditionally been used for applications in astrophysics, in a low-accuracy regime. Fast multipole methods (FMM) are O(N) hierarchical N-body algorithms that have been used in a variety of applications, often in the high-accuracy regime. Both algorithms are known to perform well on massively parallel heterogeneous […]
Jan, 8

A Quasi-Parallel GPU-Based Algorithm for Delaunay Edge-Flips

The Delaunay edge-flip algorithm is a practical method for transforming any existing triangular mesh S into a mesh T(S) that satisfies the Delaunay condition. Although several implementations of this algorithm are known, to the best of our knowledge no parallel GPU-based implementation has been reported yet. In the present work, we propose a quadriphasic and […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
