Posts

Aug, 11

A Memory-Efficient Algorithm for Large-Scale Symmetric Tridiagonal Eigenvalue Problem on Multi-GPU Systems

The divide-and-conquer algorithm is a numerically stable and efficient method for computing the eigenvalues and eigenvectors of a symmetric tridiagonal matrix. The input matrix often fits into main memory but not into the on-chip memory of a GPU device. We present an out-of-core implementation where only part of the input […]
Aug, 11

UAV Path Planning with Parallel Genetic Algorithms on CUDA Architecture

In recent years, Unmanned Aerial Vehicles (UAVs) have emerged as an attractive technology for various military and civil applications and have gained importance in academic research. In these emerging research areas, UAV autonomy plays a large part; it mainly refers to the ability to perform automatic take-off, landing, and path planning. In […]
Aug, 11

Parallel Breadth First Search on GPU Clusters

Fast, scalable, low-cost, and low-power execution of parallel graph algorithms is important for a wide variety of commercial and public sector applications. Breadth First Search (BFS) imposes an extreme burden on memory bandwidth and network communications and has been proposed as a benchmark that may be used to evaluate current and future parallel computers. Hardware […]
Aug, 10

Optimising Purely Functional GPU Programs (Thesis)

It is well acknowledged that the dominant mechanism for scaling processor performance has become to increase the number of cores on a chip, rather than improve the performance of a single core. However, harnessing these extra cores to improve single application performance remains an extremely challenging task. A recent trend has been to use commodity […]
Aug, 10

Direct Numerical Simulation of Turbulent Flows with Parallel Algorithms for Various Computing Architectures

The purpose of the work is twofold. Firstly, it is devoted to the development of efficient parallel algorithms for large-scale simulations of turbulent flows on different supercomputer architectures. It reports experience with massively parallel accelerators, including graphics processing units from AMD and NVIDIA and Intel Xeon Phi coprocessors. Secondly, it introduces a new series of direct numerical […]
Aug, 10

GPGPU Based Aeroacoustic Optimization of a Contra-Rotating Fan

Contra-rotating fans have several advantages over single stage axial fans. If they are well designed, the exit flow field is almost irrotational. This helps to increase the aerodynamic efficiency by up to 16%, when compared to single stage fans. However, since the second stage interacts with the flow disturbances from the first stage, the associated […]
Aug, 10

An Improved Image Segmentation Algorithm Based on GPU Parallel Computing

In image segmentation, the classic Fuzzy C-Means (FCM) algorithm is time-consuming and depends heavily on the initial cluster centers. Using the Graphics Processing Unit (GPU), this paper proposes a novel FCM algorithm that improves the computational formulas for the membership degree and the update criterion for cluster centers. Our algorithm can initialize cluster centers purposefully […]
Aug, 10

Faster sequence alignment through GPU-accelerated restriction of the seed-and-extend search space

MOTIVATION: In computing pairwise alignments of biological sequences, software implementations employ a variety of heuristics that decrease the computational effort involved in computing potential alignments. A key element in achieving high processing throughput is to identify and prioritize potential alignments where high-scoring mappings can be expected. These tasks involve list-processing operations that can be efficiently […]
Aug, 9

Real-Time Automatic Object Classification and Tracking using Genetic Programming and NVIDIA CUDA

Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP’s problem-solving ability is usually hindered by its long execution times. In this thesis, GP is applied to real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages […]
Aug, 9

Vivaldi: A Domain-Specific Language for Volume Processing and Visualization on Distributed Heterogeneous Systems

As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and […]
Aug, 9

Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks

In semantic scene segmentation, every pixel of an image is assigned a category label. This task can be made easier by incorporating depth information, which structured light sensors provide. Depth, however, has very different properties from RGB image channels. In this paper, we present a novel method to provide depth information to convolutional neural networks. […]
Aug, 9

Parallel Distributed Breadth First Search on the Kepler Architecture

We present the results obtained with an evolution of our CUDA-based solution for exploring large graphs via Breadth First Search. This latest version makes full use of the features of the Kepler architecture and relies on a 2D decomposition of the adjacency matrix to reduce the number of communications among the […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
