Posts

Jan, 6

Applying OOC Techniques in the Reduction to Condensed Form for Very Large Symmetric Eigenproblems on GPUs

In this paper we address the reduction of a dense matrix to tridiagonal form for the solution of symmetric eigenvalue problems on a graphics processor (GPU) when the data is too large to fit into the accelerator memory. We apply out-of-core techniques to a three-stage algorithm, carefully redesigning the first stage to reduce the number […]
Jan, 5

A GPU Implementation of Inclusion-based Points-to Analysis

Graphics Processing Units (GPUs) have emerged as powerful accelerators for many regular algorithms that operate on dense arrays and matrices. In contrast, we know relatively little about using GPUs to accelerate highly irregular algorithms that operate on pointer-based data structures such as graphs. For the most part, research has focused on GPU implementations of graph […]
Jan, 5

PARRAY: A Unifying Array Representation for Heterogeneous Parallelism

This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU clusters. The current practice of software development requires combining several low-level libraries like Pthread, OpenMP, CUDA and MPI. Achieving productivity and portability is hard with different numbers and models of GPUs. PARRAY extends […]
Jan, 5

Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms

Nowadays multicore processors and graphics cards are commodity hardware that can be found in personal computers. Both CPU and GPU are capable of performing high-end computations. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method, as an example of fine-grained parallelism, and Bondeli’s algorithm, […]
Jan, 5

Implementation of a Fast Image Coding and Retrieval System Using a GPU

Sparse coding of image patches is a compact but computationally expensive method of representing images. As part of our SenSIP consortium industry projects, we implement the Orthogonal Matching Pursuit algorithm using a single CUDA kernel on a GPU and sparse codes for image patches are obtained in parallel. Image-based "exact search" and "visually similar search" […]
Jan, 5

Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA

PURPOSE: List-mode processing is an efficient way of dealing with the sparse nature of positron emission tomography (PET) data sets and is the processing method of choice for time-of-flight (ToF) PET image reconstruction. However, the massive amount of computation involved in forward projection and backprojection limits the application of list-mode reconstruction in practice, and makes […]
Jan, 5

BFROST: Binary Features from Robust Orientation Segment Tests accelerated on the GPU

We propose a fast local image feature detector and descriptor that is implementable on the GPU. Our method is the first GPU implementation of the popular FAST detector. A simple but novel method of feature orientation estimation which can be calculated in constant time is proposed. The robustness and reliability of our orientation estimation is […]
Jan, 5

A Parallel Supercomputer Implementation of a Biological Inspired Neural Network and its use for Pattern Recognition

A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding-by-synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for two implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) running on clusters of multicore supercomputers and […]
Jan, 5

Implementation of Keccak hash function in Tree hashing mode on Nvidia GPU

This paper presents a Graphics Processing Unit implementation of the KECCAK cryptographic hash function, in a parallel tree hash mode that exploits the parallel compute capacity of graphics cards. The NVIDIA CUDA language is used to target the specifics of the GPU hardware precisely (memory hierarchy, host-device memory transfers). After optimizations of the cooperation […]
Jan, 5

Pyramidal Image Blending Using CUDA Framework

We propose and implement a pyramidal image blending algorithm using modern programmable graphics processing units. This algorithm is an essential part of an image stitching process for a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. We realize significant acceleration in computations of the pyramidal image blending algorithm by […]
Jan, 5

Abundance Estimation Algorithms using NVIDIA CUDA Technology

Spectral unmixing of hyperspectral images is a process by which the constituent members of a pixel scene are determined and the fractional abundance of each element is estimated. Several algorithms have been developed to obtain abundance estimates from hyperspectral data; however, most of them are characterized by being […]
Jan, 4

Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster

We propose a method to parallelize the training of a convolutional neural network by using a CUDA-based cluster. We attain a substantial increase in the performance of the algorithm itself. We investigate the feasibility of using batch versus online mode training and provide a performance comparison between them. Furthermore, we propose an implementation of an […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors