Posts
Jan, 5
PARRAY: A Unifying Array Representation for Heterogeneous Parallelism
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports succinct system-level programming for heterogeneous parallel systems like GPU clusters. The current practice of software development requires combining several low-level libraries like Pthread, OpenMP, CUDA and MPI, and achieving productivity and portability is difficult when systems differ in the number and model of their GPUs. PARRAY extends […]
Jan, 5
Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms
Nowadays, multicore processors and graphics cards are commodity hardware found in personal computers, and both CPUs and GPUs are capable of high-end computation. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method, as an example of fine-grained parallelism, and Bondeli’s algorithm, […]
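For context, cyclic reduction repeatedly eliminates every other equation of the tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], which is what exposes its fine-grained parallelism. Below is a minimal sketch of one forward-reduction step; the kernel name, in-place layout and launch assumptions are illustrative and not the implementation compared in the paper.

// Hypothetical sketch of one cyclic-reduction forward step (not the paper's code).
// Rows read at distance `stride` are not written in this step, so updating in
// place is race-free. Launch with roughly n/(2*stride) threads; the guard drops extras.
__global__ void cr_forward_step(float* a, float* b, float* c, float* d,
                                int n, int stride)
{
    // Rows updated in this step: i = 2*stride-1, 4*stride-1, ...
    int i = (blockIdx.x * blockDim.x + threadIdx.x + 1) * 2 * stride - 1;
    if (i >= n) return;
    int lo = i - stride;                        // left neighbour always exists here
    int hi = i + stride;                        // right neighbour may fall off the end
    float k1 = a[i] / b[lo];
    float k2 = (hi < n) ? c[i] / b[hi] : 0.0f;
    a[i] = -a[lo] * k1;
    b[i] = b[i] - c[lo] * k1 - ((hi < n) ? a[hi] * k2 : 0.0f);
    c[i] = (hi < n) ? -c[hi] * k2 : 0.0f;
    d[i] = d[i] - d[lo] * k1 - ((hi < n) ? d[hi] * k2 : 0.0f);
}

After log2(n) such steps a single equation remains, and a symmetric back-substitution phase recovers the remaining unknowns.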
Jan, 5
Implementation of a Fast Image Coding and Retrieval System Using a GPU
Sparse coding of image patches is a compact but computationally expensive method of representing images. As part of our SenSIP consortium industry projects, we implement the Orthogonal Matching Pursuit algorithm as a single CUDA kernel on a GPU, so that sparse codes for image patches are obtained in parallel. Image-based "exact search" and "visually similar search" […]
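The costliest part of each Orthogonal Matching Pursuit iteration is correlating the current residual with every dictionary atom before selecting the best one. The sketch below shows only that step, assuming a column-major dictionary and one thread per atom; it is an illustration, not the single-kernel design described in the paper.

// Hypothetical OMP atom-selection step: score[j] = |<D[:,j], r>| for each atom j.
// D is m x k, stored column-major; the host (or a reduction kernel) then picks
// the argmax, solves the small least-squares problem and updates the residual r.
__global__ void omp_correlate(const float* D, const float* r, float* score,
                              int m, int k)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= k) return;
    float dot = 0.0f;
    for (int i = 0; i < m; ++i)
        dot += D[j * m + i] * r[i];
    score[j] = fabsf(dot);
}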
Jan, 5
Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA
PURPOSE: List-mode processing is an efficient way of dealing with the sparse nature of positron emission tomography (PET) data sets and is the processing method of choice for time-of-flight (ToF) PET image reconstruction. However, the massive amount of computation involved in forward projection and backprojection limits the application of list-mode reconstruction in practice, and makes […]
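To give a feel for the per-event work involved, the sketch below backprojects one list-mode event per thread by stepping along its line of response and weighting each sample with a Gaussian time-of-flight kernel. The event layout, step size and ToF sigma are illustrative assumptions, not the authors' projector.

// Hypothetical ToF-weighted backprojection: one thread per list-mode event.
// Each event stores its LOR endpoints (x1,y1,z1)-(x2,y2,z2) and a ToF offset
// along the line, in mm from the LOR midpoint. Coordinates are assumed to be
// in mm with the origin at the corner of the image volume.
struct Event { float x1, y1, z1, x2, y2, z2, tof_mm; };

__global__ void tof_backproject(const Event* ev, int n_events, float* image,
                                int nx, int ny, int nz, float voxel_mm,
                                float sigma_mm, float step_mm)
{
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= n_events) return;
    Event v = ev[e];
    float dx = v.x2 - v.x1, dy = v.y2 - v.y1, dz = v.z2 - v.z1;
    float len = sqrtf(dx * dx + dy * dy + dz * dz);
    int nsteps = (int)(len / step_mm);
    for (int s = 0; s < nsteps; ++s) {
        float t = (s + 0.5f) / nsteps;             // position along the LOR in [0,1]
        float d = (t - 0.5f) * len - v.tof_mm;     // distance from the ToF centre
        float w = expf(-0.5f * d * d / (sigma_mm * sigma_mm));
        int ix = (int)((v.x1 + t * dx) / voxel_mm);
        int iy = (int)((v.y1 + t * dy) / voxel_mm);
        int iz = (int)((v.z1 + t * dz) / voxel_mm);
        if (ix < 0 || iy < 0 || iz < 0 || ix >= nx || iy >= ny || iz >= nz) continue;
        atomicAdd(&image[(iz * ny + iy) * nx + ix], w);   // scatter into the image
    }
}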
Jan, 5
BFROST: Binary Features from Robust Orientation Segment Tests accelerated on the GPU
We propose a fast local image feature detector and descriptor that is implementable on the GPU. Our method is the first GPU implementation of the popular FAST detector. We also propose a simple but novel method of feature orientation estimation that can be computed in constant time. The robustness and reliability of our orientation estimation is […]
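For context, the segment test at the heart of FAST compares 16 pixels on a Bresenham circle of radius 3 against the centre pixel. Below is a simplified, single-threshold sketch with one thread per pixel and a contiguity requirement of 9; it illustrates the test only and is neither BFROST's implementation nor its orientation estimator.

// Simplified FAST-style segment test (illustrative only).
// A pixel is a corner if at least 9 contiguous circle pixels are all brighter
// than centre + t or all darker than centre - t.
__constant__ int CIRCLE_X[16] = { 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3,-3,-3,-2,-1};
__constant__ int CIRCLE_Y[16] = {-3,-3,-2,-1, 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3};

__global__ void fast_detect(const unsigned char* img, unsigned char* corner,
                            int width, int height, int t)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 3 || y < 3 || x >= width - 3 || y >= height - 3) return;
    int c = img[y * width + x];
    int run_bright = 0, run_dark = 0, best_bright = 0, best_dark = 0;
    // Walk the circle twice so a contiguous run may wrap around index 0.
    for (int k = 0; k < 32; ++k) {
        int p = img[(y + CIRCLE_Y[k & 15]) * width + (x + CIRCLE_X[k & 15])];
        run_bright = (p > c + t) ? run_bright + 1 : 0;
        run_dark   = (p < c - t) ? run_dark + 1 : 0;
        best_bright = max(best_bright, run_bright);
        best_dark   = max(best_dark, run_dark);
    }
    corner[y * width + x] = (best_bright >= 9 || best_dark >= 9) ? 255 : 0;
}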
Jan, 5
A Parallel Supercomputer Implementation of a Biologically Inspired Neural Network and Its Use for Pattern Recognition
A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding-by-synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for two implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) running on clusters of multicore supercomputers and […]
Jan, 5
Implementation of Keccak hash function in Tree hashing mode on Nvidia GPU
This paper presents a Graphics Processing Unit implementation of the KECCAK cryptographic hash function in a parallel tree hashing mode, to exploit the parallel compute capacity of graphics cards. The NVIDIA CUDA language is used to exploit the specific features of the GPU hardware (memory hierarchy, host-device memory transfers). After optimizations of the cooperation […]
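As a structural illustration only, the sketch below shows the shape of a parallel tree-hashing mode: each thread hashes one fixed-size leaf of the message, and the leaf digests are combined afterwards into a root digest. The leaf_mix() function is a toy placeholder and is not Keccak; the paper's kernels, block sizes and memory-transfer optimizations are not reproduced here.

// Skeleton of a parallel tree-hashing mode (placeholder leaf hash, NOT Keccak).
__device__ unsigned int leaf_mix(const unsigned char* data, int len)
{
    unsigned int h = 2166136261u;                 // FNV-1a style toy mixer
    for (int i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

__global__ void hash_leaves(const unsigned char* msg, int msg_len, int leaf_size,
                            unsigned int* leaf_digest, int n_leaves)
{
    int leaf = blockIdx.x * blockDim.x + threadIdx.x;
    if (leaf >= n_leaves) return;
    int off = leaf * leaf_size;
    int len = min(leaf_size, msg_len - off);      // last leaf may be shorter
    leaf_digest[leaf] = leaf_mix(msg + off, len);
    // The host (or a second kernel) then hashes the concatenated leaf digests
    // to obtain the root of the tree.
}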
Jan, 5
Pyramidal Image Blending Using CUDA Framework
We propose and implement a pyramidal image blending algorithm using modern programmable graphics processing units. This algorithm is an essential part of an image stitching process for producing a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. We achieve significant acceleration in the computation of the pyramidal image blending algorithm by […]
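At each pyramid level the blend itself is an independent per-pixel operation, which is what makes it a good GPU fit. A minimal sketch of that per-level step is below; the names and layout are assumptions, not the authors' code.

// Hypothetical per-level blend of two Laplacian pyramid levels A and B using a
// Gaussian-pyramid mask M (all the same size at this level): out = M*A + (1-M)*B.
__global__ void blend_level(const float* A, const float* B, const float* M,
                            float* out, int n_pixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pixels) return;
    out[i] = M[i] * A[i] + (1.0f - M[i]) * B[i];
}

The blended levels are then collapsed back into a single full-resolution image to produce the seamless mosaic.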
Jan, 5
Abundance Estimation Algorithms using NVIDIA CUDA Technology
Spectral unmixing of hyperspectral images is the process by which the constituent members of a mixed pixel are determined and the fractional abundance of each element is estimated. Several algorithms have been developed in the past to obtain abundance estimates from hyperspectral data; however, most of them are characterized by being […]
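Under the linear mixing model x ≈ E a (x the pixel spectrum, E the endmember matrix, a the abundances), a common unconstrained estimator is a = (E^T E)^{-1} E^T x, which is embarrassingly parallel over pixels once the pseudoinverse is precomputed. The sketch below assumes exactly that layout; it is an illustration and not necessarily one of the algorithms evaluated in the paper.

// Hypothetical unconstrained least-squares unmixing: one thread per pixel.
// P is the p x bands pseudoinverse (E^T E)^{-1} E^T, precomputed on the host;
// x is stored pixel-major (n_pixels x bands); a is n_pixels x p abundances.
__global__ void estimate_abundances(const float* P, const float* x, float* a,
                                     int n_pixels, int bands, int p)
{
    int pix = blockIdx.x * blockDim.x + threadIdx.x;
    if (pix >= n_pixels) return;
    for (int e = 0; e < p; ++e) {
        float acc = 0.0f;
        for (int b = 0; b < bands; ++b)
            acc += P[e * bands + b] * x[pix * bands + b];
        a[pix * p + e] = acc;                    // abundance of endmember e in this pixel
    }
}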
Jan, 4
Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster
We propose a method to parallelize the training of a convolutional neural network using a CUDA-based cluster, and attain a substantial increase in the performance of the algorithm itself. We investigate the feasibility of batch versus online training and provide a performance comparison between them. Furthermore, we propose an implementation of an […]
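Batch-mode training maps naturally onto a cluster as data parallelism: each node computes gradients for its shard of the batch, and the gradients are averaged before the weight update. The sketch below shows that generic pattern with MPI_Allreduce; it is an assumption about the general approach, not the paper's implementation, and it omits the CUDA kernels that actually compute the gradients.

// Generic data-parallel averaging of per-node gradients (illustrative only).
// Each rank has computed `grad` (already copied back from its GPU) for its shard.
#include <mpi.h>

void average_gradients(float* grad, int n_params, int n_ranks)
{
    // Sum gradients across all ranks in place, then divide by the rank count.
    MPI_Allreduce(MPI_IN_PLACE, grad, n_params, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    for (int i = 0; i < n_params; ++i)
        grad[i] /= (float)n_ranks;
}

Online (per-sample) training offers no such independent work per step, which is one reason batch-versus-online feasibility matters on a cluster.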
Jan, 4
Implementing Parallel SMO to Train SVM on CUDA-Enabled Systems
We implement a Sequential Minimal Optimization type algorithm to solve for the Lagrangian weights of the dual form of the Support Vector Machine problem. Unlike the original SMO algorithm, the modified SMO algorithm uses a first-order variable selection heuristic to avoid explicit computation of the KKT conditions. Parallelism in the algorithm is exposed via a […]
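The first-order heuristic mentioned here is commonly realized as maximal-violating-pair selection over the gradient of the dual objective. The host-side sketch below shows that selection rule only; the names and the plain loop (which the GPU version would replace with a parallel reduction) are illustrative assumptions, not the authors' code.

// Illustrative first-order (maximal-violating-pair) working-set selection for SMO.
// G is the gradient of the dual objective, y the labels (+1/-1), alpha the duals.
void select_working_pair(const float* G, const float* y, const float* alpha,
                         int n, float C, int* i_up, int* j_low)
{
    float best_up = -1e30f, best_low = 1e30f;
    *i_up = -1; *j_low = -1;
    for (int t = 0; t < n; ++t) {
        float v = -y[t] * G[t];
        bool in_up  = (y[t] > 0.0f && alpha[t] < C) || (y[t] < 0.0f && alpha[t] > 0.0f);
        bool in_low = (y[t] < 0.0f && alpha[t] < C) || (y[t] > 0.0f && alpha[t] > 0.0f);
        if (in_up  && v > best_up)  { best_up  = v; *i_up  = t; }
        if (in_low && v < best_low) { best_low = v; *j_low = t; }
    }
    // (i_up, j_low) is the pair that most violates optimality; training stops
    // when best_up - best_low falls below a small tolerance.
}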
Jan, 4
Task and Data Distribution in Hybrid Parallel Systems
This paper describes my work with the Operating Systems and Middleware group for the HPI Research School on "Service-Oriented Systems Engineering". Computer architecture is shifting, and the upper levels of the software stack must therefore be adapted to benefit from current and future hardware capabilities. In this paper, we present the Hybrid.Parallel […]