Posts
Jan, 5
A Parallel Supercomputer Implementation of a Biological Inspired Neural Network and its use for Pattern Recognition
A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding-by-synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for two implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) running on clusters of multicore supercomputers and […]
Jan, 5
Implementation of Keccak hash function in Tree hashing mode on Nvidia GPU
This paper presents a Graphics Processing Unit implementation of the KECCAK cryptographic hash function, in a parallel tree hash mode, to exploit the parallel compute capacity of graphics cards. The NVIDIA CUDA language was used to target the specifics of the GPU hardware precisely (memory hierarchy, host-device memory transfers). After optimizations of the cooperation […]
Jan, 5
Pyramidal Image Blending Using CUDA Framework
We propose and implement a pyramidal image blending algorithm using modern programmable graphics processing units. This algorithm is an essential part of an image stitching process for a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. We realize significant acceleration in computations of the pyramidal image blending algorithm by […]
Jan, 5
Abundance Estimation Algorithms using NVIDIA CUDA Technology
Spectral unmixing of hyperspectral images is a process by which the constituent members of a pixel scene are determined and the fractional abundance of each element is estimated. Several algorithms have been developed in the past to obtain abundance estimates from hyperspectral data; however, most of them are characterized by being […]
Jan, 4
Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster
We propose a method to parallelize the training of a convolutional neural network by using a CUDA-based cluster. We attain a substantial increase in the performance of the algorithm itself. We research the feasibility of using batch versus online mode training and provide a performance comparison between them. Furthermore, we propose an implementation of an […]
Jan, 4
Implementing Parallel SMO to Train SVM on CUDA-Enabled Systems
We implement a Sequential Minimal Optimization type algorithm to solve for the Lagrangian weights of the dual form of the Support Vector Machine problem. Unlike the original SMO algorithm, the modified SMO algorithm uses a first-order variable selection heuristic to avoid explicit computation of the KKT conditions. Parallelism in the algorithm is exposed via a […]
Jan, 4
Task and Data Distribution in Hybrid Parallel Systems
This paper describes my work with the Operating Systems and Middleware group for the HPI Research School on "Service-Oriented Systems Engineering". Computer architecture is shifting. The upper levels of the software stack are thus to be adapted in order to benefit from the current and future hardware capabilities. In this paper, we present the Hybrid.Parallel […]
Jan, 4
Toward Real-Time Dense 3d Reconstruction using Stereo Vision
State-of-the-art Structure from Motion algorithms can produce a real-time sparse 3D map of the environment in a fast, robust and efficient way. However, dense 3D maps would be very useful for accurate Augmented Reality with occlusion management. This project focuses on generating accurate dense depth maps in near real-time from the data provided […]
Jan, 4
Automatic SIMD Code Generation
SIMD instructions have been common in microprocessors for roughly a decade and a half. These instructions enable the programmer to perform an operation on several values simultaneously with a single instruction, hence the name: Single Instruction, Multiple Data. The more values that can be computed simultaneously, the better the speedup. However, SIMD programming is still commonly considered […]
Jan, 4
Analysis of Real-Time Stereo Vision Algorithms On GPU
Dozens of stereo correspondence algorithms whose matching performance has been measured are available, but the trade-off between speed and matching performance of viable real-time stereo has received much less attention. Here, we evaluate five correspondence algorithms (Symmetric Dynamic Programming Stereo, Semi-Global Matching, simple block matching, Belief Propagation, and its constant-space variant) on a GPU using […]
Jan, 4
Extending a C-like Language for Portable SIMD Programming
SIMD instructions have been common in CPUs for years. Using these instructions effectively requires not only vectorization of code, but also modifications to the data layout. However, automatic vectorization techniques are often not powerful enough and suffer from a restricted scope of applicability; hence, programmers often vectorize their programs manually by using intrinsics: compiler-known functions that […]
Jan, 4
Parallel Implementation of Compressive Sensing Based SAR Imaging with GPU
This paper proposes a new scheme for the parallel implementation of compressive sensing based SAR imaging on a GPU with the Iterative Shrinkage/Thresholding (IST) algorithm. To achieve faster recovery, we modified the existing IST algorithm structure and realized a fast implementation on the GPU. The experimental results show that the parallel computing capabilities of the GPU have a significant speedup […]

