Posts

Jan, 5

Pyramidal Image Blending Using CUDA Framework

We propose and implement a pyramidal image blending algorithm using modern programmable graphics processing units. This algorithm is an essential part of an image stitching process for a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. We realize significant acceleration in computations of the pyramidal image blending algorithm by […]
Jan, 5

Abundance Estimation Algorithms using NVIDIA CUDA Technology

Spectral unmixing of hyperspectral images is a process by which the constituent members of a pixel scene are determined and the fractional abundance of each element is estimated. Several algorithms have been developed in the past to obtain abundance estimates from hyperspectral data; however, most of them are characterized by being […]
Jan, 4

Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster

We propose a method to parallelize the training of a convolutional neural network by using a CUDA-based cluster. We attain a substantial increase in the performance of the algorithm itself. We research the feasibility of using batch versus online mode training and provide a performance comparison between them. Furthermore, we propose an implementation of an […]
Jan, 4

Implementing Parallel SMO to Train SVM on CUDA-Enabled Systems

We implement a Sequential Minimal Optimization type algorithm to solve for the Lagrangian weights of the dual form of the Support Vector Machine problem. Unlike the original SMO algorithm, the modified SMO algorithm uses a first-order variable selection heuristic to avoid explicit computation of the KKT conditions. Parallelism in the algorithm is exposed via a […]
Jan, 4

Task and Data Distribution in Hybrid Parallel Systems

This paper describes my work with the Operating Systems and Middleware group for the HPI Research School on "Service-Oriented Systems Engineering". Computer architecture is shifting. The upper levels of the software stack must therefore be adapted in order to benefit from current and future hardware capabilities. In this paper, we present the Hybrid.Parallel […]
Jan, 4

Toward Real-Time Dense 3d Reconstruction using Stereo Vision

State-of-the-art Structure from Motion algorithms can produce a real-time sparse 3d map of the environment in a fast, robust and efficient way. However, dense 3d maps would be very useful for accurate Augmented Reality with occlusion management. This project focuses on generating accurate dense depth-maps in near real-time from the data provided […]
Jan, 4

Automatic SIMD Code Generation

SIMD instructions have been common in microprocessors for roughly a decade and a half now. These instructions enable the programmer to simultaneously perform an operation on several values with a single instruction, hence the name: Single Instruction, Multiple Data. The more values that can be computed simultaneously, the better the speedup. However, SIMD programming is still commonly considered […]
Jan, 4

Analysis of Real-Time Stereo Vision Algorithms On GPU

Dozens of stereo correspondence algorithms whose matching performance has been measured are available, but the trade-off between speed and matching performance of viable real-time stereo has received much less attention. Here, we evaluate five correspondence algorithms (Symmetric Dynamic Programming Stereo, SemiGlobal Matching, simple block matching, Belief Propagation, and its constant space variant) on a GPU using […]
Jan, 4

Extending a C-like Language for Portable SIMD Programming

SIMD instructions have been common in CPUs for years now. Using these instructions effectively requires not only vectorization of code, but also modifications to the data layout. However, automatic vectorization techniques are often not powerful enough and suffer from a restricted scope of applicability; hence, programmers often vectorize their programs manually by using intrinsics: compiler-known functions that […]
Jan, 4

Parallel Implementation of Compressive Sensing Based SAR Imaging with GPU

This paper proposes a new scheme for the parallel implementation of compressive sensing based SAR imaging on GPU with the Iterative Shrinkage/Thresholding (IST) algorithm. To achieve a faster recovery speed, we modify the existing IST algorithm structure and realize a fast implementation on GPU. The experimental results show that parallel computing capabilities of GPU have a significant speedup […]
Jan, 4

Evaluating polynomials in several variables and their derivatives on a GPU computing processor

In order to obtain more accurate solutions of polynomial systems with numerical continuation methods, we use multiprecision arithmetic. Our goal is to offset the overhead of double double arithmetic by accelerating the path trackers, and in particular Newton’s method, with a general purpose graphics processing unit. In this paper we describe algorithms for the massively parallel […]
Jan, 4

Thermal and Athermal Swarms of Self-Propelled Particles

Swarms of self-propelled particles exhibit complex behavior that can arise from simple models, with large changes in swarm behavior resulting from small changes in model parameters. We investigate the steady-state swarms formed by self-propelled Morse particles in three dimensions using molecular dynamics simulations optimized for GPUs. We find a variety of swarms of different overall […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
