Posts

Jan, 4

Task and Data Distribution in Hybrid Parallel Systems

This paper describes my work with the Operating Systems and Middleware group for the HPI Research School on "Service-Oriented Systems Engineering". Computer architecture is shifting. The upper levels of the software stack must therefore be adapted to benefit from current and future hardware capabilities. In this paper, we present the Hybrid.Parallel […]
Jan, 4

Toward Real-Time Dense 3d Reconstruction using Stereo Vision

State-of-the-art Structure from Motion algorithms can produce a real-time sparse 3d map of the environment in a fast, robust and efficient way. However, dense 3d maps would be very useful for accurate Augmented Reality with occlusion management. This project focuses on generating accurate dense depth-maps in near real-time from the data provided […]
Jan, 4

Automatic SIMD Code Generation

SIMD instructions have been common in microprocessors for roughly a decade and a half now. These instructions enable the programmer to simultaneously perform an operation on several values with a single instruction, hence the name: Single Instruction, Multiple Data. The more values that can be computed simultaneously, the better the speedup. However, SIMD programming is still commonly considered […]
Jan, 4

Analysis of Real-Time Stereo Vision Algorithms On GPU

Dozens of stereo correspondence algorithms whose matching performance has been measured are available, but the trade-off between speed and matching performance of viable real-time stereo has received much less attention. Here, we evaluate five correspondence algorithms (Symmetric Dynamic Programming Stereo, SemiGlobal Matching, simple block matching, Belief Propagation, and its constant space variant) on a GPU using […]
Jan, 4

Extending a C-like Language for Portable SIMD Programming

SIMD instructions have been common in CPUs for years now. Using these instructions effectively requires not only vectorization of code, but also modifications to the data layout. However, automatic vectorization techniques are often not powerful enough and suffer from a restricted scope of applicability; hence, programmers often vectorize their programs manually by using intrinsics: compiler-known functions that […]
Jan, 4

Parallel Implementation of Compressive Sensing Based SAR Imaging with GPU

The paper proposes a new scheme for the parallel implementation of compressive sensing based SAR imaging on a GPU with the Iterative Shrinkage/Thresholding (IST) algorithm. To achieve a faster recovery speed, we modified the existing IST algorithm structure and realized a fast implementation on the GPU. The experimental results show that the parallel computing capabilities of the GPU have a significant speedup […]
Jan, 4

Evaluating polynomials in several variables and their derivatives on a GPU computing processor

In order to obtain more accurate solutions of polynomial systems with numerical continuation methods, we use multiprecision arithmetic. Our goal is to offset the overhead of double-double arithmetic by accelerating the path trackers, and in particular Newton’s method, with a general-purpose graphics processing unit. In this paper we describe algorithms for the massively parallel […]
Jan, 4

Thermal and Athermal Swarms of Self-Propelled Particles

Swarms of self-propelled particles exhibit complex behavior that can arise from simple models, with large changes in swarm behavior resulting from small changes in model parameters. We investigate the steady-state swarms formed by self-propelled Morse particles in three dimensions using molecular dynamics simulations optimized for GPUs. We find a variety of swarms of different overall […]
Jan, 4

Decoupled Deferred Shading for Hardware Rasterization

In this paper we present decoupled deferred shading: a rendering technique based on a new data structure called compact geometry buffer, which stores shading samples independently from the visibility. This enables caching and efficient reuse of shading computation, e.g. for stochastic rasterization techniques. In contrast to previous methods, our decoupled shading can be efficiently implemented […]
Jan, 4

Building Source-to-Source Compilers for Heterogeneous Targets

Heterogeneous computers – platforms that make use of multiple specialized devices to achieve high throughput or low energy consumption – are difficult to program. Hardware vendors usually provide compilers from a C dialect to their machines, but complete application rewriting is frequently required to take advantage of them. In this thesis, we propose a new […]
Jan, 4

GPU TV-L1 Optical Flow

Determining optical flow, the pattern of apparent motion of objects caused by the relative motion between observer and objects in the scene, is a fundamental problem in computer vision. Given two images, the goal is to compute the 2D motion field – a projection of 3D velocities of surface points onto the imaging surface. Optical flow […]
Jan, 4

Parallel Implementation Algorithm of Motion Estimation for GPU Applications

The video coding standard H.264/AVC can achieve higher coding efficiency than previous standards. However, this comes at the expense of increased encoding complexity, especially for the motion estimation process, which is a very time-consuming task even for current central processing units (CPUs). On the other hand, due to the rapid growth of the processing capability […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors