Posts

Jan, 4

Extending a C-like Language for Portable SIMD Programming

SIMD instructions have been common in CPUs for years. Using these instructions effectively requires not only vectorization of code, but also modifications to the data layout. However, automatic vectorization techniques are often not powerful enough and suffer from a restricted scope of applicability; hence, programmers often vectorize their programs manually by using intrinsics: compiler-known functions that […]
Jan, 4

Parallel Implementation of Compressive Sensing Based SAR Imaging with GPU

This paper proposes a new scheme for the parallel implementation of compressive sensing based SAR imaging on a GPU with the Iterative Shrinkage/Thresholding (IST) algorithm. To achieve a faster recovery speed, we modified the existing IST algorithm structure and realized a fast implementation on the GPU. The experimental results show that the parallel computing capabilities of the GPU have a significant speedup […]
Jan, 4

Evaluating polynomials in several variables and their derivatives on a GPU computing processor

In order to obtain more accurate solutions of polynomial systems with numerical continuation methods, we use multiprecision arithmetic. Our goal is to offset the overhead of double double arithmetic by accelerating the path trackers, and in particular Newton’s method, with a general purpose graphics processing unit. In this paper we describe algorithms for the massively parallel […]
Jan, 4

Thermal and Athermal Swarms of Self-Propelled Particles

Swarms of self-propelled particles exhibit complex behavior that can arise from simple models, with large changes in swarm behavior resulting from small changes in model parameters. We investigate the steady-state swarms formed by self-propelled Morse particles in three dimensions using molecular dynamics simulations optimized for GPUs. We find a variety of swarms of different overall […]
Jan, 4

Decoupled Deferred Shading for Hardware Rasterization

In this paper we present decoupled deferred shading: a rendering technique based on a new data structure called compact geometry buffer, which stores shading samples independently from the visibility. This enables caching and efficient reuse of shading computation, e.g. for stochastic rasterization techniques. In contrast to previous methods, our decoupled shading can be efficiently implemented […]
Jan, 4

Building Source-to-Source Compilers for Heterogeneous Targets

Heterogeneous computers – platforms that make use of multiple specialized devices to achieve high throughput or low energy consumption – are difficult to program. Hardware vendors usually provide compilers from a C dialect to their machines, but complete application rewriting is frequently required to take advantage of them. In this thesis, we propose a new […]
Jan, 4

GPU TV-L1 Optical Flow

Determining optical flow, the pattern of apparent motion of objects caused by the relative motion between observer and objects in the scene, is a fundamental problem in computer vision. Given two images, the goal is to compute the 2D motion field – a projection of 3D velocities of surface points onto the imaging surface. Optical flow […]
Jan, 4

Parallel Implementation Algorithm of Motion Estimation for GPU Applications

The video coding standard H.264/AVC can achieve higher coding efficiency than previous standards. However, this comes at the expense of increased encoding complexity, especially for the motion estimation process, which is a very time-consuming task even for current central processing units (CPUs). On the other hand, due to the rapid growth of the processing capability […]
Jan, 4

Efficient and Good Delaunay Meshes From Random Points

We present a Conforming Delaunay Triangulation (CDT) algorithm based on maximal Poisson disk sampling. Points are unbiased, meaning the probability of introducing a vertex in a disk-free subregion is proportional to its area, except in a neighborhood of the domain boundary. In contrast, Delaunay refinement CDT algorithms place points dependent on the geometry of empty […]
Jan, 3

GPGPU Accelerated Texture-Based Radiosity

Radiosity is a popular global illumination algorithm capable of achieving photorealistic rendering results. However, its use in interactive environments is limited by its computational complexity. This paper presents a GPGPU-based implementation of the gathering radiosity approach using texture-based discretisation and the OpenCL framework. Hemicubes are rendered to a texture array and processed by OpenCL kernels […]
Jan, 3

OpenCL Sparse Linear Solver for Circuit Simulation

Sparse linear systems are found in many common scientific and engineering problems. In VLSI CAD tools, performing DC circuit analysis can create large, sparse systems represented by huge matrices. Solving such systems can take orders of magnitude of time to compute. Many attempts have been made to parallelize algorithms to solve these matrices. Graphics cards, […]
Jan, 3

Architecture-Aware Mapping and Optimization on a 1600-Core GPU

The graphics processing unit (GPU) continues to make inroads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task; it is a multi-dimensional problem that requires deep technical knowledge of GPU architecture. Although substantial literature exists on how to map and optimize GPU […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
