Posts
Jan, 4
Decoupled Deferred Shading for Hardware Rasterization
In this paper we present decoupled deferred shading: a rendering technique based on a new data structure called the compact geometry buffer, which stores shading samples independently from the visibility. This enables caching and efficient reuse of shading computation, e.g. for stochastic rasterization techniques. In contrast to previous methods, our decoupled shading can be efficiently implemented […]
Jan, 4
Building Source-to-Source Compilers for Heterogeneous Targets
Heterogeneous computers – platforms that make use of multiple specialized devices to achieve high throughput or low energy consumption – are difficult to program. Hardware vendors usually provide compilers from a C dialect to their machines, but complete application rewriting is frequently required to take advantage of them. In this thesis, we propose a new […]
Jan, 4
GPU TV-L1 Optical Flow
Determining optical flow, the pattern of apparent motion of objects caused by the relative motion between observer and objects in the scene, is a fundamental problem in computer vision. Given two images, the goal is to compute the 2D motion field – a projection of 3D velocities of surface points onto the imaging surface. Optical flow […]
Jan, 4
Parallel Implementation Algorithm of Motion Estimation for GPU Applications
The video coding standard H.264/AVC can achieve higher coding efficiency than previous standards. However, this comes at the expense of increased encoding complexity, especially for the motion estimation process, which is a very time-consuming task even for current central processing units (CPUs). On the other hand, due to the rapid growth of the processing capability […]
Jan, 4
Efficient and Good Delaunay Meshes From Random Points
We present a Conforming Delaunay Triangulation (CDT) algorithm based on maximal Poisson disk sampling. Points are unbiased, meaning the probability of introducing a vertex in a disk-free subregion is proportional to its area, except in a neighborhood of the domain boundary. In contrast, Delaunay refinement CDT algorithms place points dependent on the geometry of empty […]
Jan, 3
GPGPU Accelerated Texture-Based Radiosity
Radiosity is a popular global illumination algorithm capable of achieving photorealistic rendering results. However, its use in interactive environments is limited by its computational complexity. This paper presents a GPGPU-based implementation of the gathering radiosity approach using texture-based discretisation and the OpenCL framework. Hemicubes are rendered to a texture array and processed by OpenCL kernels […]
Jan, 3
OpenCL Sparse Linear Solver for Circuit Simulation
Sparse linear systems are found in many common scientific and engineering problems. In VLSI CAD tools, performing DC circuit analysis can create large, sparse systems represented by huge matrices. Solving such systems can take orders of magnitude of time to compute. Many attempts have been made to parallelize algorithms to solve these matrices. Graphics cards, […]
Jan, 3
Architecture-Aware Mapping and Optimization on a 1600-Core GPU
The graphics processing unit (GPU) continues to make inroads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task; it is a multi-dimensional problem that requires deep technical knowledge of GPU architecture. Although substantial literature exists on how to map and optimize GPU […]
Jan, 3
A Scalable Framework for Monte Carlo Simulation Using FPGA-based Hardware Accelerators with Application to SPECT Imaging
As the number of transistors integrated onto a silicon die continues to increase, compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computations. Recently, the need for faster and cost-effective applications in form-factor-constrained environments has driven an interest in on-chip acceleration […]
Jan, 3
Research and Application of Parallel Computing Technologies based on CUDA and OpenCL
The increased computational performance in science and engineering has led to a strong need for arithmetic-intensive parallel computing. CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are parallel computing technologies proposed in recent years, both of which have broad application prospects in the field of high-performance computing. In this […]
Jan, 3
GPU-based Streaming for Parallel Level of Detail on Massive Model Rendering
Rendering massive 3D models in real time has long been recognized as a very challenging problem because of the limited computational power and memory space available in a workstation. Most existing rendering techniques, especially level of detail (LOD) processing, suffer from their sequential execution nature and do not scale well with the size of the […]
Jan, 3
An optimal k-exclusion real-time locking protocol motivated by multi-GPU systems
Graphics processing units (GPUs) are becoming increasingly important in today’s platforms as their increased generality allows them to be used as powerful co-processors. In previous work, we have found that GPUs may be integrated into real-time systems by treating them as shared resources, allocated to real-time tasks through mutual exclusion locking protocols. […]