
Posts

Sep, 16

Implicit and dynamic trees for high performance rendering

Recent advances in GPU architecture and programmability have enabled the computation of ray casted or ray traced images at interactive frame rates. However, the rapid performance gains of the hardware cannot by themselves address the challenge posed by the steady growth in the geometric and temporal complexity of computer graphics datasets. In this paper we […]
Sep, 16

Fast Monte Carlo Simulation for Patient-specific CT/CBCT Imaging Dose Calculation

Recently, X-ray imaging dose from computed tomography (CT) or cone beam CT (CBCT) scans has become a serious concern. Patient-specific imaging dose calculation has been proposed for the purpose of dose management. While Monte Carlo (MC) dose calculation can be quite accurate for this purpose, it suffers from low computational efficiency. In response to this […]
Sep, 15

Analytical motion blur rasterization with compression

We present a rasterizer, based on time-dependent edge equations, that computes analytical visibility in order to render accurate motion blur. The theory for doing the computations in a rasterization framework is derived in detail, and then implemented. To keep the frame buffer requirements low, we also present a new oracle-based compression algorithm for the time […]
Sep, 15

Processing data streams with hard real-time constraints on heterogeneous systems

Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency — to satisfy the milliseconds-long, hard real-time constraints of each stream, and high throughput — to enable efficient processing of as many streams as possible. High-throughput programmable accelerators such as modern GPUs […]
Sep, 15

Strategies for preparing computer science students for the multicore world

Multicore computers have become standard, and the number of cores per computer is rising rapidly. How does the new demand for understanding of parallel computing impact computer science education? In this paper, we examine several aspects of this question: (i) What parallelism body of knowledge do today’s students need to learn? (ii) How might these […]
Sep, 15

Performing with CUDA

Recently a GPGPU application had to be redesigned to overcome performance problems. A number of software engineering lessons were learnt from this and other projects. We describe those about obtaining high performance from nVidia GPUs and practical aspects of CUDA C software development.
Sep, 15

Fast Mersenne prime testing on the GPU

The Lucas-Lehmer test for Mersenne primality can be efficiently parallelized for GPU-based computation. The gpuLucas project implements an irrational-base discrete weighted transform approach (IBDWT) using balanced-integers, non-power-of-two transforms, and carry-save radix representations. gpuLucas uses the CUDA programming language and requires the double-precision floating point capabilities of recent GPUs. Results show up to 7x speedups over […]
Sep, 15

Scaling Lattice QCD beyond 100 GPUs

Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo "analysis" phase which accounts for a substantial fraction of the workload in a typical LQCD calculation, the […]
Sep, 15

Scalable and deterministic timing-driven parallel placement for FPGAs

This paper describes a parallel implementation of the timing-driven VPR 5.0 simulated annealing engine. By restricting the move distance to a confined neighborhood, it is possible to consider a large number of non-conflicting moves in parallel and achieve a deterministic result. The full timing-driven algorithm is parallelized, including the detailed timing analysis updates done periodically […]
Sep, 15

A platform-independent tool for modeling parallel programs

Programming languages that can utilize the underlying parallel architecture in shared memory, distributed memory or Graphics Processing Units (GPUs) are used extensively for solving scientific problems. However, from our observation of studying multiple parallel programs from various domains, such programming languages have a substantial amount of sequential code mixed with the parallel code. When rewriting […]
Sep, 15

Ambient occlusion volumes

This paper introduces a new approximation algorithm for the near-field ambient occlusion problem. It combines known pieces in a new way to achieve substantially improved quality over fast methods and substantially improved performance compared to accurate methods. Intuitively, it computes the analog of a shadow volume for ambient light around each polygon, and then applies […]
Sep, 15

Visual simulation of shockwaves

We present an efficient method for visual simulations of shock phenomena in compressible, inviscid fluids. Our algorithm is derived from one class of the finite volume method especially designed for capturing shock propagation, but offers improved efficiency through physically-based simplification and adaptation for graphical rendering. Our technique is capable of handling complex, bidirectional object-shock interactions […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
