Posts
Sep, 16
Optimizations and Performance of a Robotics Grasping Algorithm Described in Geometric Algebra
Using Conformal Geometric Algebra leads to algorithms that can be formulated in a very clear, easy-to-grasp way. It can also increase the performance of an implementation, because its operations lend themselves to parallel computation. In this paper we show how a grasping algorithm for a robotic arm is […]
Sep, 16
Parallel Medical Image Reconstruction: From Graphics Processors to Grids
We present a variety of possible parallelization approaches for a real-world case study using several modern parallel and distributed computer architectures. Our case study is a production-quality, time-intensive algorithm for medical image reconstruction used in computer tomography. We describe how this algorithm can be parallelized for the main kinds of contemporary parallel architectures: shared-memory multiprocessors, […]
Sep, 16
A Generic Approach to Topic Models
This article contributes a generic model of topic models. To define the problem space, general characteristics for this class of models are derived, which give rise to a representation of topic models as "mixture networks", a domain-specific compact alternative to Bayesian networks. Besides illustrating the interconnection of mixtures in topic models, the benefit of this […]
Sep, 16
Implicit and dynamic trees for high performance rendering
Recent advances in GPU architecture and programmability have enabled the computation of ray-cast or ray-traced images at interactive frame rates. However, the rapid performance gains of the hardware cannot by themselves address the challenge posed by the steady growth in the geometric and temporal complexity of computer graphics datasets. In this paper we […]
Sep, 16
Fast Monte Carlo Simulation for Patient-specific CT/CBCT Imaging Dose Calculation
Recently, X-ray imaging dose from computed tomography (CT) or cone beam CT (CBCT) scans has become a serious concern. Patient-specific imaging dose calculation has been proposed for the purpose of dose management. While Monte Carlo (MC) dose calculation can be quite accurate for this purpose, it suffers from low computational efficiency. In response to this […]
Sep, 15
Analytical motion blur rasterization with compression
We present a rasterizer, based on time-dependent edge equations, that computes analytical visibility in order to render accurate motion blur. The theory for doing the computations in a rasterization framework is derived in detail, and then implemented. To keep the frame buffer requirements low, we also present a new oracle-based compression algorithm for the time […]
Sep, 15
Processing data streams with hard real-time constraints on heterogeneous systems
Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency — to satisfy the milliseconds-long, hard real-time constraints of each stream, and high throughput — to enable efficient processing of as many streams as possible. High-throughput programmable accelerators such as modern GPUs […]
Sep, 15
Strategies for preparing computer science students for the multicore world
Multicore computers have become standard, and the number of cores per computer is rising rapidly. How does the new demand for understanding of parallel computing impact computer science education? In this paper, we examine several aspects of this question: (i) What parallelism body of knowledge do today's students need to learn? (ii) How might these […]
Sep, 15
Performing with CUDA
Recently a GPGPU application had to be redesigned to overcome performance problems. A number of software engineering lessons were learnt from this and other projects. We describe those about obtaining high performance from nVidia GPUs and practical aspects of CUDA C software development.
Sep, 15
Fast Mersenne prime testing on the GPU
The Lucas-Lehmer test for Mersenne primality can be efficiently parallelized for GPU-based computation. The gpuLucas project implements an irrational-base discrete weighted transform approach (IBDWT) using balanced-integers, non-power-of-two transforms, and carry-save radix representations. gpuLucas uses the CUDA programming language and requires the double-precision floating point capabilities of recent GPUs. Results show up to 7x speedups over […]
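For readers unfamiliar with the test being parallelized, here is a minimal sequential sketch of the Lucas-Lehmer test in Python. It is not the gpuLucas implementation: gpuLucas performs the squaring step with an IBDWT on the GPU, whereas this sketch relies on Python's arbitrary-precision integers. The helper names `mod_mersenne` and `lucas_lehmer` are hypothetical. The sketch shows the recurrence s_0 = 4, s_{k+1} = s_k^2 - 2 (mod 2^p - 1), and why reduction modulo a Mersenne number is cheap: because 2^p ≡ 1, the high bits can simply be added back onto the low bits.

```python
def mod_mersenne(x: int, p: int) -> int:
    """Reduce a non-negative x modulo M = 2**p - 1 without division.

    Since 2**p is congruent to 1 (mod M), the high bits (x >> p) can be
    folded back onto the low p bits (x & M) until the value fits.
    """
    m = (1 << p) - 1
    while x > m:
        x = (x & m) + (x >> p)
    return 0 if x == m else x


def lucas_lehmer(p: int) -> bool:
    """Return True iff 2**p - 1 is prime; p is assumed to be prime."""
    if p == 2:
        return True  # M_2 = 3 is prime; the recurrence below needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # s_{k+1} = s_k^2 - 2 (mod M). Adding m keeps the argument
        # non-negative when s is 0 or 1.
        s = mod_mersenne(s * s - 2 + m, p)
    return s == 0


if __name__ == "__main__":
    # Exponents whose Mersenne numbers pass the test.
    print([p for p in (3, 5, 7, 11, 13) if lucas_lehmer(p)])
```

Nearly all of the work is the p - 2 squarings of p-bit numbers, which is the part that FFT-style transforms such as the IBDWT accelerate on large exponents.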
Sep, 15
Scaling Lattice QCD beyond 100 GPUs
Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo "analysis" phase which accounts for a substantial fraction of the workload in a typical LQCD calculation, the […]
Sep, 15
Scalable and deterministic timing-driven parallel placement for FPGAs
This paper describes a parallel implementation of the timing-driven VPR 5.0 simulated annealing engine. By restricting the move distance to a confined neighborhood, it is possible to consider a large number of non-conflicting moves in parallel and achieve a deterministic result. The full timing-driven algorithm is parallelized, including the detailed timing analysis updates done periodically […]