Posts
Sep, 19
Generalisation in genetic programming
Genetic programming can evolve large general solutions using a tiny fraction of possible fitness test sets. Just one test may be enough.
Sep, 19
A training roadmap for new HPC users
Many new users of TeraGrid or other HPC resources are scientists or other domain experts by training and are not necessarily familiar with core principles, practices, and resources within the HPC community. As a result, they often make inefficient use of their own time and effort and of the computing resources as well. In this […]
Sep, 19
A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs
Sparse Matrix-Vector Multiplication (SpMV) is very common in scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform due to its massive processing capability. This paper presents an innovative performance-model-driven approach for partitioning a sparse matrix into appropriate formats, and auto-tuning configurations of CUDA kernels to improve the performance of […]
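For readers unfamiliar with SpMV, the following is a minimal CPU-side sketch of the operation in one common storage format, compressed sparse row (CSR). It is not the paper's partitioning or auto-tuning method, and the helper names `csr_from_dense` and `spmv_csr` are hypothetical, chosen only for illustration.

```python
def csr_from_dense(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero value
                col_idx.append(j)  # its column index
        row_ptr.append(len(values))  # end offset of this row's nonzeros
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Only the stored nonzeros of row i contribute to y[i].
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc)
    return y


dense = [[4, 0, 0],
         [0, 0, 2],
         [1, 3, 0]]
values, col_idx, row_ptr = csr_from_dense(dense)
y = spmv_csr(values, col_idx, row_ptr, [1, 2, 3])
```

Each row's work is proportional to its number of nonzeros, so rows of very different lengths lead to uneven work per row. This is one reason the choice of storage format matters on parallel hardware.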
Sep, 19
Computing without processors
Heterogeneous systems allow us to target our programming to the appropriate environment. From the programmer’s perspective the distinction between hardware and software is being blurred. As programmers struggle to meet the performance requirements of today’s systems, they will face an ever increasing need to exploit alternative computing elements such as GPUs (graphics processing units), which […]
Sep, 19
Real-time ray casting of algebraic B-spline surfaces
Piecewise algebraic B-spline surfaces (ABS surfaces) are capable of modeling globally smooth shapes of arbitrary topology. They can potentially be applied in geometric modeling, scientific visualization, computer animation and mathematical illustration. However, real-time ray casting of these surfaces is still an obstacle for interactive applications, due to the large amount of numerical root findings of nonlinear […]
Sep, 19
The TheLMA project: Multi-GPU implementation of the lattice Boltzmann method
In this paper, we describe the implementation of a multi-graphical processing unit (GPU) fluid flow solver based on the lattice Boltzmann method (LBM). The LBM is a novel approach in computational fluid dynamics, with numerous interesting features from a computational, numerical, and physical standpoint. Our program is based on CUDA and uses POSIX threads to […]
Sep, 19
Improving SIMD efficiency for parallel Monte Carlo light transport on the GPU
Monte Carlo Light Transport algorithms such as Path Tracing (PT), Bi-Directional Path Tracing (BDPT) and Metropolis Light Transport (MLT) make use of random walks to sample light transport paths. When parallelizing these algorithms on the GPU, the stochastic termination of random walks results in an uneven workload between samples, which reduces SIMD efficiency. In this […]
Sep, 19
Randomized selection on the GPU
We implement here a fast and memory-sparing probabilistic top k selection algorithm on the GPU. The algorithm proceeds via an iterative probabilistic guess-and-check process on pivots for a three-way partition. When the guess is correct, the problem is reduced to selection on a much smaller set. This probabilistic algorithm always gives a correct result and […]
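The guess-and-check idea can be illustrated with a small sequential sketch. It is an assumption-laden toy and not the authors' GPU algorithm: it returns the k-th smallest element (0-indexed) rather than a top-k set, and the function name `select_kth` and the parameters `sample_size`, `margin`, `max_tries` and `seed` are all hypothetical.

```python
import random


def select_kth(values, k, sample_size=64, margin=4, max_tries=20, seed=0):
    """Return the k-th smallest element (0-indexed) of values."""
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    rng = random.Random(seed)
    tries = 0
    while len(items) > sample_size and tries < max_tries:
        # Guess: take two pivots from a sorted random sample that
        # probably bracket the element of rank k.
        sample = sorted(rng.sample(items, sample_size))
        pos = k * sample_size // len(items)
        lo = sample[max(pos - margin, 0)]
        hi = sample[min(pos + margin, sample_size - 1)]
        # Three-way partition: below lo, between the pivots, above hi.
        below = sum(1 for x in items if x < lo)
        middle = [x for x in items if lo <= x <= hi]
        # Check: the guess is right if rank k falls in the middle part
        # and the middle part is strictly smaller than the input.
        if below <= k < below + len(middle) and len(middle) < len(items):
            items = middle
            k -= below
            tries = 0
        else:
            tries += 1  # wrong guess: draw a fresh sample and retry
    # The remaining set is small (or retries ran out): sort it directly.
    return sorted(items)[k]


data = [random.Random(1).randrange(1000) for _ in range(1)]  # placeholder
gen = random.Random(1)
data = [gen.randrange(1000) for _ in range(5000)]
median = select_kth(data, len(data) // 2)
```

Because every accepted guess is verified by counting, and the fallback is a plain sort, the result is always exact; randomness affects only how quickly the candidate set shrinks.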
Sep, 19
Hybrid smoothed particle hydrodynamics
We present a new algorithm for enforcing incompressibility for Smoothed Particle Hydrodynamics (SPH) by preserving uniform density across the domain. We propose a hybrid method that uses a Poisson solve on a coarse grid to enforce a divergence free velocity field, followed by a local density correction of the particles. This avoids typical grid artifacts […]
Sep, 19
Simpler and faster HLBVH with work queues
A recently developed algorithm called Hierarchical Linear Bounding Volume Hierarchies (HLBVH) has demonstrated the feasibility of reconstructing the spatial index needed for ray tracing in real-time, even in the presence of millions of fully dynamic triangles. In this work we present a simpler and faster variant of HLBVH, where all the complex book-keeping of prefix […]
Sep, 19
A GPU-tailored approach for training kernelized SVMs
We present a method for efficiently training binary and multiclass kernelized SVMs on a Graphics Processing Unit (GPU). Our methods apply to a broad range of kernels, including the popular Gaussian kernel, on datasets as large as the amount of available memory on the graphics card. Our approach is distinguished from earlier work in […]
Sep, 19
Stream computing on graphics hardware
The raw compute performance of today’s graphics processor is truly amazing. With peak performance of over 60 GFLOPS, the compute power of the graphics processor (GPU) dwarfs that of today’s commodity CPU at a price of only a few hundred dollars. As the programmability and performance of modern graphics hardware continue to increase, many researchers […]