Posts
Nov, 8
An extended GPU radiosity solver
In this paper we present an extended GPU progressive radiosity solver which integrates ideal diffuse as well as specular transmittance and reflection. The solver is capable of handling multiple specular reflections with correct mirror-object-mirror occlusions. The use of graphics hardware allows attenuation of radiation due to reflections and/or transmissions to be considered on a per-pixel basis, […]
Nov, 8
A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization
This paper shows that it is possible to efficiently break the barrier of a 1 triangle/clock rasterization rate for microtriangles on modern GPU architectures. The fixed throughput of the special-purpose culling and triangle setup stages of the classic pipeline limits how well the GPU scales when rasterizing many triangles in parallel when these cover very few […]
Nov, 8
Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters
NVIDIA’s CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded manycore GPUs and scales transparently to hundreds of cores: scientists throughout industry and […]
Nov, 8
Solving lattice QCD systems of equations using mixed precision solvers on GPUs
Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA’s CUDA […]
Nov, 8
Monte Carlo randomization tests for large-scale abundance datasets on the GPU
Statistical tests are often performed to discover which experimental variables are reacting to specific treatments. Time-series statistical models usually require the researcher to make assumptions with respect to the distribution of measured responses which may not hold. Randomization tests can be applied to data in order to generate null distributions non-parametrically. However, large numbers of […]
Nov, 8
Parallel, distributed and GPU computing technologies in single-particle electron microscopy
Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for […]
Nov, 8
Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms
Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier-Stokes solver for multi-GPU workstation platforms. A […]
Nov, 8
A GPU-based matting Laplacian solver for high resolution image matting
The recently proposed matting Laplacian (Levin et al., IEEE Trans. Pattern Anal. Mach. Intell. 30(2):228-242, 2008) has been proven to be a state-of-the-art method for solving the image matting problem. Using this method, matting is formulated as solving a high-order linear system which is hard-constrained by the input trimap. The main drawback of this method, […]
Nov, 8
Parallel Iterative Linear Solvers on GPU: A Financial Engineering Case
In many numerical applications arising from computational science and engineering problems, the solution of sparse linear systems is the most compute-intensive task. Consequently, the linear solvers need to be carefully chosen and efficiently implemented in order to harness the available computing resources. Krylov subspace-based iterative solvers have been widely used for solving […]
Nov, 8
Parallel medical image reconstruction: from graphics processing units (GPU) to Grids
We present and compare a variety of parallelization approaches for a real-world case study on modern parallel and distributed computer architectures. Our case study is a production-quality, time-intensive algorithm for medical image reconstruction used in computer tomography (PET). We parallelize this algorithm for the main kinds of contemporary parallel architectures: shared-memory multiprocessors, distributed-memory clusters, graphics […]
Nov, 8
Power Efficient Large Matrices Multiplication by Load Scheduling on Multi-core and GPU Platform with CUDA
Power efficiency is one of the most important issues in high performance computing (HPC), and it depends on both software and hardware. The power dissipation of a program depends on its algorithm design and on the power characteristics of the computer components on which the program runs. In this work, we measure and model the power consumption of large matrix multiplication […]
Nov, 8
GPU-based Real-Time Soft Tissue Deformation with Cutting and Haptic Feedback
This article describes a series of contributions in the field of real-time simulation of soft tissue biomechanics. These contributions address various requirements for interactive simulation of complex surgical procedures. In particular, this article presents results in the areas of soft tissue deformation, contact modelling, simulation of cutting, and haptic rendering, which are all relevant to […]