## Posts

Nov, 8

### Particle-Based Fluid Simulation on the GPU

Large-scale particle-based fluid simulation is important to both the scientific and computer graphics communities. In this paper, we explore the effectiveness of implementing smoothed particle hydrodynamics on the streaming architecture of a GPU. A dynamic quadtree structure is proposed to accelerate the computation of inter-particle forces. Our method readily extends to higher dimensions without […]

Nov, 8

### Multifold Acceleration of Neural Network Computations Using GPU

With the emergence of the latest generation of graphics processing units (GPUs), it became possible to run neural-network-based computations on mass-produced video display adapters. In this study, NVIDIA CUDA technology has been used to implement the standard back-propagation algorithm for training multiple perceptrons simultaneously on the GPU. For the problem considered, the GPU-based implementation […]

Nov, 8

### An extended GPU radiosity solver

In this paper we present an extended GPU progressive radiosity solver which integrates ideal diffuse as well as specular transmittance and reflection. The solver is capable of handling multiple specular reflections with correct mirror-object-mirror occlusions. The use of graphics hardware allows attenuation of radiation due to reflections and/or transmissions to be considered on a per-pixel basis, […]

Nov, 8

### A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization

This paper shows that it is possible to efficiently break the barrier of a 1 triangle/clock rasterization rate for microtriangles on modern GPU architectures. The fixed throughput of the special-purpose culling and triangle setup stages of the classic pipeline limits the GPU's scalability in rasterizing many triangles in parallel when these cover very few […]

Nov, 8

### Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters

NVIDIA’s CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions – a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many-core GPUs and scales transparently to hundreds of cores: scientists throughout industry and […]

Nov, 8

### Solving lattice QCD systems of equations using mixed precision solvers on GPUs

Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA’s CUDA […]

Nov, 8

### Monte Carlo randomization tests for large-scale abundance datasets on the GPU

Statistical tests are often performed to discover which experimental variables are reacting to specific treatments. Time-series statistical models usually require the researcher to make assumptions about the distribution of measured responses, and these assumptions may not hold. Randomization tests can be applied to data in order to generate null distributions non-parametrically. However, large numbers of […]

Nov, 8

### Parallel, distributed and GPU computing technologies in single-particle electron microscopy

Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for […]

Nov, 8

### Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier-Stokes solver for multi-GPU workstation platforms. A […]

Nov, 8

### A GPU-based matting Laplacian solver for high resolution image matting

The recently proposed matting Laplacian (Levin et al., IEEE Trans. Pattern Anal. Mach. Intell. 30(2):228-242, 2008) has been proven to be a state-of-the-art method for solving the image matting problem. Using this method, matting is formulated as solving a high-order linear system which is hard-constrained by the input trimap. The main drawback of this method, […]

Nov, 8

### Parallel Iterative Linear Solvers on GPU: A Financial Engineering Case

In many numerical applications resulting from computational science and engineering problems, the solution of sparse linear systems is the most compute-intensive task. Consequently, the linear solvers need to be carefully chosen and efficiently implemented in order to harness the available computing resources. Krylov-subspace-based iterative solvers have been widely used for solving […]

Nov, 8

### Parallel medical image reconstruction: from graphics processing units (GPU) to Grids

We present and compare a variety of parallelization approaches for a real-world case study on modern parallel and distributed computer architectures. Our case study is a production-quality, time-intensive algorithm for medical image reconstruction used in computer tomography (PET). We parallelize this algorithm for the main kinds of contemporary parallel architectures: shared-memory multiprocessors, distributed-memory clusters, graphics […]