Posts
Apr, 12
NUMA-Aware Image Compositing on Multi-GPU Platform
Sort-last parallel rendering is widely used. Recent GPU developments mean that a PC equipped with multiple GPUs is a viable alternative to a high-cost supercomputer: the Fermi architecture of a single GPU supports uniform virtual addressing, providing a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes require the user to reconsider […]
Apr, 12
High Performance Computing on GPU for Electromagnetic Logging
The article deals with the development of software and algorithmic techniques for multidimensional modeling and inversion of electromagnetic logs. With many new oil and gas fields being developed in difficult geological conditions, the requirements tend to be higher for reliability and efficiency of log data interpretation. Within this research various programs and algorithms were created […]
Apr, 12
Wire Speed Name Lookup: A GPU-based Approach
This paper studies the name lookup issue with longest prefix matching, which is widely used in URL filtering, content routing/switching, etc. Recently Content-Centric Networking (CCN) has been proposed as a clean-slate future Internet architecture to naturally fit the content-centric property of today’s Internet usage: instead of addressing end hosts, the Internet should operate based […]
Apr, 12
Real-time Subsurface Scattering for Particle-based Fluids using Finite Volume Method
We present a real-time subsurface scattering simulation to perform real-time rendering of translucent particle-based fluids. After particle-based fluid simulation, we immediately build voxelized fluids, called Voronoi fluids, with particle locations and neighbour lists using GPUs. We then perform a multiple subsurface scattering simulation over the Voronoi fluids with the diffusion equation (DE). We employ Finite […]
Apr, 10
Batched Kronecker product for 2-D matrices and 3-D arrays on NVIDIA GPUs
We describe an interface and an implementation for performing Kronecker product actions on NVIDIA GPUs for multiple small 2-D matrices and 3-D arrays processed in parallel as a batch. This method is suited to cases where the Kronecker product component matrices are identical but the operands in a matrix-free application vary in the batch. Any […]
Apr, 10
CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions
BACKGROUND: The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. RESULTS: We present CUDASW++ 3.0, a fast Smith-Waterman protein […]
Apr, 9
Modeling of High Performance Programs to Support Heterogeneous Computing
In order to harness the power of multicore CPUs and GPUs, HPC (High Performance Computing) programmers and even end-users need new tools and techniques to express their core problem, divide that core problem into sub-problems, allocate computational resources for the sub-problems, execute the resources, and collect results. HPC users focus more on the problem […]
Apr, 9
OpenCL Fast Fourier Transform
The Fast Fourier Transform is one of the most important numerical algorithms in history. It has a wide range of applications: audio signal processing, medical imaging, image processing, pattern recognition, computational chemistry, error-correcting codes and spectral methods for PDEs. The goal of this project is to implement an OpenCL-based FFT algorithm that has comparable performance […]
Apr, 9
Accelerating Image Reconstruction in Three-Dimensional Optoacoustic Tomography on Graphics Processing Units
PURPOSE: Optoacoustic tomography (OAT) is inherently a three-dimensional (3D) inverse problem. However, most studies of OAT image reconstruction still employ two-dimensional (2D) imaging models. One important reason is that 3D image reconstruction is computationally burdensome. The aim of this work is to accelerate existing image reconstruction algorithms for 3D OAT by use of parallel programming […]
Apr, 9
A Performance Comparison of Different Graphics Processing Units Running Direct N-Body Simulations
Hybrid computational architectures based on the joint power of Central Processing Units and Graphics Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering, physics, etc. In this paper we present a comparison of performance of various GPUs available on the market when applied to the […]
Apr, 9
A GEMM interface and implementation on NVIDIA GPUs for multiple small matrices
We present an interface and an implementation of the General Matrix Multiply (GEMM) routine for multiple small matrices processed simultaneously on NVIDIA graphics processing units (GPUs). We focus on matrix sizes under 16. The implementation can be easily extended to larger sizes. For single precision matrices, our implementation is 30% to 600% faster than the […]
Apr, 8
pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments
Power-hungry graphics processing unit (GPU) accelerators are ubiquitous in high performance computing data centers today. GPU virtualization frameworks introduce new opportunities for effective management of GPU resources by decoupling them from application execution. However, power management of GPU-enabled server clusters faces significant challenges. The underlying system infrastructure shows complex power consumption characteristics depending on the […]

