
Posts

Apr, 12

Wire Speed Name Lookup: A GPU-based Approach

This paper studies the name lookup issue with longest prefix matching, which is widely used in URL filtering, content routing/switching, etc. Recently Content-Centric Networking (CCN) has been proposed as a clean slate future Internet architecture to naturally fit the content-centric property of today’s Internet usage: instead of addressing end hosts, the Internet should operate based […]
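
Longest prefix matching on hierarchical names can be illustrated with a small CPU-side sketch. This is not the paper's GPU method: the component-wise matching on `/`-separated names, the `longest_prefix_match` function, and the forwarding table contents are all assumptions made for illustration.

```python
def longest_prefix_match(table, name):
    """Return (prefix, value) for the longest prefix of `name` found in `table`.

    `table` maps '/'-separated name prefixes to an arbitrary value (here a
    hypothetical port number). Returns None when no prefix matches.
    """
    components = [c for c in name.split("/") if c]
    # Try the longest candidate first, then drop trailing components.
    for end in range(len(components), 0, -1):
        prefix = "/" + "/".join(components[:end])
        if prefix in table:
            return prefix, table[prefix]
    return None


# Hypothetical forwarding table: name prefix -> outgoing port.
table = {"/com": 1, "/com/example": 2, "/com/example/video": 3}

print(longest_prefix_match(table, "/com/example/video/clip1"))
print(longest_prefix_match(table, "/org/unknown"))
```

Each lookup probes at most one hash entry per name component, so the cost grows with the depth of the name rather than with the size of the table.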
Apr, 12

Real-time Subsurface Scattering for Particle-based Fluids using Finite Volume Method

We present a real-time subsurface scattering simulation to perform real-time rendering of translucent particle-based fluids. After particle-based fluid simulation, we immediately build voxelized fluids, called Voronoi fluids, with particle locations and neighbour lists using GPUs. We then perform a multiple subsurface scattering simulation over the Voronoi fluids with the diffusion equation (DE). We employ Finite […]
Apr, 10

Batched Kronecker product for 2-D matrices and 3-D arrays on NVIDIA GPUs

We describe an interface and an implementation for performing Kronecker product actions on NVIDIA GPUs for multiple small 2-D matrices and 3-D arrays processed in parallel as a batch. This method is suited to cases where the Kronecker product component matrices are identical but the operands in a matrix-free application vary in the batch. Any […]
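
A Kronecker product action can be applied without ever forming the product matrix, which is what makes it attractive in matrix-free settings. The sketch below is a plain-Python illustration under stated assumptions, not the interface described in the paper: the helper names (`kron_apply`, `kron_apply_batch`, `kron_explicit`) are hypothetical, vectors are flattened in row-major order, and the identity used is (A ⊗ B) vec(X) = vec(A X Bᵀ).

```python
def matmul(P, Q):
    """Dense matrix product of two lists-of-rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def kron_apply(A, B, x):
    """Compute (A kron B) @ x without building the Kronecker product."""
    q, s = len(A[0]), len(B[0])
    # Reshape x (length q*s) into a q-by-s matrix, row-major.
    X = [x[j * s:(j + 1) * s] for j in range(q)]
    Y = matmul(matmul(A, X), transpose(B))
    return [v for row in Y for v in row]


def kron_apply_batch(A, B, xs):
    """Same component matrices A and B, a different operand per batch entry."""
    return [kron_apply(A, B, x) for x in xs]


def kron_explicit(A, B):
    """Reference only: the full Kronecker product, for checking small cases."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]
```

For small cases the result can be checked against `kron_explicit(A, B)` multiplied by `x` directly.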
Apr, 10

CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions

BACKGROUND: The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. RESULTS: We present CUDASW++ 3.0, a fast Smith-Waterman protein […]
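
The quadratic cost mentioned above comes from filling a dynamic-programming table with one cell per pair of residues. The following is a minimal sketch of the Smith-Waterman recurrence, not CUDASW++ itself: the match/mismatch scores and the linear gap penalty are simplifying assumptions (protein searches typically use a substitution matrix and affine gaps), and `smith_waterman_score` is a hypothetical name.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses two rows of the DP table, so memory is O(len(b)) while time
    stays O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Cells on the same anti-diagonal do not depend on one another, which is one reason the recurrence lends itself to SIMD and GPU parallelization.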
Apr, 9

Modeling of High Performance Programs to Support Heterogeneous Computing

In order to harness the power of multicore CPUs and GPUs, HPC (High Performance Computing) programmers and even end-users need new tools and techniques to express their core problem, divide that core problem into sub-problems, allocate computational resources for the sub-problems, execute the resources, and collect results. HPC users focus more on the problem […]
Apr, 9

OpenCL Fast Fourier Transform

The Fast Fourier Transform is one of the most important numerical algorithms in history. It has a wide range of applications: audio signal processing, medical imaging, image processing, pattern recognition, computational chemistry, error correcting codes and spectral methods for PDEs. The goal of this project is to implement an OpenCL-based FFT algorithm that has comparable performance […]
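
For reference, the divide-and-conquer structure of an FFT can be sketched on the CPU in a few lines. This is a recursive radix-2 Cooley-Tukey sketch, assumed here purely for illustration; the project's OpenCL kernels are not shown in the excerpt and may use a different variant.

```python
import cmath


def fft(x):
    """Radix-2 Cooley-Tukey FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The butterflies in each recursion level are independent of one another, which is the property GPU implementations exploit.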
Apr, 9

Accelerating Image Reconstruction in Three-Dimensional Optoacoustic Tomography on Graphics Processing Units

PURPOSE: Optoacoustic tomography (OAT) is inherently a three-dimensional (3D) inverse problem. However, most studies of OAT image reconstruction still employ two-dimensional (2D) imaging models. One important reason is that 3D image reconstruction is computationally burdensome. The aim of this work is to accelerate existing image reconstruction algorithms for 3D OAT by use of parallel programming […]
Apr, 9

A Performance Comparison of Different Graphics Processing Units Running Direct N-Body Simulations

Hybrid computational architectures based on the joint power of Central Processing Units and Graphics Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering, physics, etc. In this paper we present a comparison of performance of various GPUs available on the market when applied to the […]
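
A direct N-body step evaluates every pairwise interaction, which gives the O(N²) workload that such GPU comparisons measure. The sketch below is a plain-Python illustration, not the benchmark code from the paper: the `accelerations` function, the gravitational constant `G=1.0`, and the optional Plummer-style `softening` term are assumptions.

```python
def accelerations(positions, masses, G=1.0, softening=0.0):
    """Direct-summation gravitational accelerations for N bodies in 3-D.

    positions: list of [x, y, z]; masses: list of floats.
    Every body interacts with every other body, so the cost is O(N^2).
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc
```

On a GPU the outer loop over `i` is typically mapped to one thread per body, since each body's acceleration can be accumulated independently.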
Apr, 9

A GEMM interface and implementation on NVIDIA GPUs for multiple small matrices

We present an interface and an implementation of the General Matrix Multiply (GEMM) routine for multiple small matrices processed simultaneously on NVIDIA graphics processing units (GPUs). We focus on matrix sizes under 16. The implementation can be easily extended to larger sizes. For single precision matrices, our implementation is 30% to 600% faster than the […]
Apr, 8

pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments

Power-hungry graphics processing unit (GPU) accelerators are ubiquitous in high-performance computing data centers today. GPU virtualization frameworks introduce new opportunities for effective management of GPU resources by decoupling them from application execution. However, power management of GPU-enabled server clusters faces significant challenges. The underlying system infrastructure shows complex power consumption characteristics depending on the […]
Apr, 8

A Hardware Multithreaded SpMV Kernel for the Convey HC-2ex

Applications exhibiting irregular behavior through poor memory locality have been a constant challenge for high-performance computing. Architectures supporting hardware multithreading (e.g. Tera MTA and Cray XMT) have been shown to deliver superior performance on such applications by masking memory latency. FPGAs have outperformed traditional architectures on applications that exhibit very large spatial locality and where […]
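
Sparse matrix-vector multiplication (SpMV) is a common example of the irregular memory access described above. The sketch below assumes the compressed sparse row (CSR) layout, which the excerpt does not specify; `spmv_csr` is a hypothetical name, and this is a sequential illustration rather than the Convey HC-2ex kernel.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[r]..row_ptr[r+1] delimits the non-zeros of row r;
    col_idx and values hold their column indices and values.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read at data-dependent positions: this indirect access
            # is the source of the poor memory locality.
            y[row] += values[k] * x[col_idx[k]]
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))
```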
Apr, 8

Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability

Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. We present a novel dynamic approach to work partitioning that […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
