Posts

Nov, 5

Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx […]
Nov, 5

Exploiting frame-to-frame coherence for accelerating high-quality volume raycasting on graphics hardware

GPU-based raycasting offers an interesting alternative to conventional slice-based volume rendering due to its inherent flexibility and the high quality of the generated images. Recent advances in graphics hardware allow the ray traversal and volume sampling to be executed per fragment entirely on the GPU, leading to interactive frame rates. In this work […]
Nov, 5

GPUTeraSort: high performance graphics co-processor sorting for large database management

We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces […]
Nov, 4

2D/3D image registration on the GPU

We present a method that efficiently performs rigid 2D/3D image registration on the graphics processing unit (GPU). As one main contribution of this paper, we propose an efficient method for generating realistic DRRs that are visually similar to X-ray images. To this end, we model some of the electronic post-processing steps of current X-ray C-arm systems. As another […]
Nov, 4

Treecode and fast multipole method for N-body simulation with CUDA

Due to the variety and importance of applications of treecodes and FMM, the combination of algorithmic acceleration with hardware acceleration can have tremendous impact. Alas, programming these algorithms efficiently is no piece of cake. In this contribution, we aim to present GPU kernels for treecode and FMM in, as much as possible, an uncomplicated, accessible […]
Nov, 4

Simple dynamic LOD for geometry images

We present a new approach for dynamic LOD processing for geometry images (GIs) in the graphics processing unit (GPU). A GI mipmap is constructed from a scanned 3D model, then a mipmap selector map is created from the camera position’s information and the GI-mipmap. Using the mipmap selector map and the current camera position, some […]
Nov, 4

Hardware-Accelerated Volume Rendering for Real-Time Medical Data Visualization

Volumetric data rendering has become an important tool in various medical procedures as it allows the unbiased visualization of fine details of volumetric medical data (CT, MRI, fMRI). However, due to the large amount of computation involved, the rendering time increases dramatically as the size of the data set grows. This paper presents several acceleration […]
Nov, 4

CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

BACKGROUND: The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate […]
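
To make the computational cost mentioned in this abstract concrete, here is a minimal CPU sketch of the Smith-Waterman scoring recurrence. It is not the CUDASW++ implementation: the function name `smith_waterman_score`, the scoring values, and the linear gap penalty are assumptions chosen for illustration only. The nested loops show why the cost grows with the product of the two sequence lengths.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between two strings.

    Illustrative assumptions: a fixed match/mismatch score and a linear
    gap penalty (hypothetical defaults, not taken from the paper).
    """
    # Only the previous row of the dynamic-programming table is needed
    # when we want the score alone and not the alignment itself.
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # align query[i-1] with target[j-1]
                prev[j] + gap,       # gap in the target
                curr[j - 1] + gap,   # gap in the query
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

A database search repeats this computation once per database sequence, and those per-sequence computations do not depend on each other.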
Nov, 4

Boosted Algorithms for Visual Object Detection on Graphics Processing Units

Machine learning methods for visual object detection are now in widespread use. These methods are robust, but they require substantial processing power and high memory bandwidth, which becomes a handicap for real-time applications. The recent evolution of commodity PC graphics boards (GPUs) has the potential to accelerate these algorithms.
Nov, 4

A Fast Implementation of the Octagon Abstract Domain on Graphics Hardware

We propose an efficient implementation of the Octagon Abstract Domain (OAD) on the Graphics Processing Unit (GPU) by exploiting stream processing to speed up OAD computations. OAD is a relational numerical abstract domain which approximates invariants as conjunctions of constraints of the form […]
Nov, 4

Fast tridiagonal solvers on the GPU

We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We […]
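
As a rough illustration of one of the algorithms named in this abstract, the following is a sequential sketch of parallel cyclic reduction (PCR). It is not the authors' GPU code: the function name `pcr_solve` and the list-based interface are assumptions, and the sketch assumes a well-conditioned (for example, diagonally dominant) system so that no pivot is zero. Each iteration of the inner loop reads only the previous arrays, so on a GPU every row could be handled by its own thread.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system with parallel cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] and c[-1] ignored (treated as zero).
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[n - 1] = 0.0
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):  # rows are independent within one step
            lo, hi = i - stride, i + stride
            # Multipliers that eliminate the neighbours at distance `stride`.
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0
            gamma = -c[i] / b[hi] if hi < n else 0.0
            nb[i] = b[i]
            nd[i] = d[i]
            if lo >= 0:
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2  # remaining couplings are now twice as far apart
    # Every row is decoupled once the stride reaches the system size.
    return [d[i] / b[i] for i in range(n)]


if __name__ == "__main__":
    print(pcr_solve([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [0, 0, 0, 5]))
```

The number of reduction steps grows with the logarithm of the system size, at the price of more total arithmetic than a sequential elimination.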
Nov, 4

Benchmarking GPUs to tune dense linear algebra

We present performance results for dense linear algebra using recent NVIDIA GPUs. Our matrix-matrix multiply routine (GEMM) runs up to 60% faster than the vendor’s implementation and approaches the peak of hardware capabilities. Our LU, QR and Cholesky factorizations achieve up to 80–90% of the peak GEMM rate. Our parallel LU running on two GPUs […]

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors