
Posts

Jan, 9

Real-time object detection on CUDA

The aim of the research described in this article is to accelerate object detection in images and video sequences using graphics processors. It includes algorithmic modifications and adjustments of existing detectors, the construction of efficient implementation variants, and an evaluation against efficient CPU implementations. This article focuses on detection by statistical classifiers based on […]
Jan, 9

Evaluation and tuning of the Level 3 CUBLAS for graphics processors

The increase in performance of the last generations of graphics processors (GPUs) has made this class of platform a coprocessing tool with remarkable success in certain types of operations. In this paper we evaluate the performance of the Level 3 operations in CUBLAS, the implementation of BLAS for NVIDIA GPUs with unified architecture. From this […]
Jan, 9

Parallel programming for multimedia applications

Computing capabilities are continuing to increase with the availability of multi core and many core processors. The wide availability of multi core processors has made parallel programming possible for end user applications running on desktops, workstations, and mobile devices. While parallel hardware has become common, software that exploits parallel capabilities is just beginning to take […]
Jan, 9

A new approach to the lattice Boltzmann method for graphics processing units

Emerging many-core processors, like CUDA capable nVidia GPUs, are promising platforms for regular parallel algorithms such as the Lattice Boltzmann Method (LBM). Since the global memory of graphics devices shows high latency and LBM is data intensive, the memory access pattern is an important issue for achieving good performance. Whenever possible, global memory loads and […]
Jan, 9

High-throughput Bayesian computing machine with reconfigurable hardware

We use reconfigurable hardware to construct a high throughput Bayesian computing machine (BCM) capable of evaluating probabilistic networks with arbitrary DAG (directed acyclic graph) topology. Our BCM achieves high throughput by exploiting the FPGA’s distributed memories and abundant hardware structures (such as long carry-chains and registers), which enables us to 1) develop an innovative memory […]
Jan, 9

Improving the Performance of Hyperspectral Image and Signal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-Based Systems

Advances in sensor technology are revolutionizing the way remotely sensed data is collected, managed and analyzed. The incorporation of latest-generation sensors into airborne and satellite platforms is currently producing a nearly continual stream of high-dimensional data, and this explosion in the amount of collected information has rapidly created new processing challenges. For instance, hyperspectral signal […]
Jan, 9

Raising the level of many-core programming with compiler technology: meeting a grand challenge

Modern GPUs and CPUs are massively parallel, many-core processors. While application developers for these many-core chips are reporting 10X-100X speedup over sequential code on traditional microprocessors, the current practice of many-core programming based on OpenCL, CUDA, and OpenMP puts strain on software development, testing and support teams. According to the semiconductor industry roadmap, these processors […]
Jan, 8

Acceleration of FDTD mode solver by high-performance computing techniques

A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the bench-mark finite-difference (FD) eigen mode solver. By taking advantage of the […]
Jan, 8

CBESW: sequence alignment on the Playstation 3

BACKGROUND: The exponential growth of available biological data has caused bioinformatics to move rapidly towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many […]
Jan, 8

Four styles of parallel and net programming

This paper reviews the programming landscape for parallel and network computing systems, focusing on four styles of concurrent programming models, and example languages/libraries. The four styles correspond to four scales of the targeted systems. At the smallest coprocessor scale, Single Instruction Multiple Thread (SIMT) and Compute Unified Device Architecture (CUDA) are considered. Transactional memory is […]
Jan, 8

Quick-CULLIDE: fast inter- and intra-object collision culling using graphics hardware

We present a fast collision culling algorithm for performing inter- and intra-object collision detection among complex models using graphics hardware. Our algorithm is based on CULLIDE and performs visibility queries on the GPUs to eliminate a subset of geometric primitives that are not in close proximity. We present an extension to CULLIDE to perform intra-object […]
Jan, 8

A constant-space belief propagation algorithm for stereo matching

In this paper, we consider the problem of stereo matching using loopy belief propagation. Unlike previous methods which focus on the original spatial resolution, we hierarchically reduce the disparity search range. By fixing the number of disparity levels on the original resolution, our method solves the message updating problem in a time linear in the […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
