
Posts

Jan, 11

Implementing Genetic Algorithms to CUDA Environment Using Data Parallelization

Computation methods for parallel problem solving using graphics processing units (GPUs) have attracted much research interest in recent years. Parallel computation can be applied to genetic algorithms (GAs) in the evaluation of individuals in a population. This paper describes yet another method of implementing GAs in the CUDA environment, where CUDA is […]
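Because each individual's fitness can be computed independently of the others, the evaluation step is the natural place for data parallelism. The sketch below is a minimal illustration under our own assumptions, not the paper's CUDA implementation. It uses a hypothetical OneMax fitness function, tournament selection, one-point crossover and bit-flip mutation, and `concurrent.futures.ThreadPoolExecutor` stands in for one-GPU-thread-per-individual. Under CPython's GIL the thread pool only shows the structure and will not speed up a pure-Python fitness function.

```python
import random
from concurrent.futures import ThreadPoolExecutor

GENOME_LEN = 32  # hypothetical problem size

def fitness(genome):
    # Hypothetical OneMax objective: the number of 1 bits in the genome.
    return sum(genome)

def evaluate_population(population, workers=4):
    # Data-parallel step: every individual is scored independently.
    # On a GPU, one thread (or block) would evaluate one individual.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))  # map preserves order

def tournament(population, scores, rng, k=2):
    # Pick k random individuals and return the fittest of them.
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def next_generation(population, scores, rng, mutation_rate=0.01):
    children = []
    while len(children) < len(population):
        a = tournament(population, scores, rng)
        b = tournament(population, scores, rng)
        cut = rng.randrange(1, GENOME_LEN)  # one-point crossover
        child = a[:cut] + b[cut:]
        # Bit-flip mutation applied gene by gene.
        child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
        children.append(child)
    return children

def run_ga(pop_size=64, generations=50, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(pop_size)]
    best = 0
    for _ in range(generations):
        scores = evaluate_population(population)
        best = max(best, max(scores))
        population = next_generation(population, scores, rng)
    return max(best, max(evaluate_population(population)))
```

Only `evaluate_population` touches the parallel resource; selection and reproduction stay sequential here, which mirrors the abstract's focus on parallelizing evaluation.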
Jan, 11

Parallel LDPC decoding using CUDA and OpenMP

Digital mobile communication technologies, such as next-generation mobile communication and mobile TV, are rapidly advancing. Hardware designs that provide baseband processing for new protocol standards are being actively attempted, but because multiple standards are emerging concurrently and device functions must meet diverse needs, hardware-only implementation may have reached its limit. To overcome this challenge, digital communication […]
Jan, 11

Efficient Model-based 3D Tracking of Hand Articulations using Kinect

We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking the hand model parameters that minimize the discrepancy between the appearance and 3D structure […]
Jan, 11

Massively Parallel GPU Computing of Continuum Robotic Dynamics

Continuum robots, with the capability of bending and extending at any point along their length, mimic the abilities of an octopus arm or an elephant trunk. These manipulators present a number of exciting possibilities. While computing a static solution for the system has been shown, with certain models, to produce satisfactory results [1], this approach […]
Jan, 11

A Nearest Neighbor Data Structure for Graphics Hardware

Nearest neighbor search is a core computational task in database systems and throughout data analysis. It is also a major computational bottleneck, and hence an enormous body of research has been devoted to data structures and algorithms for accelerating the task. Recent advances in graphics hardware provide tantalizing speedups on a variety of tasks and […]
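To make the task concrete, the simplest formulation computes the distance from each query to every database point and keeps the closest. The sketch below is that brute-force baseline under our own assumptions (Euclidean distance, ties broken by lowest index); it is not the data structure the paper proposes. Each query is independent, which is the axis a GPU would parallelize, but the total cost is one distance evaluation per query-point pair, which is the bottleneck that search data structures try to prune.

```python
import math

def nearest_neighbors(queries, database):
    """Return, for each query, the index of its closest database point
    under Euclidean distance (-1 if the database is empty)."""
    results = []
    for q in queries:  # queries are independent: the data-parallel axis
        best_idx, best_dist = -1, math.inf
        for i, p in enumerate(database):
            d = math.dist(q, p)  # Euclidean distance (Python 3.8+)
            if d < best_dist:    # strict '<' keeps the lowest index on ties
                best_idx, best_dist = i, d
        results.append(best_idx)
    return results
```

For example, with a database of `(0, 0)`, `(5, 5)` and `(10, 0)`, the queries `(1, 1)`, `(9, 1)` and `(5, 4)` map to indices `0`, `2` and `1`.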
Jan, 11

MetaBinG: Using GPUs to Accelerate Metagenomic Sequence Classification

Metagenomic sequence classification is a procedure to assign sequences to their source genomes. It is one of the important steps for metagenomic sequence data analysis. Although many methods exist, classification of high-throughput metagenomic sequence data in a limited time is still a challenge. We present here an ultra-fast metagenomic sequence classification system (MetaBinG) using graphic […]
Jan, 11

Petaflop biofluidics simulations on a two million-core system

We present a computational framework for multi-scale simulations of real-life biofluidic problems. The framework can simulate suspensions composed of hundreds of millions of bodies interacting with each other and with a surrounding fluid in complex geometries. We apply the methodology to the simulation of blood flow through the human coronary arteries with a spatial […]
Jan, 11

Power-performance comparison of single-task driven many-cores

Many-cores, processors with hundreds of cores, are becoming increasingly popular in general-purpose computing, yet power is a limiting factor in their performance. In this paper, we compare the power and performance of two design points in the many-core processor domain. The XMT general-purpose processor provides significant runtime advantage on irregular parallel programs (e.g., graph algorithms). […]
Jan, 11

Real-time massively parallel processing of spectral optical coherence tomography data on graphics processing units

In this contribution we describe a specialised data processing system for Spectral Optical Coherence Tomography (SOCT) biomedical imaging which utilises massively parallel data processing on a low-cost Graphics Processing Unit (GPU). One of the most significant limitations of SOCT is the data processing time on the main processor of the computer (CPU), which is generally […]
Jan, 11

High Precision Integer Multiplication with a GPU Using Strassen’s Algorithm with Multiple FFT Sizes

We have improved our prior implementation of Strassen's algorithm for high-performance multiplication of very large integers on a general-purpose graphics processor (GPU). A combination of algorithmic and implementation optimizations yields a speedup of up to 13.9x over our previous work, running on an NVIDIA 295. We have also reoptimized the […]
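Assuming, as the title's "multiple FFT sizes" suggests, that this refers to FFT-based integer multiplication, the core idea is: split each integer into limbs, treat the limbs as polynomial coefficients, multiply the polynomials by pointwise multiplication in the Fourier domain, then propagate carries. The sketch below is our own illustration, not the authors' GPU code. It uses a recursive radix-2 complex FFT in double precision and a hypothetical limb size `BASE = 10_000`, which keeps rounding exact only for moderately sized inputs; large-scale implementations typically need tighter error control or exact modular transforms.

```python
import cmath

BASE = 10_000  # hypothetical limb size, chosen so float rounding stays exact here

def to_limbs(n):
    # Little-endian base-BASE digits of a non-negative integer.
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    limbs = []
    while n:
        n, r = divmod(n, BASE)
        limbs.append(r)
    return limbs or [0]

def fft(a, invert=False):
    # Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def multiply(x, y):
    a, b = to_limbs(x), to_limbs(y)
    size = 1
    while size < len(a) + len(b):  # room for the full product polynomial
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    # Pointwise product in the frequency domain = convolution of limbs.
    prod = fft([p * q for p, q in zip(fa, fb)], invert=True)
    coeffs = [round(c.real / size) for c in prod]  # undo scaling, round to ints
    limbs, carry = [], 0
    for c in coeffs:  # carry propagation back into base-BASE limbs
        carry, limb = divmod(carry + c, BASE)
        limbs.append(limb)
    while carry:
        carry, limb = divmod(carry, BASE)
        limbs.append(limb)
    result = 0
    for limb in reversed(limbs):
        result = result * BASE + limb
    return result
```

The pointwise multiplication and each FFT butterfly level are independent across elements, which is what makes this approach a good fit for GPUs.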
Jan, 10

People detection method using graphics processing units for a mobile robot with an omnidirectional camera

This paper presents a novel vision system for people detection using an omnidirectional camera mounted on a mobile robot. In order to determine regions of interest (ROI), we compute a dense optical flow map using graphics processing units, which enables us to examine compliance with the ego-motion of the robot in a dynamic environment. Shape-based […]
Jan, 10

Generating, Optimizing, and Scheduling a Compiler Level Representation of Stream Parallelism

Stream parallelism is often cited as a powerful programming model for expressing parallel computation for multi-core and heterogeneous computers. It allows programmers to concisely describe the concurrency and communication requirements found in a program and it allows compilers and runtime systems to easily generate efficient code targeting parallel hardware. This type of stream parallelism is […]
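As a rough illustration of what "describing concurrency and communication" means in a stream program, the sketch below runs each stage as its own thread and connects stages only through FIFO queues, ending the stream with a sentinel. This is a hand-written runtime sketch under our own assumptions (hypothetical `stage`/`run_pipeline` helpers, Python threads and `queue.Queue`); it is not the compiler-level representation or scheduler the paper describes.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def stage(fn, inbox, outbox):
    # A stage runs concurrently with the others and communicates
    # only through its input and output queues.
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # forward end-of-stream downstream
            return
        outbox.put(fn(item))

def run_pipeline(source, *fns):
    # One queue between each pair of adjacent stages, plus input and output.
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for item in source:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because every stage is a single thread reading a FIFO queue, output order matches input order; for example, `run_pipeline(range(5), lambda x: x * 2, lambda x: x + 1)` returns `[1, 3, 5, 7, 9]`.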

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors