
Posts

Mar, 23

Massively Parallel Construction of the Cell Graph

Motion planning is an important and well-studied field of robotics. A typical approach to finding a route is to construct a cell graph representing a scene and then to find a path in such a graph. In this paper we present and analyze parallel algorithms for constructing the cell graph on a SIMD-like GPU processor. […]
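
As a rough illustration of the data-parallel flavour of such a construction (not the paper's algorithm; the uniform 2D grid, the occupancy array and the kernel name below are assumptions), one CUDA thread can classify a single cell and emit edges to its free 4-connected neighbours in one pass:

    // Hypothetical sketch: one thread per grid cell builds adjacency edges
    // to its free 4-connected neighbours (assumed layout, not the paper's).
    __global__ void buildCellGraph(const unsigned char* occupied, // 1 = blocked
                                   int width, int height,
                                   int* edges,      // up to 4 neighbour indices per cell
                                   int* edgeCount)  // number of edges per cell
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int cell = y * width + x;
        int count = 0;
        if (!occupied[cell]) {
            const int dx[4] = { 1, -1, 0, 0 };
            const int dy[4] = { 0, 0, 1, -1 };
            for (int i = 0; i < 4; ++i) {
                int nx = x + dx[i], ny = y + dy[i];
                if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                    int ncell = ny * width + nx;
                    if (!occupied[ncell])
                        edges[cell * 4 + count++] = ncell;  // edge cell -> ncell
                }
            }
        }
        edgeCount[cell] = count;
    }
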
Mar, 23

A Financial Benchmark for GPGPU Compilation

Commodity many-core hardware is now mainstream, driven in particular by the evolution of general purpose graphics processing units (GPGPUs), but parallel programming models are lagging behind in effectively exploiting the available application parallelism. There are two principal reasons. First, real-world applications often exhibit a rich composition of nested parallelism, whose static extraction requires a set […]
Mar, 22

PTX2Kernel: Converting PTX Code into Compilable Kernels

GPUs are now widely used as high performance general purpose computing devices. More and more applications have achieved large speedups with one or more GPUs, and the number of GPU programs is growing fast. In certain situations, the high level CUDA C code of kernels is not available, but low level PTX code can be […]
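
For context, low-level PTX can already be loaded and launched directly through the CUDA driver API; the minimal host-side sketch below shows that standard path (the file name kernel.ptx and entry point myKernel are hypothetical, and this is not the PTX2Kernel tool itself):

    /* Minimal sketch of running standalone PTX via the CUDA driver API.
       Error checking omitted for brevity; link with -lcuda. */
    #include <cuda.h>

    int main(void) {
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuModuleLoad(&mod, "kernel.ptx");          /* JIT-compiles the PTX */
        cuModuleGetFunction(&fn, mod, "myKernel"); /* look up the entry point */
        cuLaunchKernel(fn, 1, 1, 1,   /* grid dimensions  */
                           1, 1, 1,   /* block dimensions */
                           0, NULL, NULL, NULL); /* no kernel arguments */
        cuCtxSynchronize();
        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }
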
Mar, 22

Speeding Up Computer Vision Applications on Mobile Computing Platforms

Computer vision (CV) is widely expected to be the next "Big Thing" in mobile computing. For example, Google has recently announced their project "Tango", a 5-inch Android phone containing highly customized hardware and software designed to track the full 3-dimensional motion of the device as you hold it while simultaneously creating a map of the […]
Mar, 22

Raising the Bar for Using GPUs in Software Packet Processing

Numerous recent research efforts have explored the use of Graphics Processing Units (GPUs) as accelerators for software-based routing and packet handling applications, typically demonstrating throughput several times higher than using legacy code on the CPU alone. In this paper, we explore a new hypothesis about such designs: For many such applications, the benefits arise less […]
Mar, 22

Evaluating kernels on Xeon Phi to accelerate Gysela application

This work describes the challenges presented by porting parts of the Gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate the performance of some generic micro-benchmarks on Phi versus Intel Sandy Bridge. Several interpolation kernels useful for the Gysela […]
Mar, 22

Single stream parallelization of generalized LSTM-like RNNs on a GPU

Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data. However, they suffer from long training time, which demands parallel implementations of the training procedure. Parallelization of the training algorithms for RNNs is very challenging because internal recurrent paths form dependencies between two different time frames. In this paper, we first propose a […]
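
The cross-frame dependency mentioned above is already visible in the plain RNN recurrence, written here in generic form (the paper's generalized LSTM-like equations are more elaborate):

    h_t = \phi(W x_t + U h_{t-1} + b)

Since $h_t$ depends on $h_{t-1}$, successive time steps cannot be computed independently, which is exactly the dependency that makes parallelizing the training over time so difficult.
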
Mar, 20

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage format, which offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi. First, the CSR5 format is insensitive to the sparsity structure of the input matrix. Thus the […]
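
For reference, CSR5 builds on the classical CSR layout (row pointers, column indices, values); the baseline CSR SpMV kernel below, one thread per row, shows the data structure CSR5 starts from, not the CSR5 tile-based scheme itself:

    // Baseline CSR SpMV sketch (scalar kernel, one thread per row).
    __global__ void spmv_csr(int numRows,
                             const int* rowPtr, const int* colIdx,
                             const double* vals, const double* x, double* y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < numRows) {
            double sum = 0.0;
            for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
                sum += vals[j] * x[colIdx[j]];
            y[row] = sum;
        }
    }
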
Mar, 20

Interactive Illustrative Line Styles and Line Style Transfer Functions for Flow Visualization

We present a flexible illustrative line style model for the visualization of streamline data. Our model partitions view-oriented line strips into parallel bands whose basic visual properties can be controlled independently. We thus extend previous line stylization techniques specifically for visualization purposes by allowing the parametrization of these bands based on the local line data […]
Mar, 20

On learning optimized reaction diffusion processes for effective image restoration

For several decades, image restoration has remained an active research topic in low-level computer vision, and hence new approaches are constantly emerging. However, many recently proposed algorithms achieve state-of-the-art performance only at the expense of very high computation time, which clearly limits their practical relevance. In this work, we propose a simple but effective approach with […]
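
Sketching the idea in generic form (the concrete parametrization is the paper's contribution and is not reproduced here), a learned diffusion process iterates an update of roughly this shape:

    u^{t+1} = u^{t} - \Delta t \Big( \sum_{k} \bar{K}_k \, \phi_k(K_k u^{t}) + \lambda\,(u^{t} - f) \Big)

where $f$ is the degraded input image, the $K_k$ are learned linear filters with $\bar{K}_k$ their adjoints, and the $\phi_k$ are learned influence functions; each step then amounts to a handful of convolutions and pointwise operations.
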
Mar, 20

The More We Share, The More We Have: Improving GPU performance through Register Sharing

Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The amount of thread level parallelism that can be utilized depends on the number of resident threads on each of the SMs. The threads are typically structured […]
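
The occupancy pressure described above is also visible from the programmer's side. The sketch below only shows the standard CUDA __launch_bounds__ knob, which trades registers per thread for resident threads; the paper's register sharing is an architectural mechanism and is not expressible in CUDA source:

    // Capping registers per thread raises the number of resident threads per SM,
    // at the cost of possible register spills (illustration only).
    __global__ void __launch_bounds__(256, 8)   // <= 256 threads/block, aim for 8 blocks/SM
    heavyKernel(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float a = in[i];
            // register-hungry computation would go here
            out[i] = a * a;
        }
    }
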
Mar, 20

Implementation of a Practical Distributed Calculation System with Browsers and JavaScript, and Application to Distributed Deep Learning

Deep learning can achieve outstanding results in various fields. However, it requires such significant computational power that graphics processing units (GPUs) and/or numerous computers are often required for practical applications. We have developed a new distributed calculation framework called "Sashimi" that allows any computer to be used as a distribution node simply by accessing […]
