
Posts

Mar, 23

Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis

The analysis of next-generation sequencing (NGS) data is a major topic in bioinformatics: short reads obtained from DNA, the molecule encoding the genome of living organisms, are processed to provide insight into biological or medical questions. This thesis provides novel solutions to major topics within the analysis of NGS data, focusing on parallelization, scalability and […]
Mar, 23

Curracurrong: a stream processing system for distributed environments

Advances in technology have given rise to applications that are deployed on wireless sensor networks (WSNs), the cloud, and the Internet of things. There are many emerging applications, some of which include sensor-based monitoring, web traffic processing, and network monitoring. These applications collect large amounts of data as an unbounded sequence of events and process […]
Mar, 23

GPU Kernels for High-Speed 4-Bit Astrophysical Data Processing

Interferometric radio telescopes often rely on computationally expensive O(N^2) correlation calculations; fortunately, these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU-based X-engine of a new hybrid FX correlator. Channelized data from the F-engine is supplied to the GPUs as 4-bit, offset-encoded […]
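To make the data format and the O(N^2) cost concrete, here is a minimal NumPy sketch. It is not the paper's OpenCL code, and the byte layout is a hypothetical assumption: one complex sample per byte, high nibble real and low nibble imaginary, with offset encoding in which a stored nibble v represents v - 8 (range -8 to 7). The function names `unpack_4bit_offset` and `correlate` are illustrative.

```python
import numpy as np

def unpack_4bit_offset(packed: np.ndarray) -> np.ndarray:
    """Decode uint8 bytes into complex samples.

    Hypothetical layout (assumption): high nibble = real part,
    low nibble = imaginary part, offset-encoded so that a stored
    nibble v represents the signed value v - 8.
    """
    packed = packed.astype(np.int16)   # widen so subtraction cannot wrap
    re = (packed >> 4) - 8             # high nibble -> signed real part
    im = (packed & 0x0F) - 8           # low nibble -> signed imaginary part
    return (re + 1j * im).astype(np.complex64)

def correlate(samples: np.ndarray) -> np.ndarray:
    """X-engine step for one frequency channel.

    samples has shape (T, N): T time samples for N inputs.
    Returns the N x N matrix V[i, j] = sum_t x[t, i] * conj(x[t, j]).
    """
    # Every input pair (i, j) is multiplied: N^2 products per time sample.
    return np.einsum("ti,tj->ij", samples, samples.conj())
```

The `einsum` forms a product for every input pair, so the work grows as N^2 per time sample; the result is Hermitian, so a real kernel may compute only one triangle.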
Mar, 23

Massively Parallel Construction of the Cell Graph

Motion planning is an important and well-studied field of robotics. A typical approach to finding a route is to construct a cell graph representing a scene and then to find a path in such a graph. In this paper we present and analyze parallel algorithms for constructing the cell graph on a SIMD-like GPU processor. […]
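The two-step approach (build a cell graph of the scene, then search it for a path) can be sketched in Python under a simplifying assumption: the scene is a uniform occupancy grid rather than the general cell decomposition the paper targets, and the search is plain breadth-first search. The helper names `build_cell_graph` and `find_path` are hypothetical.

```python
from collections import deque

def build_cell_graph(grid):
    """Build an adjacency dict from a grid of strings ('.' free, '#' obstacle).

    Each free cell is a node; edges join 4-connected free neighbors.
    """
    rows, cols = len(grid), len(grid[0])
    graph = {}
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != '.':
                continue  # obstacle cells are not graph nodes
            nbrs = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '.':
                    nbrs.append((nr, nc))
            graph[(r, c)] = nbrs
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns a shortest list of cells or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk predecessors back to start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        for n in graph[cell]:
            if n not in prev:
                prev[n] = cell
                queue.append(n)
    return None  # goal unreachable from start
```

In this grid version each cell's neighbor list depends only on its immediate neighborhood, so the construction loop is independent across cells, which is what makes graph construction a candidate for data-parallel hardware.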
Mar, 23

A Financial Benchmark for GPGPU Compilation

Commodity many-core hardware is now mainstream, driven in particular by the evolution of general-purpose graphics processing units (GPGPUs), but parallel programming models are lagging behind in effectively exploiting the available application parallelism. There are two principal reasons. First, real-world applications often exhibit a rich composition of nested parallelism, whose static extraction requires a set […]
Mar, 22

PTX2Kernel: Converting PTX Code into Compilable Kernels

GPUs are now widely used as high performance general purpose computing devices. More and more applications have achieved large speedups with one or more GPUs, and the number of GPU programs is growing fast. In certain situations, the high level CUDA C code of kernels is not available, but low level PTX code can be […]
Mar, 22

Speeding Up Computer Vision Applications on Mobile Computing Platforms

Computer vision (CV) is widely expected to be the next "Big Thing" in mobile computing. For example, Google has recently announced its project "Tango", a 5-inch Android phone containing highly customized hardware and software designed to track the full 3-dimensional motion of the device as you hold it while simultaneously creating a map of the […]
Mar, 22

Raising the Bar for Using GPUs in Software Packet Processing

Numerous recent research efforts have explored the use of Graphics Processing Units (GPUs) as accelerators for software-based routing and packet handling applications, typically demonstrating throughput several times higher than that of legacy code running on the CPU alone. In this paper, we explore a new hypothesis about such designs: For many such applications, the benefits arise less […]
Mar, 22

Evaluating kernels on Xeon Phi to accelerate Gysela application

This work describes the challenges presented by porting parts of the Gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate the performance of some generic micro-benchmarks on Phi versus Intel Sandy Bridge. Several interpolation kernels useful for the Gysela […]
Mar, 22

Single stream parallelization of generalized LSTM-like RNNs on a GPU

Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data. However, they suffer from long training times, which demand parallel implementations of the training procedure. Parallelizing the training algorithms for RNNs is very challenging because internal recurrent paths form dependencies between two different time frames. In this paper, we first propose a […]
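The time-frame dependency can be illustrated with a vanilla tanh RNN, a simplification: LSTM cells add gates but keep the same recurrent structure. This NumPy sketch is not the paper's method and `rnn_forward` is a hypothetical name; it only shows that the input projection can be batched over all time steps at once, while the recurrent update must run step by step because h_t needs h_{t-1}.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Vanilla tanh RNN forward pass.

    xs: (T, D_in) inputs, W_x: (H, D_in), W_h: (H, H), b: (H,).
    Returns hidden states of shape (T, H).
    """
    # No recurrent dependency here: one batched matmul over all T steps.
    pre = xs @ W_x.T + b                 # (T, H)
    H = W_h.shape[0]
    h = np.zeros(H)
    hs = np.empty((xs.shape[0], H))
    # Recurrent part is inherently sequential: each step consumes h_{t-1}.
    for t in range(xs.shape[0]):
        h = np.tanh(pre[t] + W_h @ h)
        hs[t] = h
    return hs
```

Hoisting the input projection out of the loop turns T small matrix-vector products into one large matrix product, which suits a GPU; the remaining sequential loop is the part that single-stream parallelization has to address.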
Mar, 20

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage format, which offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi. First, the CSR5 format is insensitive to the sparsity structure of the input matrix. Thus the […]
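For reference, SpMV over the conventional CSR format can be sketched in Python as below. This is the plain CSR baseline, not the CSR5 format itself, and `csr_spmv` is an illustrative name. If rows are assigned one per thread, each thread's work equals its row's nonzero count, so irregular row lengths cause load imbalance; this is the kind of dependence on sparsity structure the abstract says CSR5 avoids.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for A stored in CSR.

    Row i's nonzeros are values[row_ptr[i]:row_ptr[i + 1]], with column
    indices col_idx over the same range.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Work for row i is proportional to its nonzero count.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]] in CSR form:
y = csr_spmv([1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 2, 3], [1.0, 2.0, 3.0])
```

Here `y` is `[7.0, 0.0, 6.0]`; note the empty middle row, which still costs a loop iteration.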
Mar, 20

Interactive Illustrative Line Styles and Line Style Transfer Functions for Flow Visualization

We present a flexible illustrative line style model for the visualization of streamline data. Our model partitions view-oriented line strips into parallel bands whose basic visual properties can be controlled independently. We thus extend previous line stylization techniques specifically for visualization purposes by allowing the parametrization of these bands based on the local line data […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors