13785

Posts

Mar, 25

Energy-efficient Computing on Distributed GPUs using Dynamic Parallelism and GPU-controlled Communication

GPUs are widely used in high performance computing, due to their high computational power and high performance per Watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is […]
Mar, 25

Analysis of illumination conditions at the lunar south pole using parallel computing techniques

In this Master Thesis an analysis of illumination conditions at the lunar south pole using parallel computing techniques is presented. Due to the small inclination (1.54o) of the lunar rotational axis with respect to the ecliptic plane and the topography of the lunar south pole, which allows long illumination periods, the study of illumination conditions […]
Mar, 25

The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming

OpenCL, along with CUDA, is one of the main tools used to program GPGPUs. However, it allows running the same code on multi-core CPUs too, making it a rival for the long-established OpenMP. In this paper we compare OpenCL and OpenMP when developing and running compute-heavy code on a CPU. Both ease of programming and […]
Mar, 25

Data driven scheduling approach for the multi-node multi-GPU Cholesky decomposition

Recently large scale scientific computation on heterogeneous supercomputers equipped with accelerators is receiving attraction. However, traditional static job execution methods and memory management methods are insufficient in order to harness heterogeneous computing resources including memory efficiently, since they introduce larger data movement costs and lower resource usage. This paper takes the Cholesky decomposition computation, which […]
Mar, 23

Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis

The analysis of next-generation sequencing (NGS) data is a major topic in bioinformatics: short reads obtained from DNA, the molecule encoding the genome of living organisms, are processed to provide insight into biological or medical questions. This thesis provides novel solutions to major topics within the analysis of NGS data, focusing on parallelization, scalability and […]
Mar, 23

Curracurrong: a stream processing system for distributed environments

Advances in technology have given rise to applications that are deployed on wireless sensor networks (WSNs), the cloud, and the Internet of things. There are many emerging applications, some of which include sensor-based monitoring, web traffic processing, and network monitoring. These applications collect large amount of data as an unbounded sequence of events and process […]
Mar, 23

GPU Kernels for High-Speed 4-Bit Astrophysical Data Processing

Interferometric radio telescopes often rely on computationally expensive O(N^2) correlation calculations; fortunately these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU based X-engine of a new hybrid FX correlator. Channelized data from the F-engine is supplied to the GPUs as 4-bit, offset-encoded […]
Mar, 23

Massively Parallel Construction of the Cell Graph

Motion planning is an important and well-studied field of robotics. A typical approach to finding a route is to construct a cell graph representing a scene and then to find a path in such a graph. In this paper we present and analyze parallel algorithms for constructing the cell graph on a SIMD-like GPU processor. […]
Mar, 23

A Financial Benchmark for GPGPU Compilation

Commodity many-core hardware is now mainstream, driven in particular by the evolution of general purpose graphics programming units (GPGPUs), but parallel programming models are lagging behind in effectively exploiting the available application parallelism. There are two principal reasons. First, real-world applications often exhibit a rich composition of nested parallelism, whose statical extraction requires a set […]
Mar, 22

PTX2Kernel: Converting PTX Code into Compilable Kernels

GPUs are now widely used as high performance general purpose computing devices. More and more applications have achieved large speedups with one or more GPUs, and the number of GPU programs is growing fast. In certain situations, the high level CUDA C code of kernels is not available, but low level PTX code can be […]
Mar, 22

Speeding Up Computer Vision Applications on Mobile Computing Platforms

Computer vision (CV) is widely expected to be the next "Big Thing" in mobile computing. For example, Google has recently announced their project "Tango", a 5-inch Android phone containing highly customized hardware and software designed to track the full 3-dimensional motion of the device as you hold it while simultaneously creating a map of the […]
Mar, 22

Raising the Bar for Using GPUs in Software Packet Processing

Numerous recent research efforts have explored the use of Graphics Processing Units (GPUs) as accelerators for software-based routing and packet handling applications, typically demonstrating throughput several times higher than using legacy code on the CPU alone. In this paper, we explore a new hypothesis about such designs: For many such applications, the benefits arise less […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: