
Posts

Sep, 30

Real-Time Handling of GPU Interrupts in LITMUSRT

Graphics processing units (GPUs) are becoming increasingly important in today’s platforms as their increased generality allows them to be used as powerful co-processors. However, unlike standard CPUs, GPUs are treated as I/O devices and require the use of interrupts to facilitate communication with the CPU. Interrupts cause delays in the execution of real-time tasks, […]
Sep, 30

Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control

Many dynamic simulation programs contain complex, irregular memory reference patterns, and require runtime optimizations to enhance data locality. Current approaches periodically stop the execution of an application to reorder the computation or data based on the current program state to improve the data locality for the next period of execution. In this work, we examine […]
Sep, 30

Stack-less SIMT reconvergence at low cost

Parallel architectures following the SIMT model such as GPUs benefit from application regularity by issuing concurrent threads running in lockstep on SIMD units. As threads take different paths across the control-flow graph, lockstep execution is partially lost, and must be regained whenever possible in order to maximize the occupancy of SIMD units. In this paper, […]
Sep, 30

A PTX Code Generator for LLVM

Today’s GPGPU architectures and corresponding high-level programming languages like CUDA replace the traditionally restricted GPU pipelines. Proprietary compilers translate these languages into native GPU assembly. Unfortunately, these compilers are non-customizable and restricted to static compilation. High-performance applications currently require particular manual optimizations. To overcome these cumbersome manual optimizations, this thesis develops […]
Sep, 30

Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation

For some classes of problems, NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well-known solution for the common case of linear image filtering: small fixed […]
Sep, 30

CBench: Analyzing Compute Performance for Modern NVIDIA and AMD GPUs

General-purpose GPU computation is a fast-growing field with a variety of applications. For maximum performance, though, mapping high-level parallel algorithms to vendor hardware requires a solid grasp of both the algorithm’s computational requirements and the microarchitectural limitations of the GPU. This work aims to explore the performance of high and low arithmetic intensity […]
Sep, 30

FATSEA: An Architectural Simulator for General Purpose Computing on GPUs

We present FATSEA, a functional and performance evaluation simulator written in C++ to handle kernels written in the CUDA programming language aimed at GPGPU computing. FATSEA takes Parallel Thread eXecution (PTX) code as input, which is a device-independent code format generated by the Nvidia CUDA compiler, to validate results and estimate performance […]
Sep, 30

Translating GPU binaries to tiered SIMD architectures with Ocelot

Parallel Thread Execution ISA (PTX) is a virtual instruction set used by NVIDIA GPUs that explicitly expresses hierarchical MIMD- and SIMD-style parallelism in an application. In such a programming model, the programmer and compiler are left with the nontrivial, but not impossible, task of composing applications from parallel algorithms and data structures. Once […]
Sep, 30

Accelerating Geospatial Analysis on GPUs using CUDA

Inverse distance weighting (IDW) interpolation and viewshed are two popular algorithms for geospatial analysis. IDW interpolation assigns geographical values to unknown spatial points by using values from a usually scattered set of known points, and viewshed identifies the cells in a spatial raster that can be seen by observers. Although the implementations of both algorithms […]
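As a rough illustration of the IDW interpolation described above, here is a minimal serial sketch in Python. It assumes the common weighting w_i = 1/d_i^p with Euclidean distance in the plane and a default power p = 2. The function name `idw` and its parameters are hypothetical and not taken from the paper, which targets CUDA.

```python
import math


def idw(known, query, power=2.0):
    """Inverse distance weighting estimate at `query` (illustrative sketch).

    known: list of (x, y, value) tuples for the scattered known points.
    query: (x, y) location of the unknown point.
    power: distance exponent p; p = 2 is an assumed, commonly used default.
    """
    qx, qy = query
    num = 0.0  # running sum of w_i * value_i
    den = 0.0  # running sum of w_i
    for x, y, value in known:
        d = math.hypot(x - qx, y - qy)
        if d == 0.0:
            # Query coincides with a known point: return its value exactly
            # instead of dividing by zero.
            return value
        w = 1.0 / d ** power  # nearer points get larger weights
        num += w * value
        den += w
    return num / den
```

For example, `idw([(0, 0, 0), (3, 0, 30)], (1, 0))` returns 6.0, because the nearer point receives four times the weight of the farther one. Each query point is computed independently of every other, so one output cell per GPU thread is a natural mapping.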
Sep, 30

Accelerating Foreign-Key Joins using Asymmetric Memory Channels

Indexed Foreign-Key Joins expose a very asymmetric access pattern: the Foreign-Key Index is sequentially scanned, whilst the Primary-Key table is the target of many quasi-random lookups, which are the dominant cost factor. To reduce the costs of the random lookups, the fact table can be (re-)partitioned at runtime to increase access locality on the dimension table, […]
Sep, 30

Accelerating data mining workloads: current approaches and future challenges in system architecture design

Conventional systems based on general-purpose processors cannot keep pace with the exponential increase in the generation and collection of data. It is therefore important to explore alternative architectures that can provide the computational capabilities required to analyze ever-growing datasets. Programmable graphics processing units (GPUs) offer computational capabilities that surpass even high-end multi-core central processing units […]
Sep, 30

A Polyphase Filter For GPUs And Multi-Core Processors

Radio astronomy is a subfield of astronomy that studies celestial objects at radio frequencies. Unlike visible light, these radio signals are not blocked by Earth’s atmosphere, making it possible to detect them from the ground. Radio emissions have been observed from a number of celestial bodies, including stars and galaxies. Some celestial bodies that can […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
