Posts

Sep, 7

Operating systems must support GPU abstractions

This paper argues that lack of OS support for GPU abstractions fundamentally limits the usability of GPUs in many application domains. OSes offer abstractions for most common resources such as CPUs, input devices, and file systems. In contrast, OSes currently hide GPUs behind an awkward ioctl interface, shifting the burden for abstractions onto user libraries […]
Sep, 7

A code-based analytical approach for using separate device coprocessors in computing systems

Special hardware accelerators like FPGAs and GPUs are commonly introduced into a computing system as a separate device. Consequently, the accelerator and the host system do not share a common memory. Offloading the data to the additional hardware thus introduces a communication penalty. Based on a combination of a program’s source code and execution […]
Sep, 7

GPU-based asynchronous particle swarm optimization

This paper describes our latest implementation of Particle Swarm Optimization (PSO) with simple ring topology for modern Graphics Processing Units (GPUs). To achieve both the fastest execution time and the best performance, we designed a parallel version of the algorithm, as fine-grained as possible, without introducing explicit synchronization mechanisms among the particles’ evolution processes. The […]
Sep, 7

The future of microprocessors

Energy efficiency is the new fundamental limiter of processor performance, way beyond numbers of processors. Microprocessors (single-chip computers) are the building blocks of the information world. Their performance has grown 1,000-fold over the past 20 years, driven by transistor speed and energy scaling, as well as by microarchitecture advances that exploited the transistor density gains from Moore’s […]
Sep, 7

Energy-efficient mechanisms for managing thread context in throughput processors

Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy […]
Sep, 7

FPGA vs. multi-core CPUs vs. GPUs: hands-on experience with a sorting application

Currently there are several interesting alternatives for low-cost high-performance computing. We report here our experiences with an N-gram extraction and sorting problem, originated in the design of a real-time network intrusion detection system. We have considered FPGAs, multi-core CPUs in symmetric multi-CPU machines and GPUs and have created implementations for each of these platforms. After […]
Sep, 7

Parallel packet classification using GPU co-processors

In the domain of network security, packet filtering for classification purposes is of significant interest. Packet classification provides a mechanism for understanding the composition of packet streams arriving at distinct network interfaces, and is useful in diagnosing threats and uncovering vulnerabilities so as to maximise data integrity and system security. Traditional packet classifiers, such as […]
Sep, 7

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

We present a taxonomy and modular implementation approach for data-parallel accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) architectural design patterns. We have developed a new VT microarchitecture, Maven, based on the traditional vector-SIMD microarchitecture that is considerably simpler to implement and easier to program than previous VT designs. Using an extensive design-space […]
Sep, 7

Parallel implementation of conjugate gradient method on graphics processors

GPUs have become extremely promising multi/many-core architectures for a wide range of demanding applications. Basic features of these architectures include the use of a large number of relatively simple processing units that operate in SIMD fashion, as well as hardware-supported, advanced multithreading. However, the use of GPUs in everyday practice is still limited, […]
Sep, 7

Compiler-directed memory management for heterogeneous MPSoCs

Advances in semiconductor technology enable multiple processor cores to be integrated into a single chip. Heterogeneous multiprocessor systems-on-a-chip (MPSoCs) have become important platforms for accelerating applications. However, compilation techniques for memory management on MPSoCs still lag behind. This paper presents an automatic memory management framework to orchestrate the data movement between local memory and off-chip memory. […]
Sep, 6

Bridging parallel and reconfigurable computing with multilevel PGAS and SHMEM+

Reconfigurable computing (RC) systems based on FPGAs are becoming an increasingly attractive solution for building parallel systems of the future. Applications targeting such systems have demonstrated superior performance and reduced energy consumption versus their traditional counterparts based on microprocessors. However, most such work has been limited to small system sizes. Unlike traditional HPC systems, […]
Sep, 6

CUDA-based GPU Implementation of Hierarchical Belief Propagation for Fast Stereo Matching

Stereo matching based on the Markov random field model poses a global optimization problem. Solutions to the problem can be inferred by the belief propagation (BP) algorithm. The BP algorithm effectively estimates global solutions, but it takes a very long time to calculate messages. In this paper, we implement the hierarchical BP algorithm on a […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors