
Posts

Sep 21

Multicore performance optimization using partner cores

As the push for parallelism continues to increase the number of cores on a chip, system design has become incredibly complex; optimizing for performance and power efficiency is now nearly impossible for the application programmer. To assist the programmer, a variety of techniques for optimizing performance and power at runtime have been developed, but many […]
Sep 21

Mint: realizing CUDA performance in 3D stencil methods with annotated C

We present Mint, a programming model that enables the non-expert to enjoy the performance benefits of hand-coded CUDA without becoming entangled in the details. Mint targets stencil methods, which are an important class of scientific applications. We have implemented the Mint programming model with a source-to-source translator that generates optimized CUDA C from traditional […]
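Mint's generated code is not reproduced in this listing, but the class of computation it targets is easy to picture. The following is a minimal, hand-written CUDA sketch of a 7-point 3D stencil kernel, the kind of loop nest a Mint annotation covers; the array names, halo convention, and coefficients are illustrative assumptions, not Mint output.

// Hand-written 7-point 3D stencil kernel (illustrative sketch only, not
// Mint-generated code). Assumes a row-major nx*ny*nz grid with a one-cell
// halo; `in`, `out`, `c0`, and `c1` are hypothetical names.
__global__ void stencil7(const float* in, float* out,
                         int nx, int ny, int nz, float c0, float c1)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // +1 skips the halo
    int j = blockIdx.y * blockDim.y + threadIdx.y + 1;
    int k = blockIdx.z * blockDim.z + threadIdx.z + 1;
    if (i >= nx - 1 || j >= ny - 1 || k >= nz - 1) return;

    int idx = (k * ny + j) * nx + i;                    // linearized 3D index
    out[idx] = c0 * in[idx]
             + c1 * (in[idx - 1]       + in[idx + 1]         // x neighbors
                   + in[idx - nx]      + in[idx + nx]        // y neighbors
                   + in[idx - nx * ny] + in[idx + nx * ny]); // z neighbors
}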
Sep 20

Scalable heterogeneous parallelism for atmospheric modeling and simulation

Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Effective use of parallelism in these new chipsets constitutes the challenge facing a new generation of large-scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the […]
Sep 20

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors

MATLAB is an array language, initially popular for rapid prototyping but now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that significantly affect the program’s execution time. Today’s computer systems have tremendous […]
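The paper's compiler output is not shown in this listing; as a rough illustration of the split it exploits, consider a MATLAB array expression such as C = A .* B + s sitting inside a scalar convergence loop. A hypothetical hand-written CUDA equivalent maps the data-parallel array operation onto the GPU while the scalar control flow stays on the CPU; all names below are assumptions for illustration, not generated code.

// Element-wise kernel for the array expression C = A .* B + s.
__global__ void elemwise(const float* A, const float* B, float* C,
                         float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] * B[i] + s;            // data-parallel region on the GPU
}

// Host-side scalar region: control flow decides how often to launch.
void run(const float* dA, const float* dB, float* dC, int n, int iters)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iters; ++it)     // scalar control flow on the CPU
        elemwise<<<blocks, threads>>>(dA, dB, dC, 2.0f, n);
    cudaDeviceSynchronize();
}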
Sep 20

Automatic abstraction and fault tolerance in cortical microarchitectures

Recent advances in the neuroscientific understanding of the brain are bringing about a tantalizing opportunity for building synthetic machines that perform computation in ways that differ radically from traditional von Neumann machines. These brain-like architectures, which are premised on our understanding of how the human neocortex computes, are highly fault-tolerant, averaging results over large numbers […]
Sep 20

SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading

Large register files are common in highly multi-threaded architectures such as GPUs. This paper presents a hybrid memory design that tightly integrates embedded DRAM into SRAM cells, with the primary application of reducing the area and power consumption of multi-threaded register files. In the hybrid memory, each SRAM cell is augmented with multiple DRAM cells so […]
Sep 20

Brief announcement: better speedups for parallel max-flow

We present a parallel solution to the Maximum-Flow (Max-Flow) problem, suitable for a modern many-core architecture. We show that by starting from a PRAM algorithm, following an established "programmer’s workflow" and targeting XMT, a PRAM-inspired many-core architecture, we achieve significantly higher speed-ups than previous approaches. Comparison with the fastest known serial max-flow implementation on a […]
Sep 20

Hermes: an integrated CPU/GPU microarchitecture for IP routing

With constantly increasing Internet traffic and fast-changing network protocols, future routers have to simultaneously satisfy requirements for throughput, QoS, flexibility, and scalability. In this work, we propose a novel integrated CPU/GPU microarchitecture, Hermes, for QoS-aware high-speed routing. We also develop a new thread scheduling mechanism, which significantly improves all QoS metrics.
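The Hermes pipeline and its scheduler are not reproduced here, but the basic appeal of a GPU for routing is packet-level parallelism. The sketch below assigns one CUDA thread per packet and performs a longest-prefix-match lookup over a small linear prefix table; the table layout and names are assumptions for illustration, not the Hermes design.

// One thread per packet: longest-prefix match over a small linear table.
// Contiguous netmasks compare correctly as unsigned ints (longer prefix = larger mask).
struct Prefix { unsigned int addr; unsigned int mask; int port; };

__global__ void lpm_lookup(const unsigned int* dst_ips, int num_pkts,
                           const Prefix* table, int table_len, int* out_port)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_pkts) return;

    unsigned int ip = dst_ips[p];
    int best_port = -1;                    // -1 = no matching route
    unsigned int best_mask = 0;
    for (int t = 0; t < table_len; ++t) {
        if ((ip & table[t].mask) == table[t].addr && table[t].mask >= best_mask) {
            best_mask = table[t].mask;     // longer prefix wins
            best_port = table[t].port;
        }
    }
    out_port[p] = best_port;
}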
Sep 20

CuMAPz: a tool to analyze memory access patterns in CUDA

The CUDA programming model provides a simple interface for programming GPUs, but tuning GPGPU applications for high performance is still quite challenging. Programmers need to consider several architectural details, and small changes in source code, especially in memory access patterns, can affect performance significantly. This paper presents CuMAPz, a tool to compare the memory performance of […]
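A small example shows the kind of pattern such a tool has to reason about. The two kernels below copy the same row-major n*n matrix: in the first, consecutive threads in a warp touch consecutive addresses (coalesced); in the second, they are a full row apart (strided), which typically wastes most of the available memory bandwidth. This is a generic illustration, not CuMAPz output.

// Coalesced copy: threads in a warp vary `col`, so addresses are consecutive.
__global__ void copy_coalesced(const float* in, float* out, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        out[row * n + col] = in[row * n + col];
}

// Strided copy: threads in a warp vary `row`, so addresses are n floats apart.
__global__ void copy_strided(const float* in, float* out, int n)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        out[row * n + col] = in[row * n + col];
}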
Sep 20

EFFEX: an embedded processor for computer vision based feature extraction

The deployment of computer vision algorithms in mobile applications is growing at a rapid pace. A primary component of the computer vision software pipeline is feature extraction, which identifies and encodes relevant image features. We present an embedded heterogeneous multicore design named EFFEX that incorporates novel functional units and memory architecture support, making it capable […]
Sep 20

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still at an early stage, and the challenge of making the GPU a true shared resource […]
Sep 20

Acceleration of genetic algorithms for sudoku solution on many-core processors

In this paper, we use the problem of solving Sudoku puzzles to demonstrate that practical processing times can be achieved by applying many-core processors to parallel genetic computation. To increase accuracy, we propose a genetic operation that takes building-block linkage into account. As a parallel processing model for […]
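The paper's linkage-aware genetic operator is not reproduced in this listing; as a sketch of how the evaluation step parallelizes, the CUDA kernel below scores one candidate 9x9 board per thread by counting the digits missing from each row, column, and 3x3 box (a penalty of 0 means a valid solution). The data layout and names are illustrative assumptions.

// One thread scores one candidate board. Boards are stored row-major,
// 81 bytes each, with cell values 1..9.
__global__ void sudoku_fitness(const unsigned char* boards, int num_boards,
                               int* fitness)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= num_boards) return;
    const unsigned char* g = boards + b * 81;

    int penalty = 0;
    for (int u = 0; u < 9; ++u) {
        int row_seen = 0, col_seen = 0, box_seen = 0;
        for (int v = 0; v < 9; ++v) {
            row_seen |= 1 << g[u * 9 + v];                        // row u
            col_seen |= 1 << g[v * 9 + u];                        // column u
            int r = (u / 3) * 3 + v / 3, c = (u % 3) * 3 + v % 3;
            box_seen |= 1 << g[r * 9 + c];                        // box u
        }
        penalty += 9 - __popc(row_seen & 0x3FE);   // digits 1..9 missing in row u
        penalty += 9 - __popc(col_seen & 0x3FE);
        penalty += 9 - __popc(box_seen & 0x3FE);
    }
    fitness[b] = penalty;                          // 0 = every unit has all digits
}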
