
Posts

Apr, 2

Throughput-Effective On-Chip Networks for Manycore Accelerators

As the number of cores and threads in manycore compute accelerators such as graphics processing units (GPUs) increases, so does the importance of on-chip interconnection network design. This paper explores throughput-effective networks-on-chip (NoCs) for future manycore accelerators that employ bulk-synchronous parallel (BSP) programming models such as CUDA and OpenCL. A hardware optimization is “throughput-effective” if […]
Apr, 2

MARC: A Many-Core Approach to Reconfigurable Computing

We present a Many-core Approach to Reconfigurable Computing (MARC), enabling efficient high-performance computing for applications expressed using parallel programming models such as OpenCL. The MARC system exploits abundant specialized FPGA resources such as distributed block memories and DSP blocks to implement complete, high-efficiency single-chip many-core microarchitectures. The key benefits of MARC are that […]
Apr, 2

Real-time particle filtering with heuristics for 3D motion capture by monocular vision

Particle filtering is known as a robust approach to motion tracking by vision, at the cost of heavy computation in a high-dimensional pose space. In this work, we describe a number of heuristics that, as we demonstrate, jointly improve robustness and real-time performance for motion capture. 3D human motion capture by monocular vision without markers […]
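
As a rough illustration of the per-particle work behind the heavy computation mentioned above, the CUDA sketch below shows only the importance-weighting step of a generic particle filter with a hypothetical one-dimensional observation model; it is not the authors' code, and the prediction step, resampling, and the paper's heuristics are omitted.

    // Hedged sketch: weight-update step of a generic particle filter.
    // One thread per particle evaluates an (unnormalized) Gaussian likelihood
    // of the observation given that particle's predicted measurement.
    // All identifiers here are assumptions of the example.
    __global__ void update_weights(const float *predicted, // predicted measurement per particle
                                   float observation,      // actual measurement
                                   float *weights,         // importance weights, updated in place
                                   float sigma, int num_particles)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p < num_particles) {
            float d = predicted[p] - observation;
            weights[p] *= expf(-0.5f * d * d / (sigma * sigma));
        }
    }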
Apr, 2

Parallel discrete wavelet transform using the Open Computing Language: a performance and portability study

The discrete wavelet transform (DWT) is a powerful signal processing technique used in the JPEG 2000 image compression standard. The multi-resolution sub-band encoding provided by DWT allows for higher compression ratios, avoids blocking artifacts and enables progressive transmission of images. However, these advantages come at the expense of additional computational complexity. Achieving real-time or interactive […]
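
For readers unfamiliar with sub-band encoding, the sketch below shows one level of a 1-D Haar DWT as a CUDA kernel. This is purely illustrative: JPEG 2000 actually uses the CDF 5/3 and 9/7 wavelets, and the paper's implementation is written in OpenCL.

    // Illustrative one-level 1-D Haar DWT (CUDA). Each thread produces one
    // low-pass (approximation) and one high-pass (detail) coefficient.
    __global__ void haar_dwt_1d(const float *in, float *approx, float *detail, int half)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // half = input length / 2
        if (i < half) {
            float a = in[2 * i];
            float b = in[2 * i + 1];
            const float s = 0.70710678f;   // 1/sqrt(2), keeps the transform orthonormal
            approx[i] = (a + b) * s;       // low-pass sub-band
            detail[i] = (a - b) * s;       // high-pass sub-band
        }
    }

Repeating the kernel on the approximation output yields the multi-resolution decomposition the abstract refers to.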
Apr, 2

Parallel implementation of the Finite-Difference Time-Domain method in Open Computing Language

In this paper we evaluate the usability and performance of the Open Computing Language (OpenCL) for implementing the Finite-Difference Time-Domain (FDTD) method. The simulation speed was compared to implementations based on alternative parallel processor programming techniques. Moreover, the portability of the OpenCL FDTD code between modern computing architectures was assessed. The average speed of […]
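
As a minimal picture of the stencil updates that make FDTD parallelize well, the CUDA sketch below performs one leapfrog step of a 1-D Yee scheme. The coefficient and array names are assumptions of the example; the paper's own code targets OpenCL.

    // 1-D FDTD (Yee) update, illustrative only. ez and hy are field arrays of
    // length n; ce and ch fold the time step, grid spacing and material
    // constants into single update coefficients.
    __global__ void update_e(float *ez, const float *hy, float ce, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i > 0 && i < n)
            ez[i] += ce * (hy[i] - hy[i - 1]);
    }

    __global__ void update_h(float *hy, const float *ez, float ch, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n - 1)
            hy[i] += ch * (ez[i + 1] - ez[i]);
    }

One simulated time step launches update_e and then update_h over the whole grid, which is exactly the data-parallel pattern both OpenCL and CUDA express naturally.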
Apr, 2

Speeding-up Pearson Correlation Coefficient calculation on graphical processing units

The sample correlation coefficient is widely used for finding signal similarity in data processing, multimedia, pattern recognition and artificial intelligence applications. The Pearson correlation coefficient is the most common measure of correlation between discrete signals. Similarity search in huge pattern databases requires a fast way of calculating the correlation coefficient between numerical vectors. In this […]
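
The single-pass formulation of the Pearson coefficient, r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)), maps naturally onto one GPU thread per database vector. The CUDA sketch below is an illustration of that mapping, not the paper's implementation; all identifiers are assumptions of the example.

    // One thread computes the Pearson correlation between the query vector
    // and one row of the database (row-major, num_vecs x dim).
    __global__ void pearson_all(const float *query, const float *database,
                                float *r, int num_vecs, int dim)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= num_vecs) return;

        const float *y = database + (size_t)v * dim;
        float sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < dim; ++i) {
            float xi = query[i], yi = y[i];
            sx  += xi;       sy  += yi;
            sxx += xi * xi;  syy += yi * yi;  sxy += xi * yi;
        }
        float num = dim * sxy - sx * sy;
        float den = sqrtf((dim * sxx - sx * sx) * (dim * syy - sy * sy));
        r[v] = (den > 0.0f) ? num / den : 0.0f;   // guard against zero-variance vectors
    }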
Apr, 2

GPU-Enabled AI

GPU-enabled AI is a subset of so-called general-purpose GPU computing (GPGPU). But it promises to be one of the fastest-growing subsets. The rise of cloud computing, recent high-powered graphics-chip releases by AMD’s competitor Nvidia, and the growing acceptance of the OpenCL programming platform have all converged to allow GPU-enabled AI to take off in […]
Apr, 2

Uncertainty-Aware Guided Volume Segmentation

Although direct volume rendering is established as a powerful tool for the visualization of volumetric data, efficient and reliable feature detection is still an open topic. Usually, a tradeoff between fast but imprecise classification schemes and accurate but time-consuming segmentation techniques has to be made. Furthermore, the issue of uncertainty introduced with the feature detection […]
Apr, 2

A characterization and analysis of PTX kernels

General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications. It has been driven by the introduction of C-based programming environments such as NVIDIA’s CUDA, OpenCL, and Intel’s Ct. While significant effort has been focused on developing and evaluating applications and software tools, comparatively […]
Apr, 2

Parallel computing with CUDA

Summary form only given. NVIDIA’s CUDA architecture provides a powerful platform for writing highly parallel programs. By providing simple abstractions for hierarchical thread organization, memories, and synchronization, the CUDA programming model allows programmers to write scalable programs without the burden of learning a multitude of new programming constructs. The CUDA architecture can support many languages […]
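
The abstractions listed above (hierarchical thread organization, per-block memories, and synchronization) are easiest to see in a small kernel. The sketch below is a generic block-wise sum reduction offered as an illustration, not material from the talk itself.

    // Each 256-thread block reduces one tile of the input into a single
    // partial sum using shared memory and a barrier between tree levels.
    __global__ void block_sum(const float *in, float *block_out, int n)
    {
        __shared__ float tile[256];              // per-block shared memory
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid; // grid > block > thread hierarchy

        tile[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                         // barrier within the block

        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride) tile[tid] += tile[tid + stride];
            __syncthreads();
        }
        if (tid == 0) block_out[blockIdx.x] = tile[0];
    }

Launched as block_sum<<<num_blocks, 256>>>(in, partial, n), the same source scales across GPUs with different core counts, which is the scalability point the abstract makes.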
Apr, 1

Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

New high performance computing (HPC) applications increasingly have to address scalability across a growing number of nodes as well as the programming of specialized accelerator hardware. The hybrid composition of large computing systems leads to a new dimension of complexity in software development. This paper presents a novel approach to gaining insight into accelerator interaction and utilization without […]
Apr, 1

Message Passing Interface support for the runtime adaptive multi-processor system-on-chip RAMPSoC

Parallel processor architectures are a promising solution for providing the required computing performance for current and future high performance applications. Of course, the computational power actually achieved by such a parallel computer system depends on the inherent parallelism of the algorithm to be implemented. The implementation of an algorithm onto a parallel computer architecture, […]

