5219

Posts

Jun, 2

MapCG: writing parallel program portable between CPU and GPU

Graphics Processing Units (GPU) have been playing an important role in the general purpose computing market recently. The common approach to program GPU today is to write GPU specific code with low level GPU APIs such as CUDA. Although this approach can achieve very good performance, it raises serious portability issues: programmers are required to […]
May, 11

Whole-function vectorization

Data-parallel programming languages are an important component in today’s parallel computing landscape. Among those are domain-specific languages like shading languages in graphics (HLSL, GLSL, RenderMan, etc.) and “general-purpose” languages like CUDA or OpenCL. Current implementations of those languages on CPUs solely rely on multi-threading to implement parallelism and ignore the additional intra-core parallelism provided by […]
May, 10

Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors

Modern processors are evolving into hybrid, heterogeneous processors with both CPU and GPU cores used for general purpose computation. Several languages such as Brook, CUDA, and more recently OpenCL are being developed to fully harness the potential of these processors. These languages typically involve the control code running on the CPU and the performance-critical, data-parallel […]
Apr, 11

Simulation of bevel gear cutting with GPGPUs-performance and productivity

The desire for general purpose computation on graphics processing units caused the advance of new programming paradigms, e.g. OpenCL C/C++, CUDA C or the PGI Accelerator Model. In this paper, we apply these programming approaches to the software KegelSpan for simulating bevel gear cutting. This engineering application simulates an important manufacturing process in the automotive […]
Apr, 7

Context-aware volume navigation

The trackball metaphor is exploited in many applications where volumetric data needs to be explored. Although it provides an intuitive way to inspect the overall structure of objects of interest, an in-detail inspection can be tedious – or when cavities occur even impossible. Therefore we propose a context-aware navigation technique for the exploration of volumetric […]
Apr, 7

Practical examples of GPU computing optimization principles

In this paper, we provide examples to optimize signal processing or visual computing algorithms written for SIMT-based GPU architectures. These implementations demonstrate the optimizations for CUDA or its successors OpenCL and DirectCompute. We discuss the effect and optimization principles of memory coalescing, bandwidth reduction, processor occupancy, bank conflict reduction, local memory elimination and instruction optimization. […]
Apr, 3

A Light-weight API for Portable Multicore Programming

Multicore nodes have become ubiquitous in just a few years. At the same time, writing portable parallel software for multicore nodes is extremely challenging. Widely available programming models such as OpenMP and Pthreads are not useful for devices such as graphics cards, and more flexible programming models such as RapidMind are only available commercially. OpenCL […]
Apr, 3

Mobile visual computing

Summary form only given. I will talk about camera phones, how you can use camera as a sensor that gives natural access to the information about the real world around you (mobile augmented reality) and how you can combine general computation capability to combine several input images into better or more interesting output images (mobile […]
Apr, 3

Energy consumption of Graphic Processing Units with respect to automotive use-cases

With the introduction of API’s like CUDA, Stream+ or OpenCL, modern Graphics Processing Units (GPU’s) can be easily employed for general purpose computing. Plus, their comparatively low price per GFLOP makes them interesting candidates for coprocessors in future embedded Electronic Control Units (ECUs). Yet, as car manufacturers thrive to reduce the Thermal Design Power (TDP) […]
Apr, 2

Throughput-Effective On-Chip Networks for Manycore Accelerators

As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network design. This paper explores throughput-effective network-on-chips (NoC) for future manycore accelerators that employ bulk-synchronous parallel (BSP) programming models such as CUDA and OpenCL. A hardware optimization is “throughput-effective” if […]
Apr, 2

MARC: A Many-Core Approach to Reconfigurable Computing

We present a Many-core Approach to Reconfigurable Computing (MARC), enabling efficient high-performance computing for applications expressed using parallel programming models such as OpenCL. The MARC system exploits abundant special FPGA resources such as distributed block memories and DSP blocks to implement complete single-chip high efficiency many-core micro architectures. The key benefits of MARC are that […]
Apr, 2

Real-time particle filtering with heuristics for 3D motion capture by monocular vision

Particle filtering is known as a robust approach for motion tracking by vision, at the cost of heavy computation in a high dimensional pose space. In this work, we describe a number of heuristics that we demonstrate to jointly improve robustness and real-time for motion capture. 3D human motion capture by monocular vision without markers […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: