Posts

Sep, 7

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

We present a taxonomy and modular implementation approach for data-parallel accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) architectural design patterns. We have developed a new VT microarchitecture, Maven, based on the traditional vector-SIMD microarchitecture that is considerably simpler to implement and easier to program than previous VT designs. Using an extensive design-space […]
Sep, 7

Parallel implementation of conjugate gradient method on graphics processors

GPUs have become extremely promising multi/many-core architectures for a wide range of demanding applications. Basic features of these architectures include the use of a large number of relatively simple processing units that operate in SIMD fashion, as well as hardware-supported, advanced multithreading. However, the use of GPUs in everyday practice is still limited, […]
Sep, 7

Compiler-directed memory management for heterogeneous MPSoCs

Advances in semiconductor technology enable multiple processor cores to be integrated on a single chip. Heterogeneous multiprocessor systems-on-a-chip (MPSoCs) have become important platforms for accelerating applications. However, compilation techniques for memory management on MPSoCs still lag behind. This paper presents an automatic memory management framework to orchestrate the data movement between local memory and off-chip memory. […]
Sep, 6

Bridging parallel and reconfigurable computing with multilevel PGAS and SHMEM+

Reconfigurable computing (RC) systems based on FPGAs are becoming an increasingly attractive solution for building parallel systems of the future. Applications targeting such systems have demonstrated superior performance and reduced energy consumption versus their traditional counterparts based on microprocessors. However, most such work has been limited to small system sizes. Unlike traditional HPC systems, […]
Sep, 6

CUDA-based GPU Implementation of Hierarchical Belief Propagation for Fast Stereo Matching

Stereo matching based on the Markov random field model poses a global optimization problem. Solutions to this problem can be inferred by the belief propagation (BP) algorithm. The BP algorithm effectively estimates global solutions, but it takes a very long time to calculate messages. In this paper, we implement the hierarchical BP algorithm on a […]
Sep, 6

Electromagnetic effects in capacitively coupled plasma simulated with a PIC-MCC darwin code

To increase the efficiency of plasma-assisted material processing with the help of capacitively coupled plasma discharges, the frequency of the driven field and the spatial size of modern devices tend toward higher values. This can lead to a stronger influence of electromagnetic effects, which in turn can affect the plasma uniformity, one of […]
Sep, 5

Virtual Rheoscopic Fluids

We present a visualization technique for simulated fluid dynamics data that visualizes the gradient of the velocity field in an intuitive way. Our work is inspired by rheoscopic particles, which are small, flat particles that, when suspended in fluid, align themselves with the shear of the flow. We adopt the physical principles of real rheoscopic […]
Sep, 5

Graphical future

The future of computing is something that is very much on the mind of nVidia CEO Jen-Hsun Huang, not least because he thinks his company is going to have a hand in it. As a maker of graphics processing units (GPUs), nVidia has had more of a walk-on role in the PC. If you want […]
Sep, 5

Fast Construction of SAH BVHs on the Intel Many Integrated Core (MIC) Architecture

We investigate how to efficiently build bounding volume hierarchies (BVHs) with surface area heuristic (SAH) on the Intel Many Integrated Core (MIC) Architecture. To achieve maximum performance, we use four key concepts: progressive 10-bit quantization to reduce cache footprint with negligible loss in BVH quality; an AoSoA data layout that allows efficient streaming and SIMD […]
Sep, 5

A CUDA-based parallel implementation of K-nearest neighbor algorithm

Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. Owing to its tremendous computing capability, the GPU has emerged as a co-processor to the CPU to achieve high overall throughput. The CUDA programming model provides programmers with adequate C-like APIs to better exploit the parallel power of […]
Sep, 5

Real-Time Tone Mapping for High-Resolution HDR Images

High dynamic range rendering attempts to take an HDR image and produce a more realistic representation on a limited-range computer monitor. Although several tone mapping operators have been proposed in recent years, no evaluation has yet been undertaken to explore which operator is most suitable for hardware implementation. In this paper, we begin with […]
Sep, 5

DUODECIM – a structure for point scan compression and rendering

In this paper we present a compression scheme for large point scans including per-point normals. For the encoding of such scans we introduce a particular type of closest sphere packing grids, the hexagonal close packing (HCP). HCP grids provide a structure for an optimal packing of 3D space, and for a given sampling error they […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
