
Posts

Jan, 7

A Unified Runtime System for Heterogeneous Multi-core Architectures

Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs that can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading […]
Jan, 7

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA

A recent trend in mainstream desktop systems is the use of general-purpose graphics processor units (GPGPUs) to obtain order-of-magnitude performance improvements. CUDA has emerged as a popular programming model for GPGPUs for use by C/C++ programmers. Given the widespread use of modern object-oriented languages with managed runtimes like Java and C#, it is natural to […]
Jan, 7

Variants of Jump Flooding Algorithm for Computing Discrete Voronoi Diagrams

Jump flooding algorithm (JFA) is an interesting way to utilize the graphics processing unit to efficiently compute Voronoi diagrams and distance transforms in 2D discrete space. This paper presents three novel variants of JFA. They focus on different aspects of JFA: the first variant can further reduce the errors of JFA; the second variant can […]
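
As a rough illustration of the base algorithm that the abstract builds on, the sketch below runs a plain jump flooding pass on the CPU. It is not the paper's code and does not implement any of its three variants. The function name `jump_flood` and its parameters are hypothetical, and the grid side is assumed to be a power of two.

```python
def jump_flood(size, seeds):
    """Approximate discrete Voronoi diagram on a size x size grid.

    size  -- grid side length (assumed to be a power of two)
    seeds -- list of (x, y) integer seed positions
    Returns grid[y][x] = index of the (approximately) nearest seed.
    """
    grid = [[None] * size for _ in range(size)]
    for i, (sx, sy) in enumerate(seeds):
        grid[sy][sx] = i

    def dist2(x, y, i):
        # Squared Euclidean distance from pixel (x, y) to seed i.
        sx, sy = seeds[i]
        return (x - sx) ** 2 + (y - sy) ** 2

    step = size // 2
    while step >= 1:
        # Double buffering: every pixel reads the previous pass only,
        # which mimics one parallel GPU pass per step length.
        nxt = [row[:] for row in grid]
        for y in range(size):
            for x in range(size):
                best = grid[y][x]
                for dy in (-step, 0, step):
                    for dx in (-step, 0, step):
                        nx, ny = x + dx, y + dy
                        if not (0 <= nx < size and 0 <= ny < size):
                            continue
                        cand = grid[ny][nx]
                        if cand is None:
                            continue
                        if best is None or dist2(x, y, cand) < dist2(x, y, best):
                            best = cand
                nxt[y][x] = best
        grid = nxt
        step //= 2  # step lengths size/2, size/4, ..., 1
    return grid
```

Each pass halves the step length, so a grid of side n needs about log2(n) passes. The result is approximate: a pixel can occasionally end up with a seed that is not its true nearest one, which is the kind of error the abstract's first variant aims to reduce.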
Jan, 7

Power Consumption of GPUs from a Software Perspective

GPUs are now considered serious challengers for high-performance computing solutions. Their power consumption can reach 300 W, which may lead to power supply and thermal dissipation problems in computing centers. In this article we investigate, using measurements, how and where modern GPUs use energy during various computations in a CUDA environment.
Jan, 7

Data transformations enabling loop vectorization on multithreaded data parallel architectures

Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based […]
Jan, 7

Fast BVH Construction on GPUs

We present two novel parallel algorithms for rapidly constructing bounding volume hierarchies on manycore GPUs. The first uses a linear ordering derived from spatial Morton codes to build hierarchies extremely quickly and with high parallel scalability. The second is a top-down approach that uses the surface area heuristic (SAH) to build hierarchies optimized for fast […]
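
To make the "linear ordering derived from spatial Morton codes" concrete, here is a minimal CPU sketch of the ordering step only. It is not the paper's implementation. The helper names `morton3d`, `quantize`, and `morton_order`, the 10-bit quantization, and the assumption that centroids lie in the unit cube are all illustrative choices.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code


def quantize(c, bits=10):
    """Map a coordinate in [0, 1] to an integer cell index."""
    cells = 1 << bits
    return min(max(int(c * cells), 0), cells - 1)


def morton_order(centroids, bits=10):
    """Return primitive indices sorted along the Morton (Z-order) curve.

    centroids -- list of (x, y, z) tuples, assumed to lie in the unit cube
    """
    def key(i):
        x, y, z = centroids[i]
        return morton3d(quantize(x, bits), quantize(y, bits),
                        quantize(z, bits), bits)

    return sorted(range(len(centroids)), key=key)
```

Sorting by this key places primitives that are close in space near each other in the sorted list, so a hierarchy can then be built by splitting the list on code bits. On a GPU the sort would typically be a parallel radix sort rather than Python's `sorted`.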
Jan, 7

Low latency photon mapping using block hashing

For hardware accelerated rendering, photon mapping is especially useful for simulating caustic lighting effects on non-Lambertian surfaces. However, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing algorithms are often based on recursive spatial subdivision techniques, such as kd-trees. However, hardware implementation of a tree-based algorithm would […]
Jan, 6

Message passing on data-parallel architectures

This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors. As a case study, we design and implement the “DCGN” API on NVIDIA GPUs that is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to […]
Jan, 6

Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA

Optical Quadrature Microscopy (OQM) is a process which uses phase data to capture information about the sample being studied. OQM is part of an imaging framework developed by the Optical Science Laboratory at Northeastern University. In one particular application of interest, the framework is used to extract phase information from the image of an embryo […]
Jan, 6

Hardware-accelerated parallel non-photorealistic volume rendering

Non-photorealistic rendering can be used to illustrate subtle spatial relationships that might not be visible with more realistic rendering techniques. We present a parallel hardware-accelerated rendering technique, making extensive use of multi-texturing and paletted textures, for the interactive non-photorealistic visualization of scalar volume data. With this technique, we can render a 512x512x512 volume using non-photorealistic […]
Jan, 6

Interactive volume illustration

In this paper we describe non-photorealistic rendering techniques for volumetric data sets. First, we outline an automatic approach that generates line drawings to illustrate such data sets and to augment traditional volume rendering techniques. For a number of seed points that are placed appropriately to represent selected volume structures curvature lines are traced and encoded […]
Jan, 6

Synthetic Aperture Radar Processing with GPGPU

This article focuses on methodologies, with recurring use of code examples that follow the main steps of SAR processing. A comprehensive treatment was precluded by the many variations of the focusing algorithm and by the breadth of applications. The reader should look at […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
