Posts

Jan, 8

Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques

This paper describes several challenges facing programmers of future edge computing systems, the diverse many-core devices that will soon exemplify commodity mainstream systems. To call attention to programming challenges ahead, this paper focuses on the most complex of such architectures: integrated, power-conserving systems, inherently parallel and heterogeneous, with distributed address spaces. When programming such complex […]
Jan, 8

Metaprogramming GPUs with Sh

Sh is a high-level shading language whose “parser” is implemented as a C++ library. Sh programs run on the GPU but act like extensions of the host application. Sh can be used for single shaders, to implement complex multipass algorithms, or for general-purpose computation on GPUs.
Jan, 7

Combining computer vision and physics simulations using GPGPU

We present a system that uses the immense processing capabilities of graphics processors (GPUs) to enable a computer vision algorithm, such as stereo depth extraction, to drive a physics simulation in an interactive environment. This combination of processing has the potential to dramatically alter the way that people interact with computers through novel user interfaces […]
Jan, 7

Thousand core chips: a technology perspective

This paper presents the many-core architecture, with hundreds to thousands of small cores, to deliver unprecedented compute performance in an affordable power envelope. We discuss fine grain power management, memory bandwidth, on die networks, and system resiliency for the many-core system.
Jan, 7

FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

BACKGROUND: Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop […]
Jan, 7

A Unified Runtime System for Heterogeneous Multi-core Architectures

Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs that can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading […]
Jan, 7

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA

A recent trend in mainstream desktop systems is the use of general-purpose graphics processor units (GPGPUs) to obtain order-of-magnitude performance improvements. CUDA has emerged as a popular programming model for GPGPUs for use by C/C++ programmers. Given the widespread use of modern object-oriented languages with managed runtimes like Java and C#, it is natural to […]
Jan, 7

Variants of Jump Flooding Algorithm for Computing Discrete Voronoi Diagrams

Jump flooding algorithm (JFA) is an interesting way to utilize the graphics processing unit to efficiently compute Voronoi diagrams and distance transforms in 2D discrete space. This paper presents three novel variants of JFA. They focus on different aspects of JFA: the first variant can further reduce the errors of JFA; the second variant can […]
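
For readers unfamiliar with the technique, the following is a minimal CPU sketch of the basic jump flooding pass structure, not of the paper's three variants and not of a GPU implementation. The function name `jump_flood` and the list-of-lists grid are assumptions made for illustration. Each pass lets every pixel inspect nine samples at a given step length and keep the closest seed seen so far; the step length halves each pass. The result approximates the discrete Voronoi diagram and may contain small errors on some inputs.

```python
def jump_flood(size, seeds):
    """Approximate discrete Voronoi diagram on a size x size grid.

    seeds is a list of (x, y) pixel coordinates. Returns grid[y][x] holding
    the index of the (approximately) nearest seed. Illustrative sketch only.
    """
    grid = [[-1] * size for _ in range(size)]
    for i, (sx, sy) in enumerate(seeds):
        grid[sy][sx] = i

    def dist2(x, y, i):
        sx, sy = seeds[i]
        return (x - sx) ** 2 + (y - sy) ** 2

    # Start with the largest power of two below the grid size, then halve.
    step = 1
    while step < size:
        step *= 2
    step //= 2

    while step >= 1:
        # Double buffering: read the previous pass, write a new buffer,
        # mirroring how a GPU pass reads one texture and writes another.
        new = [row[:] for row in grid]
        for y in range(size):
            for x in range(size):
                best = grid[y][x]
                for dy in (-step, 0, step):
                    for dx in (-step, 0, step):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < size and 0 <= ny < size:
                            cand = grid[ny][nx]
                            if cand != -1 and (
                                best == -1
                                or dist2(x, y, cand) < dist2(x, y, best)
                            ):
                                best = cand
                new[y][x] = best
        grid = new
        step //= 2
    return grid
```

On an 8x8 grid the sketch runs three passes (step lengths 4, 2, 1), which illustrates why the pass count grows only logarithmically with the resolution.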
Jan, 7

Power Consumption of GPUs from a Software Perspective

GPUs are now considered as serious challengers for high-performance computing solutions. They have power consumptions up to 300 W. This may lead to power supply and thermal dissipation problems in computing centers. In this article we investigate, using measurements, how and where modern GPUs are using energy during various computations in a CUDA environment.
Jan, 7

Data transformations enabling loop vectorization on multithreaded data parallel architectures

Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based […]
Jan, 7

Fast BVH Construction on GPUs

We present two novel parallel algorithms for rapidly constructing bounding volume hierarchies on manycore GPUs. The first uses a linear ordering derived from spatial Morton codes to build hierarchies extremely quickly and with high parallel scalability. The second is a top-down approach that uses the surface area heuristic (SAH) to build hierarchies optimized for fast […]
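
As a rough illustration of the "linear ordering derived from spatial Morton codes" mentioned above, the sketch below interleaves the bits of quantized 3D coordinates and sorts primitives by the resulting code. It is not the paper's algorithm: the names `morton3d` and `order_by_morton`, the 10-bit quantization, and the assumption that centroids lie in the unit cube are all choices made here for illustration, and the actual hierarchy construction is omitted.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Morton code.

    Bit i of x lands at position 3*i + 2, y at 3*i + 1, z at 3*i.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code


def order_by_morton(centroids, bits=10):
    """Return primitive indices sorted by the Morton code of their centroid.

    Assumes each centroid is an (x, y, z) tuple with components in [0, 1].
    """
    scale = 1 << bits

    def quantize(c):
        # Clamp so that a component equal to 1.0 still fits in `bits` bits.
        return min(int(c * scale), scale - 1)

    def key(i):
        cx, cy, cz = centroids[i]
        return morton3d(quantize(cx), quantize(cy), quantize(cz), bits)

    return sorted(range(len(centroids)), key=key)
```

Primitives that are adjacent in this sorted order tend to be close in space, which is what makes a linear ordering useful as a starting point for grouping primitives into a hierarchy.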
Jan, 7

Low latency photon mapping using block hashing

For hardware accelerated rendering, photon mapping is especially useful for simulating caustic lighting effects on non-Lambertian surfaces. However, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing algorithms are often based on recursive spatial subdivision techniques, such as kd-trees. However, hardware implementation of a tree-based algorithm would […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org