1674

Posts

Nov, 20

Regular Lattice and Small-World Spin Model Simulations Using CUDA and GPUs

Data-parallel accelerator devices such as Graphical Processing Units (GPUs) are providing dramatic performance improvements over even multi-core CPUs for lattice-oriented applications in computational physics. Models such as the Ising and Potts models continue to play a role in investigating phase transitions on small-world and scale-free graph structures. These models are particularly well-suited to the performance […]
Nov, 20

An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing

In order to guarantee both performance and programmability demands in 3D graphics applications, vector and multithreaded SIMD architectures have been employed in recent graphics processing units. This paper introduces a novel instruction-systolic array architecture, which transfers an instruction stream in a pipelined fashion to efficiently share the expensive functional resources of a graphics processor. Specifically, […]
Nov, 20

A middleware for efficient stream processing in CUDA

This paper presents a middleware capable of out-of-order execution of kernels and data transfers for efficient stream processing in the compute unified device architecture (CUDA). Our middleware runs on the CUDA-compatible graphics processing unit (GPU). Using the middleware, application developers are allowed to easily overlap kernel computation with data transfer between the main memory and […]
Nov, 20

Simulating a P system based efficient solution to SAT by using GPUs

P systems are inherently parallel and non-deterministic theoretical computing devices defined inside the field of Membrane Computing. Many P system simulators have been presented in this area, but they are inefficient since they can not handle the parallelism of these devices. Nowadays, we are witnessing the consolidation of the GPUs as a parallel framework to […]
Nov, 20

Simulation of one-layer shallow water systems on multicore and CUDA architectures

The numerical solution of shallow water systems is useful for several applications related to geophysical flows, but the big dimensions of the domains suggests the use of powerful accelerators to obtain numerical results in reasonable times. This paper addresses how to speed up the numerical solution of a first order well-balanced finite volume scheme for […]
Nov, 20

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort

Sort is a fundamental kernel used in many database operations. In-memory sorts are now feasible; sort performance is limited by compute flops and main memory bandwidth rather than I/O. In this paper, we present a competitive analysis of comparison and non-comparison based sorting algorithms on two modern architectures – the latest CPU and GPU architectures. […]
Nov, 20

SBLOCK: A Framework for Efficient Stencil-Based PDE Solvers on Multi-core Platforms

We present a new software framework for the implementation of applications that use stencil computations on block-structured grids to solve partial differential equations. A key feature of the framework is the extensive use of automatic source code generation which is used to achieve high performance on a range of leading multi-core processors. Results are presented […]
Nov, 20

A GPGPU Transparent Virtualization Component for High Performance Computing Clouds

The GPU Virtualization Service (gVirtuS) presented in this work tries to fill the gap between in-house hosted computing clusters, equipped with GPGPUs devices, and pay-for-use high performance virtual clusters deployed via public or private computing clouds. gVirtuS allows an instanced virtual machine to access GPGPUs in a transparent and hypervisor independent way, with an overhead […]
Nov, 19

Active Structured Learning for High-Speed Object Detection

High-speed smooth and accurate visual tracking of objects in arbitrary, unstructured environments is essential for robotics and human motion analysis. However, building a system that can adapt to arbitrary objects and a wide range of lighting conditions is a challenging problem, especially if hard real-time constraints apply like in robotics scenarios. In this work, we […]
Nov, 19

PantaRay: fast ray-traced occlusion caching of massive scenes

We describe the architecture of a novel system for precomputing sparse directional occlusion caches. These caches are used for accelerating a fast cinematic lighting pipeline that works in the spherical harmonics domain. The system was used as a primary lighting technology in the movie Avatar, and is able to efficiently handle massive scenes of unprecedented […]
Nov, 19

Parallel option pricing with Fourier space time-stepping method on graphics processing units

With the evolution of graphics processing units (GPUs) into powerful and cost-efficient computing architectures, their range of application has expanded tremendously, especially in the area of computational finance. Current research in the area, however, is limited both in terms of the type of options priced and the complexity of stock price models. This paper presents […]
Nov, 19

Simulation of P systems with active membranes on CUDA

P systems or Membrane Systems provide a high-level computational modelling framework that combines the structure and dynamic aspects of biological systems in a relevant and understandable way. They are inherently parallel and non-deterministic computing devices. In this article, we discuss the motivation, design principles and key of the implementation of a simulator for the class […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org