7578

Posts

Apr, 25

Polymer Field-Theory Simulations on Graphics Processing Units

We report the first CUDA graphics-processing-unit (GPU) implementation of the polymer field-theoretic simulation framework for determining fully fluctuating expectation values of equilibrium properties for periodic and select aperiodic polymer systems. Our implementation is suitable both for self-consistent field theory (mean-field) solutions of the field equations, and for fully fluctuating simulations using the complex Langevin approach. […]
Apr, 25

The Bones Source-to-Source Compiler Manual

Recent advances in multi-core and many-core processors requires programmers to exploit an increasing amount of parallelism from their applications. Data parallel languages such as CUDA and OpenCL make it possible to take advantage of such processors, but still require a large amount of effort from programmers. To address the challenge of parallel programming, we introduce […]
Apr, 25

Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing

The Parboil benchmarks are a set of throughput computing applications useful for studying the performance of throughput computing architecture and compilers. The name comes from the culinary term for a partial cooking process, which represents our belief that useful throughput computing benchmarks must be "cooked", or preselected to implement a scalable algorithm with fine-grained parallel […]
Apr, 25

WebCL for Hardware-Accelerated Web Applications

Mobile devices, such as smartphones and tablets, now run full feature browsers capable of handling rich media and web content. The emergence of HTML5 makes the browser an ever more attractive platform for application developers. In addition, improvements in JavaScript engines are further shrinking the performance gap between native applications, typically written in C and […]
Apr, 25

Comparison of Different Parallel Implementaions of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model

We show that efficient simulations of the Kardar-Parisi-Zhang interface growth in 2 + 1 dimensions and of the 3-dimensional Kinetic Monte Carlo of thermally activated diffusion can be realized both on GPUs and modern CPUs. In this article we present results of different implementations on GPUs using CUDA and OpenCL and also on CPUs using […]
Apr, 25

Paraiso : An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations

We propose Paraiso, a domain specific language embedded in functional programming language Haskell, for automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE solving algorithms succinctly using tensor equations notation. Hydrodynamic properties, interpolation methods and other building blocks are described in […]
Apr, 24

Real-time video breakup detection for multiple HD video streams on a single GPU

An important task in film and video preservation is the quality assessment of the content to be archived or reused out of the archive. This task, if done manually, is a straining and time consuming process, so it is highly recommended to automate this process as far as possible. In this paper, we show how […]
Apr, 23

Performance Degradation Analysis of GPU Kernels

Hardware accelerators (currently Graphical Processing Units or GPUs) are an important component in many existing high-performance computing solutions [5]. Their growth in variety and usage is expected to skyrocket [1] due to many reasons. First, GPUs offer impressive energy efficiencies [3]. Second, when properly programmed, they yield impressive speedups by allowing programmers to model their […]
Apr, 23

Tree Structured Analysis on GPU Power Study

Graphics Processing Units (GPUs) have emerged as a promising platform for parallel computation. With a large number of processor cores and abundant memory bandwidth, GPUs deliver substantial computation power. While providing high computation performance, a GPU consumes high power and needs sufficient power supplies and cooling systems. It is essential to institute an efficient mechanism […]
Apr, 23

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

This article presents two high-efficient parallel realizations of the context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weaken, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into […]
Apr, 23

Parallel Surface Reconstruction for Particle-Based Fluids

This paper presents a novel method that improves the efficiency of high-quality surface reconstructions for particle-based fluids using Marching Cubes. By constructing the scalar field only in a narrow band around the surface, the computational complexity and the memory consumption scale with the fluid surface instead of the volume. Furthermore, a parallel implementation of the […]
Apr, 23

Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes

Prevailing VLSI trends point to a growing gap between the scaling of on-chip processing throughput and off-chip memory bandwidth. An efficient use of memory bandwidth must become a first-class design consideration in order to fully utilize the processing capability of highly concurrent processing platforms like FPGAs. In this paper, we present key aspects of this […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: