
Posts

Apr, 25

Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing

The Parboil benchmarks are a set of throughput computing applications useful for studying the performance of throughput computing architectures and compilers. The name comes from the culinary term for a partial cooking process, which represents our belief that useful throughput computing benchmarks must be "cooked", or preselected to implement a scalable algorithm with fine-grained parallel […]
Apr, 25

WebCL for Hardware-Accelerated Web Applications

Mobile devices, such as smartphones and tablets, now run full-featured browsers capable of handling rich media and web content. The emergence of HTML5 makes the browser an ever more attractive platform for application developers. In addition, improvements in JavaScript engines are further shrinking the performance gap between native applications, typically written in C and […]
Apr, 25

Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model

We show that efficient simulations of the Kardar-Parisi-Zhang interface growth in 2 + 1 dimensions and of the 3-dimensional Kinetic Monte Carlo of thermally activated diffusion can be realized on both GPUs and modern CPUs. In this article we present results of different implementations on GPUs using CUDA and OpenCL and also on CPUs using […]
Apr, 25

Paraiso : An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations

We propose Paraiso, a domain-specific language embedded in the functional programming language Haskell, for automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE solving algorithms succinctly using tensor equation notation. Hydrodynamic properties, interpolation methods and other building blocks are described in […]
Apr, 24

Real-time video breakup detection for multiple HD video streams on a single GPU

An important task in film and video preservation is the quality assessment of content to be archived or reused from the archive. Done manually, this task is strenuous and time-consuming, so it should be automated as far as possible. In this paper, we show how […]
Apr, 23

Performance Degradation Analysis of GPU Kernels

Hardware accelerators (currently Graphics Processing Units, or GPUs) are an important component in many existing high-performance computing solutions [5]. Their growth in variety and usage is expected to skyrocket [1] for several reasons. First, GPUs offer impressive energy efficiency [3]. Second, when properly programmed, they yield impressive speedups by allowing programmers to model their […]
Apr, 23

Tree Structured Analysis on GPU Power Study

Graphics Processing Units (GPUs) have emerged as a promising platform for parallel computation. With a large number of processor cores and abundant memory bandwidth, GPUs deliver substantial computational power. While providing high computational performance, a GPU consumes a great deal of power and requires adequate power supply and cooling systems. It is essential to institute an efficient mechanism […]
Apr, 23

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened: the context-based data dependence, the memory access dependence and the control dependence. The CAVLC pipeline is divided into […]
Apr, 23

Parallel Surface Reconstruction for Particle-Based Fluids

This paper presents a novel method that improves the efficiency of high-quality surface reconstructions for particle-based fluids using Marching Cubes. By constructing the scalar field only in a narrow band around the surface, the computational complexity and the memory consumption scale with the fluid surface instead of the volume. Furthermore, a parallel implementation of the […]
Apr, 23

Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes

Prevailing VLSI trends point to a growing gap between the scaling of on-chip processing throughput and off-chip memory bandwidth. Efficient use of memory bandwidth must become a first-class design consideration to fully utilize the processing capability of highly concurrent platforms such as FPGAs. In this paper, we present key aspects of this […]
Apr, 21

Computing Performance Benchmarks among CPU, GPU, and FPGA

In recent years, high-performance computing has developed rapidly. The goal of this project was to conduct computing performance benchmarks on three major computing platforms: CPUs, GPUs, and FPGAs. A total of 66 benchmarks were evaluated. GPUs outperformed the other platforms in terms of execution time. CPUs outperformed in overall execution […]
Apr, 21

Fast Universal Background Model (UBM) Training on GPUs using Compute Unified Device Architecture (CUDA)

Universal Background Modeling (UBM) is an alternative-hypothesis modeling approach used extensively in Speaker Verification (SV) systems. Training the background models from large speech datasets requires a significant amount of memory and computation. In this paper a parallel implementation of a speaker verification system based on Gaussian Mixture Modeling – Universal Background Modeling (GMM […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
