7570

Posts

Apr, 23

Tree Structured Analysis on GPU Power Study

Graphics Processing Units (GPUs) have emerged as a promising platform for parallel computation. With a large number of processor cores and abundant memory bandwidth, GPUs deliver substantial computation power. While providing high computation performance, a GPU consumes high power and needs sufficient power supplies and cooling systems. It is essential to institute an efficient mechanism […]
Apr, 23

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

This article presents two high-efficient parallel realizations of the context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weaken, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into […]
Apr, 23

Parallel Surface Reconstruction for Particle-Based Fluids

This paper presents a novel method that improves the efficiency of high-quality surface reconstructions for particle-based fluids using Marching Cubes. By constructing the scalar field only in a narrow band around the surface, the computational complexity and the memory consumption scale with the fluid surface instead of the volume. Furthermore, a parallel implementation of the […]
Apr, 23

Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes

Prevailing VLSI trends point to a growing gap between the scaling of on-chip processing throughput and off-chip memory bandwidth. An efficient use of memory bandwidth must become a first-class design consideration in order to fully utilize the processing capability of highly concurrent processing platforms like FPGAs. In this paper, we present key aspects of this […]
Apr, 21

Computing Performance Benchmarks among CPU, GPU, and FPGA

In recent years, the world of high performance computing has been developing rapidly. The goal of this project was to conduct computing performance benchmarks on three major computing platforms, CPUs, GPUs, and FPGAs. A total of 66 benchmarks were evaluated. GPUs outperformed the other platforms in terms of execution time. CPUs outperformed in overall execution […]
Apr, 21

Fast Universal Background Model (UBM) Training on GPUs using Compute Unified Device Architecture (CUDA)

Universal Background Modeling (UBM) is an alternative hypothesized modeling that is used extensively in Speaker Verification (SV) systems. Training the background models from large speech data requires a significant amount of memory and computational load. In this paper a parallel implementation of speaker verification system based on Gaussian Mixture Modeling – Universal Background Modeling (GMM […]
Apr, 21

An On-Demand Fast Parallel Pseudo Random Number Generator with Applications

The use of manycore architectures and accelerators, such as GPUs, with good programmability has allowed them to be deployed for vital computational work. The ability to use randomness in computation is known to help in several situations. For such computations to be made possible on a general purpose computer, a source of randomness, or in […]
Apr, 21

Image Convolution Processing: a GPU versus FPGA Comparison

Convolution is one of the most important operators used in image processing. With the constant need to increase the performance in high-end applications and the rise and popularity of parallel architectures, such as GPUs and the ones implemented in FPGAs, comes the necessity to compare these architectures in order to determine which of them performs […]
Apr, 21

Fast Collision Culling in Large-Scale Environments Using GPU Mapping Function

This paper presents a novel and efficient GPU-based parallel algorithm to cull non-colliding object pairs in very large-scale dynamic simulations. It allows to cull objects in less than 25ms with more than 100K objects. It is designed for many-core GPU and fully exploits multi-threaded capabilities and data-parallelism. In order to take advantage of the high […]
Apr, 21

An Automatic OpenCL Compute Kernel Generator for Basic Linear Algebra Operations

An automatic OpenCL compute kernel generator framework for linear algebra operations is presented. It allows for specifying matrix and vector operations in high-level C++ code, while the low-level details of OpenCL compute kernel generation and handling are dealt with in the background. Our approach releases users from considerable additional effort required for learning the details […]
Apr, 21

Priority-Based Task Management in a GPGPU Megakernel

In this paper, we examine the challenges of implementing priority-based task management with respect to userdefined preferential attributes on a graphics processing unit (GPU). Previous approaches usually rely on constant synchronization with the CPU to determine the proper chronological sequence for execution. We transfer the responsibility for evaluating and arranging planned tasks to the GPU […]
Apr, 21

Multicore Processing for Classification and Clustering Algorithms

Data Mining algorithms such as classification and clustering are the future of computation, though multidimensional data-processing is required. People are using multicore processors with GPU’s. Most of the programming languages doesn’t provide multiprocessing facilities and hence wastage of processing resources. Clustering and classification algorithms are more resource consuming. In this paper we have shown strategies […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: