
Posts

Mar, 11

NUMA Data-Access Bandwidth Characterization and Modeling

Clusters of seemingly homogeneous compute nodes are increasingly heterogeneous within each node due to replication and distribution of node-level subsystems. This intra-node heterogeneity can adversely affect program execution performance by inflicting additional data-access performance penalties when accessing non-local data. In many modern NUMA architectures, both memory and I/O controllers are distributed within a node and […]
Mar, 11

An Algorithm for Fast Edit Distance Computation on GPUs

Finding the edit distance between two sequences (and the closely related problem of finding the longest common subsequence) is an important problem with applications in many domains, such as virus scanners, security kernels, natural language translation and genome sequence alignment. The traditional dynamic-programming-based algorithm is hard to parallelize on SIMD processors as the algorithm is […]
Mar, 11

GPU Path Tracing

The goal of this work is to verify that GPUs can be used for global illumination computations in a commercial software environment and to explore an efficient way to do so. Path tracing with a BVH as the acceleration data structure was successfully implemented on the GPU using CUDA. It was arranged as a pipelined structure which supported […]
Mar, 10

Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling

Though the GPGPU concept is well-known in image processing, much more work remains to be done to fully exploit GPUs as an alternative computation engine. This paper investigates the computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme shows a significant […]
Mar, 10

Acceleration of Solving Maxwell’s Equations Using Cluster of GPUs

Finite difference time domain (FDTD) is a numerical method for solving differential equations like Maxwell’s equations. Normally, simulation time of these equations is very long and there has been a great effort to reduce it. The most recent and useful way to reduce the simulation time of these equations is through using GPUs. Graphical processing […]
Mar, 10

CUDA Accelerated Face Recognition Using Local Binary Patterns

In this paper, we present a GPU-accelerated face recognition framework using CUDA. We use weighted regional LBP histograms as features and the k-nearest neighbour (k-NN) algorithm for classification. Our first contribution is to present an efficient way to compute LBP values from an input image and construct weighted regional LBP histograms on the GPU using a […]
Mar, 10

GPU-Accelerated Large-Eddy Simulation of Turbulent Channel Flows

High-performance computing clusters augmented with cost- and power-efficient graphics processing units (GPUs) provide new opportunities to broaden the use of the large-eddy simulation technique to study high Reynolds number turbulent flows in fluids engineering applications. In this paper, we extend our earlier work on multi-GPU acceleration of an incompressible Navier-Stokes solver to […]
Mar, 10

Multi-Object Geodesic Active Contours (MOGAC): A Parallel Sparse-Field Algorithm for Image Segmentation

An important task for computer vision systems is to segment adjacent structures in images without producing gaps or overlaps. Multi-object Level Set Methods (MLSM) perform this task with the benefit of sub-pixel accuracy. However, current implementations of MLSM are not as computationally or memory efficient as their region growing and graph cut counterparts which lack […]
Mar, 9

Asynchronous Parallel Computing Model of Global Motion Estimation with CUDA

For video coding, to balance coding rate against image quality, we apply a global motion search algorithm to avoid loss of image quality and use the parallel computing capacity of graphics processors to accelerate the encoding process. Based on the heterogeneous CPU+GPU system and the multi-threaded parallel structure and thread synchronization features of the CUDA platform, we […]
Mar, 9

A Study of CUDA Acceleration and Impact of Data Transfer Overhead in Heterogeneous Environment

Along with the introduction of many-core GPUs, there is widespread interest in using GPUs to accelerate non-graphics applications such as energy, bioinformatics, finance and several research areas. With a wide range of data sizes where the CPU has greater performance, it would be important that CUDA enabled programs properly select when to and not to […]
Mar, 9

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions […]
Mar, 9

Relational Algorithms for Multi-Bulk-Synchronous Processors

Relational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
