Posts

Mar, 13

Real-time execution of image change detection

State-of-the-art video analysis systems feature multiple complex processing steps and operate on high-resolution images. Real-time execution therefore requires substantial computational power. In this project, an image change detection application is mapped to a heterogeneous multicore CPU/GPU platform, and the hardware configuration required to execute the application in real time is investigated. For optimal […]
Mar, 12

Dynamic Compilation of Data-Parallel Kernels for Vector Processors

Modern processors gain throughput and power efficiency from specialized functional units exposed through instruction set extensions. These functional units accelerate specific types of operations but must be programmed explicitly. Moreover, applications targeting these specialized units will not take advantage of future ISA extensions and tend not to be portable across multiple ISAs. […]
Mar, 12

GPU Accelerated Computation of Fast Spectral Transforms

This paper discusses techniques for accelerated computation of several fast spectral transforms on graphics processing units (GPUs) using the Open Computing Language (OpenCL). We present a reformulation of fast algorithms which takes into account the particular properties of the transforms to make them suitable for GPU implementation. Special attention is paid to the organization of […]
Mar, 12

A GPU Algorithm for Greedy Graph Matching

Greedy graph matching provides us with a fast way to coarsen a graph during graph partitioning. Direct algorithms on the CPU which perform such greedy matchings are simple and fast, but offer few handholds for parallelisation. To remedy this, we introduce a fine-grained shared-memory parallel algorithm for maximal greedy matching, together with an implementation on […]
Mar, 12

Hybrid general-purpose computation on GPU (GPGPU) and computer graphics synthetic aperture radar simulation for complex scenes

In this paper, a new hybrid general-purpose computation on GPU (GPGPU) and computer graphics synthetic aperture radar (SAR) simulation method for complex scenes is proposed. Previous SAR simulations for complex scenes use only the GPU’s graphics capabilities for scattering calculation in the graphical electromagnetic computing (GRECO) algorithm. The new hybrid method uses the GPU’s graphics and parallel computing […]
Mar, 12

A Study of Real-Time Lighting Effects

Realistic lighting is an incredibly complex problem. All surfaces scatter light to all other surfaces. Realistic lighting in volumes of fog or smoke is even more complex because each particle absorbs and scatters light. This problem has been approximated with many techniques but can take hours to produce a single image. Creating these images in […]
Mar, 11

GPU Accelerated Real-Time Object Detection on High Resolution Videos Using Modified Census Transform

This paper presents a novel GPU-accelerated object detection system using CUDA. Because of its detection accuracy, speed and robustness to illumination variations, a boosting-based approach with Modified Census Transform features is used. Results are given on the face detection problem for evaluation. Results show that even our single-GPU implementation can run in real-time […]
Mar, 11

Better speedups using simpler parallel programming for graph connectivity and biconnectivity

Speedups are demonstrated for finding the biconnected components of a graph: 9x to 33x on the Explicit Multi-Threading (XMT) many-core computing platform relative to the best serial algorithm, using a relatively modest silicon budget. Further evidence suggests that speedups of 21x to 48x are possible. For graph connectivity, we demonstrate that XMT outperforms two recent NVIDIA […]
Mar, 11

NUMA Data-Access Bandwidth Characterization and Modeling

Clusters of seemingly homogeneous compute nodes are increasingly heterogeneous within each node due to replication and distribution of node-level subsystems. This intra-node heterogeneity can adversely affect program execution performance by inflicting additional data-access performance penalties when accessing non-local data. In many modern NUMA architectures, both memory and I/O controllers are distributed within a node and […]
Mar, 11

An Algorithm for Fast Edit Distance Computation on GPUs

The problem of finding the edit distance between two sequences, and the closely related problem of finding the longest common subsequence, are important problems with applications in many domains, such as virus scanners, security kernels, natural language translation and genome sequence alignment. The traditional dynamic-programming-based algorithm is hard to parallelize on SIMD processors as the algorithm is […]
Mar, 11

GPU Path Tracing

The goal of this work is to verify that GPUs can be used for global illumination computations in a commercial software environment and to explore an efficient way to do so. Path tracing with a BVH as the acceleration data structure was successfully implemented on the GPU using CUDA. It was arranged as a pipelined structure which supported […]
Mar, 10

Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling

Though the GPGPU concept is well-known in image processing, much more work remains to be done to fully exploit GPUs as an alternative computation engine. This paper investigates the computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme shows a significant […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org