2274

Posts

Dec, 18

Shader Performance Analysis on a Modern GPU Architecture

This paper presents an analysis of the performance of the shader processing units in a modern graphics processor unit (GPU) architecture using real graphic applications. The architecture of a modern GPU is described and a simulator and associated framework used to evaluate the architecture is introduced. The paper analyses the effects in performance of different […]
Dec, 18

GPU clusters for high-performance computing

Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenges. In this paper, we present our efforts to address some of the challenges with building and running GPU clusters in HPC environments. We touch upon such issues as balanced cluster […]
Dec, 18

Accelerating Template-Based Matching on the GPU for AR Applications

Recently researchers have shown that it is possible to use GPU hardware for image processing and computer vision algorithms. We have been exploring how to use GPU hardware to improve marker-based tracking for AR Applications. In this paper we describe our findings and explored issues in the context of a standard fiducial tracking pipeline. We […]
Dec, 18

Accelerating SQL Database Operations on a GPU with CUDA

Prior work has shown dramatic acceleration for various database operations on GPUs, but only using primitives that are not part of conventional database languages such as SQL. This paper implements a subset of the SQLite command processor directly on the GPU. This dramatically reduces the effort required to achieve GPU acceleration by avoiding the need […]
Dec, 18

Efficient, High-Quality Bayer Demosaic Filtering on GPUs

This paper describes a series of optimizations for implementing the high-quality Malvar-He-Cutler Bayer demosaicing filter on a GPU in OpenGL. Applying this filter is the first step in most video-processing pipelines but is generally considered too slow for real time on a CPU. The optimized implementation contains 66% fewer ALU operations than a direct GPU […]
Dec, 18

GPU-based Island Model for Evolutionary Algorithms

The island model for evolutionary algorithms allows to delay the global convergence of the evolution process and encourage diversity. However, solving large size and time-intensive combinatorial optimization problems with the island model requires a large amount of computational resources. GPU computing is recently revealed as a powerful way to harness these resources. In this paper, […]
Dec, 18

Accelerating K-Means on the Graphics Processor via CUDA

In this paper an optimized k-means implementation on the graphics processing unit (GPU) is presented. NVIDIApsilas compute unified device architecture (CUDA), available from the G80 GPU family onwards, is used as the programming environment. Emphasis is placed on optimizations directly targeted at this architecture to best exploit the computational capabilities available. Additionally drawbacks and limitations […]
Dec, 18

A GPU based implementation of Center-Surround Distribution Distance for feature extraction and matching

The release of general purpose GPU programming environments has garnered universal access to computing performance that was once only available to super-computers. The availability of such computational power has fostered the creation and re-deployment of algorithms, new and old, creating entirely new classes of applications. In this paper, a GPU implementation of the Center-Surround Distribution […]
Dec, 18

A Single (Unified) Shader GPU Microarchitecture for Embedded Systems

We present and evaluate the TILA-rin GPU microarchitecture for embedded systems using the ATTILA GPU simulation framework. We use a trace from an execution of the Unreal Tournament 2004 PC game to eval uate and compare the performance of the proposed embedded GPU against a baseline GPU architecture for the PC. We evaluate the different […]
Dec, 18

A Cross-Input Adaptive Framework for GPU Programs Optimization

Recent years have seen a trend in using graphic processing units (GPU) as accelerators for general-purpose computing. The inexpensive, single-chip, massively parallel architecture of GPU has evidentially brought factors of speedup to many numerical applications. However, the development of a high-quality GPU application is challenging, due to the large optimization space and complex unpredictable effects […]
Dec, 17

Fast Software AES Encryption

This paper presents new software speed records for AES-128 encryption for architectures at both ends of the performance spectrum. On the one side we target the low-end 8-bit AVR microcontrollers and 32-bit ARM microprocessors, while on the other side of the spectrum we consider the high-performing Cell broadband engine and NVIDIA graphics processing units (GPUs). […]
Dec, 17

A New Parallel Method of Smith-Waterman Algorithm on a Heterogeneous Platform

Smith-Waterman algorithm is a classic dynamic programming algorithm to solve the problem of biological sequence alignment. However, with the rapid increment of the number of DNA and protein sequences, the originally sequential algorithm is very time consuming due to there existing the same computing task computed repeatedly on large-scale data. Today’s GPU (graphics processor unit) […]

Recent source codes

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org