Posts

Jul, 16

Uniform partitioning of Monte Carlo radiosity on GPUs

The radiosity method produces high-quality images by evaluating the global illumination of the scene. The computational complexity and the memory requirements of the algorithm are the main problems when a large scene has to be processed. To reduce the memory requirements, the Monte Carlo radiosity method is often used. In […]
Jul, 14

ForOpenCL: Transformations Exploiting Array Syntax in Fortran for Accelerator Programming

Emerging GPU architectures for high performance computing are well suited to a data-parallel programming model. This paper presents preliminary work examining a programming methodology that provides Fortran programmers with access to these emerging systems. We use array constructs in Fortran to show how this infrequently exploited, standardized language feature is easily transformed to lower-level accelerator […]
Jul, 14

Real-time, fast radio transient searches with GPU de-dispersion

The identification, and subsequent discovery, of fast radio transients through blind-search surveys requires a large amount of processing power, in worst cases scaling as $\mathcal{O}(N^3)$. For this reason, survey data are generally processed offline, using high-performance computing architectures or hardware-based designs. In recent years, graphics processing units have been extensively used for numerical analysis and […]
Jul, 14

A Survey of Neural Computation on Graphics Processing Hardware

Modern graphics processing units (GPUs) are used for much more than simply 3D graphics applications. From machine vision to finite element analysis, GPUs are being used in diverse applications, collectively called general purpose graphics processor utilization. This paper explores the capabilities and limitations of modern GPUs and surveys the neural computation technologies that have been […]
Jul, 14

Fat vs. Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

We explore two different threading approaches on a graphics processing unit (GPU), each exploiting a different characteristic of the current GPU architecture. The fat thread approach tries to minimise data access time by relying on shared memory and registers, potentially sacrificing parallelism. The thin thread approach maximises parallelism and tries to hide access latencies. We apply […]
Jul, 14

Design of a programmable micro-ultrasound research platform

To foster innovative uses of micro-ultrasound in biomedicine, it is beneficial to develop flexible research-purpose systems that allow researchers to easily reconfigure system-level operations such as the transmit firing sequence and receive processing. In this paper, we present the development of a programmable micro-ultrasound research platform that is capable of realizing various micro-imaging algorithms. The […]
Jul, 14

Fast parallel algorithm for audio content retrieval on GPUs

Audio content search techniques in MIR (music information retrieval) face two major challenges: the robustness of the algorithm and the speed of the search. This article proposes a fast algorithm for extracting audio data by the fingerprinting technique, which is implemented on a CPU-based platform and then parallelized to run […]
Jul, 14

Fast and Efficient FPGA-Based Feature Detection Employing the SURF Algorithm

Feature detectors are schemes that locate and describe points or regions of ‘interest’ in an image. Today there are numerous machine vision applications that need efficient feature detectors able to work in real time; moreover, since this detection is one of the most time-consuming tasks in several vision devices, the speed of the feature detection schemes […]
Jul, 14

Computing spike-based convolutions on GPUs

In spiking neural networks, asynchronous spike events are processed in parallel by neurons. Emulations of such networks are traditionally computed by CPUs or realized using dedicated neuromorphic hardware. In many neuromorphic systems, the address-event representation (AER) is used for spike communication. In this paper we present the acceleration of AER-based spike processing using a graphics […]
Jul, 14

A Sparse Matrix Personality for the Convey HC-1

In this paper we describe a double precision floating point sparse matrix-vector multiplier (SpMV) and its performance as implemented on a Convey HC-1 reconfigurable computer. The primary contributions of this work are a novel streaming reduction architecture for floating point accumulation, a novel on-chip cache optimized for streaming compressed sparse row (CSR) matrices, and end-to-end […]
Jul, 14

Fast circuit simulation on graphics processing units

SPICE-based circuit simulation is a traditional workhorse in the VLSI design process. Given the pivotal role of SPICE in the IC design flow, there has been significant interest in accelerating SPICE. Since a large fraction (on average 75%) of the SPICE runtime is spent in evaluating transistor model equations, a significant speedup can be […]
Jul, 14

Efficient visual hull computation for real-time 3D reconstruction using CUDA

In this paper we present two efficient GPU-based visual hull computation algorithms. We compare them in terms of performance using image sets of varying size and different voxel resolutions. In addition, we present a real-time 3D reconstruction system which uses the proposed GPU-based reconstruction method to achieve real-time performance (30 fps) using 16 cameras and […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
