Posts

Aug, 24

Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic

The objective of this research is to design an efficient and flexible implementation of parallel matrix multiplication for FPGA devices by analyzing the computation and studying its design space. In order to adapt to the FPGA platform, the design employs blocking and parallelization. Blocked matrix multiplication enables processing arbitrarily large matrices using limited memory capacity, […]
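As a rough illustration of the blocking idea mentioned in this abstract, the sketch below multiplies two matrices tile by tile, so that each step touches only one small block of each operand. It is a plain-software sketch, not the paper's FPGA design; the function name `blocked_matmul` and the `block` parameter are assumptions made for the example.

```python
def blocked_matmul(a, b, block=2):
    """Multiply a (n x k) by b (k x m), both lists of lists, one tile at a time.

    Hypothetical helper for illustration; `block` is the tile edge length.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Walk over tiles of the output (i0, j0) and of the shared dimension (k0).
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # Inside a tile, only block x block pieces of a, b and c are used.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + block, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b, block=2))
```

Because the working set per step is bounded by the tile size rather than the matrix size, the same loop structure applies to matrices far larger than the fast memory available.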
Aug, 24

Reverse Computation for Rollback-based Fault Tolerance in Large Parallel Systems: Evaluating the Potential Gains and Systems Effects

Reverse computation is presented here as an important future direction in addressing the challenge of fault tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can […]
Aug, 23

A memory access model for highly-threaded many-core architectures

A number of highly-threaded, many-core architectures hide memory-access latency by low-overhead context switching among a large number of threads. The speedup of a program on these machines depends on how well the latency is hidden. If the number of threads were infinite, theoretically, these machines could provide the performance predicted by the PRAM analysis of […]
Aug, 23

A Unified Framework for Multi-Sensor HDR Video Reconstruction

One of the most successful approaches to modern high quality HDR-video capture is to use camera setups with multiple sensors imaging the scene through a common optical system. However, such systems pose several challenges for HDR reconstruction algorithms. Previous reconstruction techniques have considered debayering, denoising, resampling (alignment) and exposure fusion as separate problems. In contrast, […]
Aug, 23

Implementation of Kirchhoff prestack depth migration on GPU

The massively parallel nature of Graphics Processing Units has made them an attractive platform for some computationally intensive algorithms. This article presents a method to run 3D Kirchhoff prestack depth migration on GPU-based clusters. Compared to a CPU-only version of the same algorithm, the new approach delivers significantly greater efficiency. An actual production […]
Aug, 23

Tight Binding Molecular Dynamics on CPU and GPU clusters

The aim of this dCSE project was to improve the TBE code which is based on the tight binding model with self consistent multipole charge transfer. Given an appropriate parameterisation, the code is general and can be used to simulate a wide variety of systems and phenomena such as bond breaking, charge and magnetic polarisation. […]
Aug, 23

Estimation of Skin Optical Parameters for Real-Time Hyperspectral Imaging Applications using GPGPU Parallel Computing

Hyperspectral imaging with a high spatial and spectral resolution can be used to analyze materials using spectroscopic methods. This can be applied to skin as a general-purpose real-time diagnostic tool. Light transport models, like the diffusion model, can describe the light propagation in tissue before the light is captured by the hyperspectral camera. The […]
Aug, 23

Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization

The use of GPUs to accelerate general-purpose scientific and engineering applications is mainstream today, but their adoption in current high-performance computing clusters is impaired primarily by acquisition costs and power consumption. Therefore, the benefits of sharing a reduced number of GPUs among all the nodes of a cluster can be remarkable for many applications. This […]
Aug, 23

A Shader Library for OpenGL 4 and GLSL 4.3 Learning and Development

In the past decades, besides experiencing a huge development in terms of computation speed, we have also experienced the emergence of the programmable GPU, giving birth to languages like GLSL and CUDA. This technology gives great flexibility in the use of such powerful hardware, attracting the interest of many researchers and programmers to this […]
Aug, 23

Synthesis of Custom Networks of Heterogeneous Processing Elements for Complex Physical System Emulation

Physical system models that consist of thousands of ordinary differential equations can be synthesized to field-programmable gate arrays (FPGAs) for highly-parallelized, real-time physical system emulation. Previous work introduced synthesis of custom networks of homogeneous processing elements, consisting of processing elements that are either all general differential equation solvers or are all custom solvers tailored to […]
Aug, 23

Median Based Parallel Steering Kernel Regression for Image Reconstruction

Image reconstruction is the process of obtaining the original image from corrupted data. Applications of image reconstruction include computed tomography, radar imaging, weather forecasting, etc. Recently, the steering kernel regression method has been applied to image reconstruction [1]. There are two major drawbacks to this technique. First, it is computationally intensive. Second, output of the algorithm […]
Aug, 23

Serial and Parallel Bayesian Spam Filtering using Aho-Corasick and PFAC

With the rapid growth of the Internet, e-mail, with its convenience and efficiency, has become an important means of communication in people's lives. It reduces the cost of communication, but it also brings spam. Spam emails, also known as "junk e-mails", are unsolicited ones sent in bulk with hidden or forged identity of the sender, address, […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
