Posts
May, 22
Optimized GPU Framework for Ultrasound Color Flow Imaging
A GPU framework for ultrasound color flow imaging (CFI) based on auto-correlation is presented. The parallel CFI processing framework is built mainly on CUDA performance features, such as the memory selection strategy, a suitable thread structure and high-throughput bandwidth. A parallel convolution algorithm and a multi-channel championship algorithm are proposed. This CFI method achieves a frame rate […]
May, 21
A trigger system based on Graphics Processing Unit (GPU)
We discuss the possible use of GPUs (Graphics Processing Units) in the all-digital trigger and data acquisition (TDAQ) chain of the NA62 experiment at CERN. The exponentially growing interest in using GPUs for general-purpose applications is based on the impressive performance achieved (peak performance already exceeding one teraflop/s), on the high bandwidth to memory […]
May, 21
An architecture design of GPU-accelerated VoD streaming servers with network coding
The graphics processing unit (GPU) has evolved into a general-purpose computing platform. Inspired by the advantages of GPU technology, this paper concerns the design and performance evaluation of a practical GPU-accelerated server architecture for Video-on-Demand (VoD) services with network coding. Following the proposal of an optimized network coding algorithm based on parallel threads on GPU, a GPU-Accelerated Server […]
May, 21
A comprehensive analysis and parallelization of an image retrieval algorithm
The prevalence of the Internet and cloud computing has made multimedia data, such as image and video data, a major data type in our daily life. For example, many data-intensive applications, such as health care and video recommendation, involve collecting, indexing and retrieving tera-scale multimedia data every day. With such a huge amount of […]
May, 21
Analyzing throughput of GPGPUs exploiting within-die core-to-core frequency variation
State-of-the-art general-purpose graphics processing units (GPGPUs) can offer very high computational throughput for general-purpose, highly parallel applications using hundreds of available on-chip cores. Meanwhile, as technology is scaled down below 65 nm, each core’s maximum frequency varies significantly due to increasing within-die variations. This, in turn, diminishes the throughput improvement of GPGPUs through technology scaling because […]
May, 21
Image processing applications on a low power highly parallel SIMD architecture
In this paper, we present and discuss high-performance implementations of a wide class of image processing applications on a low-power massively parallel SIMD architecture, the ClearSpeed CSX700. We present parallel implementation results for four classes of image processing applications: feature detection (Harris Corner Detector), stereo vision (a class of SSD-like algorithms), model estimation […]
May, 21
Multilevel Granularity Parallelism Synthesis on FPGAs
Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementation and performance evaluation of the HLS-generated RTL involve lengthy logic synthesis and physical design flows. Moreover, the mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance both in terms of […]
May, 21
Synthesis of Platform Architectures from OpenCL Programs
The problem of automatically generating hardware modules from a high-level representation of an application has been at the research forefront in the last few years. In this paper, we use OpenCL, an industry-supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our architectural synthesis tool, SOpenCL (Silicon-OpenCL), […]
May, 21
A Parallel Algorithm for Flight Route Planning on GPU Using CUDA (thesis)
Aerial surveillance missions require a geographical region known as the area of interest to be inspected. The route that the aerial reconnaissance vehicle will follow is known as the flight route. Flight route planning has to be done before the actual mission is executed. A flight route may consist of hundreds of predefined geographical […]
May, 21
A Parallel Algorithm for UAV Flight Route Planning on GPU
Aerial surveillance missions require a geographical region known as the area of interest to be inspected. The route that the aerial reconnaissance vehicle will follow is known as the flight route. Flight route planning has to be done before the actual mission is executed. A flight route may consist of hundreds of predefined geographical […]
May, 21
Acceleration of the GAMESS-UK electronic structure package on graphical processing units
The approach used to calculate the two-electron integrals by many electronic structure packages, including the Generalized Atomic and Molecular Electronic Structure System-UK (GAMESS-UK), has been designed for CPU-based compute units. We redesigned the two-electron compute algorithm for acceleration on a graphical processing unit (GPU). We report the acceleration strategy and illustrate it on the (ss|ss) type integrals. […]
May, 20
Real-time Adaptive Tone Mapping for Monitoring High Contrast Hemispherical Image Capture with the GPU
The exposure of high dynamic range scenes needs special attention to capture all details. A new method for a low dynamic range monitoring preview of the local scene details is presented. We follow the idea of edge-preserving nonlinear bilateral filtering instead of classic methods such as linear high-pass filtering or histogram equalization. Real-time performance […]