Posts
Apr, 6
High-Performance Energy-Efficient Multicore Embedded Computing
With Moore’s law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multicore to exploit this high-transistor density for high performance. Embedded systems differ from traditional high-performance supercomputers in that power is a first-order constraint for embedded systems; whereas, performance is the major benchmark for supercomputers. The increase in on-chip […]
Apr, 6
Fast Spoken Query Detection Using Lower-Bound Dynamic Time Warping on Graphical Processing Units
In this paper we present a fast unsupervised spoken term detection system based on lower-bound Dynamic Time Warping (DTW) search on Graphical Processing Units (GPUs). The lower-bound estimate and the K nearest neighbor DTW search are carefully designed to fit the GPU parallel computing architecture. In a spoken term detection task on the TIMIT corpus, […]
Apr, 6
Dynamic Scheduling for Large-Scale Distributed-Memory Ray Tracing
Ray tracing is an attractive technique for visualizing scientific data because it can produce high quality images that faithfully represent physically-based phenomena. Its embarrassingly parallel reputation makes it a natural candidate for visualizing large data sets on distributed memory clusters, especially for machines without specialized graphics hardware. Unfortunately, the traditional recursive ray tracing algorithm is […]
Apr, 6
A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set
We introduce a hardware acceleration technique for the parallel finite difference time domain (FDTD) method using the SSE (Streaming SIMD Extensions, where SIMD stands for single instruction, multiple data) instruction set. Applying the SSE instruction set to the parallel FDTD method has achieved a significant improvement in simulation performance. The benchmarks of the SSE acceleration on both the […]
Apr, 6
GPGPU for orbital function evaluation with a new updating scheme
We accelerated an ab-initio QMC electronic structure calculation by using GPGPU. The bottleneck of the calculation for extended solid systems is replaced by CUDA-GPGPU subroutine kernels which build up spline basis set expansions of electronic orbital functions at each Monte Carlo step. We achieved 30.8 times faster evaluation for the bottleneck, confirmed on the […]
Apr, 5
Visualization of Pareto Solutions by Spherical Self-Organizing Map and It’s acceleration on a GPU
In this study, we visualize Pareto-optimum solutions derived from multiple-objective optimization using spherical self-organizing maps (SOMs) that lay out SOM data in three dimensions. There has been a wide range of studies involving plane SOMs where Pareto-optimal solutions are mapped to a plane. However, plane SOMs have an issue that similar data differing in a […]
Apr, 5
A Programmable Processing Array Architecture Supporting Dynamic Task Scheduling and Module-Level Prefetching
Massively Parallel Processing Arrays (MPPA) constitute programmable hardware accelerators that excel in the execution of applications exhibiting Data-Level Parallelism (DLP). The concept of employing such programmable accelerators as sidekicks to the more traditional, general-purpose processing cores has very recently entered the mainstream; both Intel and AMD have introduced processor architectures integrating a Graphics Processing Unit […]
Apr, 5
Enhancing Performance of Simulations using GPGPU
General Purpose GPU computing, or GPGPU, is the use of a GPU (graphics processing unit) to do general purpose scientific and engineering computing. The model for GPU computing is to use a CPU and GPU together in a heterogeneous co-processing computing platform. The sequential part of the application runs on the CPU and the computationally-intensive […]
Apr, 5
Image Processing on Graphical Processing Units for faster DNA Sequencing
Next generation DNA sequencing technologies generate terabytes of image data in a typical run over several days. Compute power to process the increasing amount of image data is becoming a problem in next generation sequencing. We propose to use the compute power of Graphical Processing Units (GPUs) to address this problem. GPUs have an efficient […]
Apr, 4
A High-Performance Brownian Bridge for GPUs: Lessons for Bandwidth Bound Applications
We present a very flexible Brownian bridge generator together with a GPU implementation which achieves close to peak performance on an NVIDIA C2050. The performance is compared with an OpenMP implementation run on several high performance x86-64 systems. The GPU shows a performance gain of at least 10x. Full comparative results are given in Section […]
Apr, 4
Depth Estimation using Open Compute Language (OpenCL)
3D video and related technologies, like view synthesis and 2D-3D video conversion, rely heavily on depth/disparity maps extracted from stereo video content. An innovative segment-based depth map extraction chain from stereo video content was proposed in [1], giving a good trade-off between quality (exactness to the ground truth) and computational complexity. We accelerated this work further by ~150%, […]
Apr, 4
Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation
GPU-to-CPU translation may extend the execution of Graphics Processing Unit (GPU) programs to multi-/many-core CPUs, and hence enable cross-device task migration and promote whole-system synergy. This paper describes some of our findings on the treatment of GPU synchronizations during the translation process. We show that careful dependence analysis may allow a fine-grained treatment of synchronizations and reveal redundant […]