Posts
Aug, 26
Acceleration of an improved Retinex algorithm
Retinex is an image restoration method and the center/surround Retinex is appropriate for parallelization because it utilizes a convolution operation with large kernel size to achieve dynamic range compression and color/lightness rendition. However, its great capability for image enhancement comes with intensive computation. This paper presents a GPURetinex, which is a data parallel algorithm based […]
Aug, 26
Accelerating tetrahedral interpolation with data-level and Thread-Level Parallel optimization
The tetrahedral interpolation method for color space conversion consumes the longest time in the entire color management process. This makes it difficult to implement a purely software-based high-end image processing system. In this study, SIMD (Single Instruction Multiple Data) and GPGPU (General Purpose Graphics Processing Unit) based optimizations for tetrahedral interpolation are implemented. To exploit […]
Aug, 26
Multi-level parallelism, global arrays, GPGPU Programming: Unify programming paradigms on Grid computing with efficiency
As technology advances, computing resources also gain benefits in many aspects: larger capacity, increased capability as well as rapidity. However, with heterogeneously distributed resources in a Grid computing environment, developing an application that fully utilizes the resources is a challenge. In particular, the computing resources themselves regularly upgrade their computing power, for example by recruiting General […]
Aug, 25
TH-1: China’s first petaflop supercomputer
In recent years, heterogeneous systems and cooperative computing have become popular research directions in the field of high performance computing. With fast scaling of the size of high performance computer systems, problems such as power consumption and reliability come to the forefront. The aim of high performance computing has thus shifted from merely seeking peak […]
Aug, 25
Hera-JVM: a runtime system for heterogeneous multi-core architectures
Heterogeneous multi-core processors, such as the IBM Cell processor, can deliver high performance. However, these processors are notoriously difficult to program: different cores support different instruction set architectures, and the processor as a whole does not provide coherence between the different cores’ local memories. We present Hera-JVM, an implementation of the Java Virtual Machine which […]
Aug, 25
Parallelizing compiler framework and API for power reduction and software productivity of real-time heterogeneous multicores
Heterogeneous multicores have been attracting much attention as a way to attain high performance while keeping power consumption low in a wide range of areas. However, heterogeneous multicores are very difficult to program. The long application development period lowers product competitiveness. To overcome this situation, this paper proposes a compilation framework which bridges a gap between […]
Aug, 25
MobiRT: an implementation of OpenGL ES-based CPU-GPU hybrid ray tracer for mobile devices
Three-dimensional user interfaces on mobile devices are increasingly important. For more realistic three-dimensional visualization on mobile devices, we present the implementation of an OpenGL ES-based CPU-GPU hybrid ray tracer. This ray tracer exploits the availability of CPU and GPU architectures to fully support reflection, refraction, hard shadows, and dynamic scenes. To the best of our […]
Aug, 25
Considering GPGPU for HPC Centers: Is It Worth the Effort?
In contrast to just a few years ago, the answer to the question "What system should we buy next to best assist our users" has become a lot more complicated for the operators of an HPC center today. In addition to multicore architectures, powerful accelerator systems have emerged, and the future looks heterogeneous. In this […]
Aug, 25
Towards a GPU-Based Simulation Framework for Deformable Surface Meshes
Realism and real-time visual and haptic interactions with anatomical structures are key challenges in simulation software for surgeries. Overcoming these challenges is made difficult by the need to run the software on consumer-grade computing platforms. This paper presents preliminary work towards a framework for fast, realistic and stable simulation of deformable anatomical structures. The approach […]
Aug, 25
A PC-based fully-programmable medical ultrasound imaging system using a graphics processing unit
In this paper, a PC-based fully-programmable medical ultrasound imaging system is presented in which a high-performance graphics processing unit (GPU) is utilized to perform the entire ultrasound processing. In the proposed architecture, ultrasound signal and image processing algorithms were divided into four modules and efficiently implemented on NVIDIA’s Compute Unified Device Architecture (CUDA) platform (GeForce […]
Aug, 25
Fine-grain Parallelism using Multi-core, Cell/BE, and GPU Systems
We currently face a situation where applications exhibit increasing computational demands and where a large variety of parallel processor systems are available. In this paper we focus on exploiting fine-grain parallelism for three applications with distinct characteristics: a Bioinformatics application (MrBayes), a Molecular Dynamics application (NAMD), and a Database application (TPC-H). We assess, side-by-side, […]
Aug, 25
An efficient GPU-based time domain solver for the acoustic wave equation
An efficient algorithm for time-domain solution of the acoustic wave equation for the purpose of room acoustics is presented. It is based on adaptive rectangular decomposition of the scene and uses analytical solutions within the partitions that rely on spatially invariant speed of sound. This technique is suitable for auralizations and sound field visualizations, even […]