high performance computing on graphics processing units: hgpu.org

Accelerating tetrahedral interpolation with data-level and Thread-Level Parallel optimization

Jaewoo Ahn, Becksang Seong, Wonyong Sung

School of Electrical Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 151-742, Korea

10th International Symposium on Signals, Circuits and Systems (ISSCS), 2011

DOI:10.1109/ISSCS.2011.5978670

The tetrahedral interpolation method for color space conversion is the most time-consuming step in the entire color management process, which makes it difficult to implement a purely software-based high-end image processing system. In this study, SIMD (Single Instruction Multiple Data) and GPGPU (General Purpose Graphics Processing Unit) based optimizations for tetrahedral interpolation are implemented. To exploit DLP (Data-Level Parallelism) with SIMD extensions, the program is restructured and conditional branches are removed so that inter-pixel parallelism is used for tetrahedron determination, while inter-output-channel parallelism is employed for the table lookup and weighted sum. TLP (Thread-Level Parallelism) is exploited with GPGPU by assigning a different input pixel to each thread. Memory access cycles are minimized by storing the color lookup table in constant memory. We conclude that both DLP and TLP optimizations are essential for recent multi-core CPUs with wider SIMD registers, and that reducing the communication overhead between the host and the device is critical for TLP optimization with GPGPUs.
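For readers unfamiliar with the technique, the per-pixel computation described in the abstract can be sketched in plain C. This is an illustrative scalar version, not the authors' SIMD or GPGPU code. The 17-node grid, the 8-bit input, the flat `lut` layout, and the names `tetra_interp` and `node_offset` are all assumptions made for the example. The tetrahedron is selected by ranking the three fractional offsets with comparisons that evaluate to 0 or 1, which is one common way to avoid conditional branches.

```c
#include <stddef.h>

#define GRID 17   /* assumed: 17 LUT nodes per axis, i.e. a step of 16 for 8-bit input */
#define CH   3    /* assumed: three output channels */

/* Hypothetical flat layout: node (r, g, b) starts here, channels are contiguous. */
static size_t node_offset(int r, int g, int b)
{
    return (((size_t)r * GRID + (size_t)g) * GRID + (size_t)b) * CH;
}

/* Interpolate one pixel. lut holds GRID*GRID*GRID*CH floats. */
void tetra_interp(const float *lut, const unsigned char in[3], float out[CH])
{
    int idx[3], frac[3], rank[3], sorted[3];

    for (int a = 0; a < 3; ++a) {
        idx[a]  = in[a] >> 4;   /* lower corner of the enclosing cube */
        frac[a] = in[a] & 15;   /* offset inside the cube, 0..15 */
    }

    /* Tetrahedron determination: rank[a] is the position of axis a when the
       fractions are sorted in descending order (ties go to the lower axis).
       Each comparison yields 0 or 1, so no if/else is needed. */
    rank[0] = (frac[1] >  frac[0]) + (frac[2] >  frac[0]);
    rank[1] = (frac[0] >= frac[1]) + (frac[2] >  frac[1]);
    rank[2] = (frac[0] >= frac[2]) + (frac[1] >= frac[2]);
    for (int a = 0; a < 3; ++a)
        sorted[rank[a]] = frac[a];

    /* Weights of the four tetrahedron vertices; they always sum to 1. */
    const float w[4] = {
        (16 - sorted[0]) / 16.0f,
        (sorted[0] - sorted[1]) / 16.0f,
        (sorted[1] - sorted[2]) / 16.0f,
        sorted[2] / 16.0f
    };

    for (int c = 0; c < CH; ++c)
        out[c] = 0.0f;

    /* Vertex v has stepped along every axis whose rank is below v, so the
       path runs from the cube's lower corner to its opposite corner. */
    for (int v = 0; v < 4; ++v) {
        const float *n = lut + node_offset(idx[0] + (rank[0] < v),
                                           idx[1] + (rank[1] < v),
                                           idx[2] + (rank[2] < v));
        for (int c = 0; c < CH; ++c)   /* table lookup and weighted sum */
            out[c] += w[v] * n[c];
    }
}
```

In this sketch the ranking step is independent for every pixel and the inner channel loop is independent for every output channel, which corresponds to the two kinds of data-level parallelism named in the abstract. The whole function is also the natural unit of work to hand to a single thread when each thread processes a different input pixel.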

Tags: Algorithms, Data parallelism, Image processing, Optimization

August 26, 2011 by hgpu
