5748

Posts

Sep, 24

An MDE Approach for Automatic Code Generation from MARTE to OpenCL

Advanced engineering and scientific communities have used parallel programming to solve their large scale complex problems. Achieving high performance is the main advantage for this choice. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Thus, in order to reduce design complexity, we […]
Sep, 23

Integrated Framework for Heterogeneous Embedded Platforms Using OpenCL

The technology community is rapidly moving away from the age of computers and laptops, and is entering the emerging era of hand-held devices. With the rapid development of smart phones, tablets, and pads, there has been widespread adoption of Graphic Processing Units (GPUs) in the embedded space. The hand-held market is now seeing an ever […]
Sep, 23

Embedding OpenCL in C++ for Expressive GPU Programming

We present a high performance GPU programming language, based on OpenCL, that is embedded in C++. Our embedding provides shared data structures, typesafe kernel invocation, and the ability to more naturally interleave CPU and GPU functions, similar to CUDA but with the portability of OpenCL. For expressivity, our language provides an abstraction that releases control […]
Sep, 23

Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels

Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application kernel. Individual thread branching is supported by executing all control flow paths for threads in a thread group and only committing the results of threads on the current control path. While convergence […]
Sep, 23

Accelerating reaction-diffusion simulations with general-purpose graphics processing units

SUMMARY: We present a massively parallel stochastic simulation algorithm (SSA) for reaction-diffusion systems implemented on Graphics Processing Units (GPUs). These are designated chips optimized to process a high number of floating point operations in parallel, rendering them well-suited for a range of scientific high-performance computations. Newer GPU generations provide a high-level programming interface which turns […]
Sep, 23

Parallel processing on NVIDIA graphics processing units using CUDA

This paper is an introduction to general-purpose computing on graphics processing units. This involves taking advantage of the parallel processing power of modern graphics cards to do general purpose computation. The CUDA architecture used for general purpose computations on NVIDIA graphics cards is described, and important features affecting the run times of CUDA programs are […]
Sep, 23

Functional and dynamic programming in the design of parallel prefix networks

A parallel prefix network of width n takes n inputs, a_1, a_2, … , a_n, and computes each y_i = a_1 o a_2 o … o a_i for 1 <= i <= n, for an associative operator o. This is one of the fundamental problems in computer science, because it gives insight into how parallel […]
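
The prefix computation defined in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `serial_prefix` and `divide_and_conquer_prefix` are hypothetical, and the second function follows the well-known Sklansky-style divide-and-conquer construction as one possible example of a prefix network, chosen here only for illustration.

```python
from operator import add


def serial_prefix(values, op):
    """Reference definition: y_i = a_1 o a_2 o ... o a_i, left to right."""
    out = []
    for v in values:
        out.append(v if not out else op(out[-1], v))
    return out


def divide_and_conquer_prefix(values, op):
    """Sklansky-style sketch (illustrative, not the paper's method).

    The two halves are independent, so a parallel network could evaluate
    them at the same time. Only associativity of `op` is assumed; operand
    order is preserved, so `op` need not be commutative.
    """
    n = len(values)
    if n <= 1:
        return list(values)
    mid = n // 2
    left = divide_and_conquer_prefix(values[:mid], op)
    right = divide_and_conquer_prefix(values[mid:], op)
    # Combine the last output of the left half with every output of the
    # right half; in hardware this step is a single level of fan-out.
    return left + [op(left[-1], r) for r in right]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(serial_prefix(data, add))
    print(divide_and_conquer_prefix(data, add))
```

Both functions return the same list for any associative operator; the recursive version only makes explicit which combinations could proceed in parallel.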
Sep, 23

Image super-resolution by vectorizing edges

As the resolution of output devices increases, the demand for high-resolution content has grown. Image super-resolution algorithms have therefore become more important. In a digital image, edges are closely tied to human perception. Because of this, most recent research tends to enhance image edges to achieve better […]
Sep, 23

Acceleration of Functional Validation Using GPGPU

Logic simulation of a VLSI chip is a computationally intensive process. There exists an urgent need to map functional validation algorithms onto parallel architectures to aid hardware designers in meeting time-to-market constraints. In this paper, we propose three novel methods for logic simulation of combinational circuits on GPGPUs. Initial experiments run on two methods using […]
Sep, 23

Simple optimizations for an applicative array language for graphics processors

Graphics processors (GPUs) are highly parallel devices that promise high performance, and they are now flexible enough to be used for general-purpose computing. A programming language based on implicitly data-parallel collective array operations can permit high-level, effective programming of GPUs. I describe three optimizations for such a language: automatic use of GPU shared memory cache, […]
Sep, 23

Mathematical limits of parallel computation for embedded systems

Embedded systems are designed to perform a specific set of tasks, and are frequently found in mobile, power-constrained environments. There is growing interest in the use of parallel computation as a means to increase performance while reducing power consumption. In this paper, we highlight fundamental limits to what can and cannot be improved by parallel […]
Sep, 23

HHT-based time-frequency analysis method for biomedical signal applications

Fourier transform, wavelet transformation, and Hilbert-Huang transformation (HHT) can be used to discuss the frequency characteristics of linear and stationary signals, the time-frequency features of linear and non-stationary signals, the time-frequency features of non-linear and non-stationary signals, respectively [1-6]. HHT is a combination of empirical mode decomposition (EMD) and Hilbert spectral analysis. EMD uses the […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
