Posts

Nov, 7

Fast TV-L1 Optical Flow for Interactivity

Vision is a natural tool for human-computer interaction, since it provides visual feedback to the user and mimics some human behaviors. However, it requires fast and robust computation of motion primitives, which remains a difficult problem. In this work, we propose to apply some recent mathematical results about convex optimization to the […]
Nov, 3

Efficient Quicksort and 2D Convex Hull for CUDA, and MSIMD as a Realistic Model of Massively Parallel Computations

In recent years CUDA has become a major architecture for multithreaded computations. Unfortunately, its potential is not yet commonly exploited, because many fundamental problems have no practical solutions for such machines. Our goal is to establish a hybrid multicore/parallel theoretical model that accurately represents architectures such as NVIDIA CUDA, Intel Larrabee, and OpenCL as well […]
Nov, 2

A Comparison of Many-threaded Differential Evolution and Genetic Algorithms on CUDA

Recent years have seen the rise of consumer-grade massively parallel environments. Powerful GPUs and multi-core processors have become widely available, and easy-to-use programming APIs such as nVidia CUDA, OpenCL, and DirectCompute simplify the development of applications that can utilize them. In this environment, nature-inspired metaheuristics can be, in suitable cases […]
Oct, 31

SIMD Re-Convergence At Thread Frontiers

Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA, OpenCL, and DirectX Compute. The impact of branch divergence can be quite different depending upon whether the program’s control flow is structured or unstructured. In this paper, […]
Oct, 30

Exploring Many-Core Design Templates for FPGAs and ASICs

We present a highly productive approach to hardware design based on a many-core microarchitectural template used to implement compute-bound applications expressed in a high-level data-parallel language such as OpenCL. The template is customized on a per-application basis via a range of high-level parameters such as the interconnect topology or processing element architecture. The key benefits of […]
Oct, 30

GPU Computations in Heterogeneous Grid Environments

This thesis describes how the performance of job management systems on heterogeneous computing grids can be increased with Graphics Processing Units (GPU). The focus lies on describing what is required to extend the grid to support the Open Computing Language (OpenCL) and how an OpenCL application can be implemented for the heterogeneous grid. Additionally, already […]
Oct, 30

Dynamic Scheduling of Parallel Code for Heterogeneous Systems

A typical consumer desktop computer has a multi-core CPU with at least two and possibly up to eight processing elements over four processors, and a multi-core GPU with up to 512 processing elements. Both the CPU and the GPU are capable of running parallel code, and this project demonstrates a method for dynamically deciding whether […]
Oct, 30

Development and evaluation of a GPU-optimized N-body term for the simulation of biomolecules

Advancements in massively parallel sampling of the conformational space of biomolecules enable, for example, protein structure prediction, in-silico drug development, and cell signaling. Despite the existence of highly distributed protein simulation architectures like POEM@HOME, there was no abundant computational resource that is strong in both serial performance and parallel sampling. In this study we investigate the […]
Oct, 30

CPU and GPU Co-processing for Sound

When using voice communications, one problematic phenomenon that can occur is participants hearing an echo of their own voice. Acoustic echo cancellation (AEC) is used to remove this echo, but it can be computationally demanding. The recent OpenCL standard allows high-level programs to be run on both multi-core CPUs and Graphics Processing Units […]
Oct, 28

Parallel Computing the Longest Common Subsequence (LCS) on GPUs: Efficiency and Language Suitability

Sequence alignment is one of the most widely used tools in bioinformatics for finding the resemblance among sequences such as DNA, RNA, and amino acids. The longest common subsequence (LCS) of biological sequences is an essential and effective technique in sequence alignment. To solve the LCS problem, we resort to the dynamic programming approach. Due to the growth […]
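
The dynamic programming approach to LCS can be sketched as below. This is a generic illustration, not the authors' implementation, and the function name `lcs_length` is hypothetical. It fills the table one anti-diagonal at a time: a cell with i + j = d depends only on cells from diagonals d - 1 and d - 2, so all cells of one diagonal are independent of each other. That wavefront order is a common way to expose parallelism on GPUs, though the paper's exact scheme may differ.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (wavefront DP)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Sweep anti-diagonals d = i + j. Every cell on diagonal d reads only
    # from diagonals d - 1 and d - 2, so this inner loop could run in parallel.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of both shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one sequence.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4 (e.g. "BCBA")
```

A sequential row-by-row loop computes the same table; the diagonal order only matters once the inner loop is mapped onto parallel threads.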
Oct, 22

Hardware Transactional Memory for GPU Architectures

Graphics processor units (GPUs) are designed to efficiently exploit thread-level parallelism (TLP), multiplexing the execution of thousands of concurrent threads on a relatively small set of single-instruction, multiple-thread (SIMT) cores to hide various long-latency operations. While threads within a CUDA block/OpenCL workgroup can communicate efficiently through an intra-core scratchpad memory, threads in different blocks […]
Oct, 20

FAMOUS, faster: using parallel computing techniques to accelerate the FAMOUS/HadCM3 climate model with a focus on the radiative transfer algorithm

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. A task queue and a thread pool are used to distribute the computation to several processors. Finally, four […]
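
A minimal sketch of the task-queue and thread-pool pattern described above, assuming each air column can be computed independently once the algorithm is restructured around a single column. `compute_column` and `run_columns` are hypothetical names, and the placeholder body is not the FAMOUS radiation code. Python threads only illustrate the work distribution: under CPython's global interpreter lock, CPU-bound threads do not execute simultaneously, so real speedups would need native threads (as in the C translation) or processes.

```python
import queue
import threading
from typing import List, Sequence


def compute_column(layers: Sequence[float]) -> float:
    # Hypothetical placeholder for the per-column radiative transfer
    # computation; here it just sums the layer values.
    return sum(layers)


def run_columns(columns: List[Sequence[float]], num_workers: int = 4) -> List[float]:
    """Distribute independent column computations over a pool of worker threads."""
    tasks = queue.Queue()
    results: List[float] = [0.0] * len(columns)

    # Fill the task queue before starting the workers, so an empty queue
    # means every column has already been handed out.
    for index, column in enumerate(columns):
        tasks.put((index, column))

    def worker() -> None:
        while True:
            try:
                index, column = tasks.get_nowait()
            except queue.Empty:
                return  # No tasks left: this worker exits.
            # Each task writes only to its own slot, so no lock is needed.
            results[index] = compute_column(column)

    pool = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return results


print(run_columns([[1.0, 2.0], [3.0, 4.0], [5.0]], num_workers=2))  # [3.0, 7.0, 5.0]
```

Pulling tasks from a shared queue lets faster workers take more columns, which balances the load when columns differ in cost.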

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors