11868

Posts

Feb, 26

Real-Time GPU Implementation of Transverse Oscillation Vector Velocity Flow Imaging

Rapid estimation of blood velocity and visualization of complex flow patterns are important for clinical use of diagnostic ultrasound. This paper presents real-time processing for two-dimensional (2-D) vector flow imaging which utilizes an off-the-shelf graphics processing unit (GPU). In this work, Open Computing Language (OpenCL) is used to estimate 2-D vector velocity flow in vivo […]
Feb, 26

Comparative evaluation of platforms for parallel Ant Colony Optimization

The rapidly growing field of nature-inspired computing concerns the development and application of algorithms and methods based on biological or physical principles. This approach is particularly compelling for practitioners in high-performance computing, as natural algorithms are often inherently parallel in nature (for example, they may be based on a "swarm"-like model that uses a population […]
Feb, 17

The battle of the giants: a case study of GPU vs FPGA optimisation for real-time image processing

This paper focuses on a thorough comparison of the two main hardware targets for real-time optimization of a computer vision algorithm: GPU and FPGA. Based on a complex case study algorithm for threaded isle detection, implementation on both hardware targets is compared in terms of resulting time performance, code translation effort, hardware cost, power efficiency […]
Feb, 17

Efficient pseudo-random number generation for monte-carlo simulations using graphic processors

A hybrid approach based on the combination of three Tausworthe generators and one linear congruential generator for pseudo random number generation for GPU programing as suggested in NVIDIA-CUDA library has been used for MONTE-CARLO sampling. On each GPU thread, a random seed is generated on fly in a simple way using the quick and dirty […]
Feb, 14

Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the […]
Feb, 12

Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance

Developing high performance GPU code is labor intensive. Ideally, developers could recoup high GPU development costs by generating high-performance programs for CPUs and other architectures from the same source code. However, current OpenCL compilers for non-GPUs do not fully exploit optimizations in well-tuned GPU codes. To address this problem, we develop an OpenCL implementation that […]
Feb, 12

GROMACS on Hybrid CPU-GPU and CPU-MIC Clusters: Preliminary Porting Experiences, Results and Next Steps

This report introduces hybrid implementation of the Gromacs application, and provides instructions on building and executing on PRACE prototype platforms with Graphical Processing Units (GPU) and Many Intergrated Cores (MIC) accelerator technologies. GROMACS currently employs message-passing MPI parallelism, multi-threading using OpenMP and contains kernels for non-bonded interactions that are accelerated using the CUDA programming language. […]
Feb, 9

GPGPU-Assisted Subpixel Tracking Method for Fiducial Markers

With an aim to realizing highly accurate position estimation, we propose in this paper a method for efficiently and accurately detecting the 3D positions and poses of traditional fiducial markers with black frames in high-resolution images taken by ordinary web cameras. Our tracking method can be efficiently executed utilizing GPGPU computation, and in order to […]
Feb, 9

Extending the SkelCL Skeleton Library for Stencil Computations on Multi-GPU Systems

The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA, which makes it a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high level of programming abstraction […]
Feb, 4

Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that […]
Feb, 2

High energy electromagnetic particle transportation on the GPU

We present massively parallel high energy electromagnetic particle transportation through a finely segmented detector on a Graphics Processing Unit (GPU). Simulating events of energetic particle decay in a general-purpose high energy physics (HEP) detector requires intensive computing resources, due to the complexity of the geometry as well as physics processes applied to particles copiously produced […]
Jan, 23

OpenSSL acceleration using Graphics Processing Units

Cryptography: The study of techniques focused on security. Typically, an implementation of cryptography is computationally heavy, leading to performance issues on general purpose systems. Adding the possibility of offloading cryptographic operations to a Graphics Processing Unit (GPU) onto a widespread, open-source cryptographic library such as OpenSSL would be extremely useful in lightening the CPU load […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: