Posts
Feb, 11
Accurate Cross-Architecture Performance Modeling for Sparse Matrix-Vector Multiplication (SpMV) on GPUs
This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool that specifically provides inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance of multiple SpMV kernels. For a sparse matrix, based on its SpMV kernel performance measured on […]
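SpMV kernels differ mainly in how they traverse the sparse storage format. As a point of reference, here is a serial Python version of y = Ax for the common CSR (compressed sparse row) format. It illustrates the kind of kernel being modeled, not the paper's modeling tool; the function name `spmv_csr` and its argument layout are assumptions. A scalar GPU CSR kernel typically assigns one thread to each iteration of the outer row loop.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    row_ptr[i]:row_ptr[i + 1] is the slice of col_idx/values holding row i.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):  # a scalar GPU kernel would map one thread to each row
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # irregular gather from x
        y[i] = acc
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

The irregular reads of `x` through `col_idx` are one reason SpMV performance depends so strongly on both the matrix structure and the memory system of the particular GPU.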
Feb, 11
Implementing the Projected Spatial Rich Features on a GPU
The Projected Spatial Rich Model (PSRM) generates powerful steganalysis features, but requires the calculation of tens of thousands of convolutions with image noise residuals. This makes it very slow: the reference implementation takes an impractical 20 to 30 minutes per 1-megapixel (Mpix) image. We present a case study which first tweaks the definition of the PSRM […]
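The expensive step described above can be sketched in three parts: compute a noise residual, convolve (project) it with a small kernel, then quantize the result and histogram it. This is a minimal illustration under stated assumptions, not the PSRM definition or the authors' GPU code: the first-order horizontal residual, the 2x1 kernel, the quantization step `q`, the bin count, and all function names are hypothetical choices. The full feature set repeats this for many residuals and many kernels, which is where the tens of thousands of convolutions come from.

```python
def horizontal_residual(img):
    """First-order horizontal noise residual R[i][j] = X[i][j+1] - X[i][j]."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in img]


def convolve_valid(img, kernel):
    """2-D convolution without padding ('valid' region only); the kernel is flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += kernel[kh - 1 - a][kw - 1 - b] * img[i + a][j + b]
            row.append(s)
        out.append(row)
    return out


def quantized_histogram(values, q=1.0, bins=3):
    """Count round(|v| / q), lumping everything >= bins - 1 into the last bin."""
    hist = [0] * bins
    for row in values:
        for v in row:
            hist[min(int(round(abs(v) / q)), bins - 1)] += 1
    return hist


image = [[1, 2, 4],
         [3, 3, 3],
         [0, 5, 5]]
kernel = [[1], [-1]]  # hypothetical 2x1 projection kernel
projection = convolve_valid(horizontal_residual(image), kernel)
print(projection)                       # [[-1.0, -2.0], [5.0, 0.0]]
print(quantized_histogram(projection))  # [1, 1, 2]
```

Every output pixel of every convolution is independent, which is what makes this workload a natural fit for a GPU.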
Feb, 11
Genetically Improved CUDA C++ Software
Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speedups of more than six times on recent NVIDIA GPU cards are reported compared to the original kernel on the […]
Feb, 9
GPGPU-Assisted Subpixel Tracking Method for Fiducial Markers
With the aim of realizing highly accurate position estimation, we propose in this paper a method for efficiently and accurately detecting the 3D positions and poses of traditional fiducial markers with black frames in high-resolution images taken by ordinary web cameras. Our tracking method can be efficiently executed using GPGPU computation, and in order to […]
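The abstract does not say how subpixel accuracy is obtained. One widely used technique, shown here only as an assumed stand-in, is to fit a parabola through the largest sample of a 1-D profile (for example, gradient magnitude sampled across a marker's black frame edge) and its two neighbors, then take the parabola's vertex as the refined position. The function name `subpixel_peak` is hypothetical.

```python
def subpixel_peak(profile):
    """Return the subpixel position of the maximum of a 1-D profile.

    Fits a parabola through the integer maximum b and its neighbors a and c;
    the vertex lies at offset (a - c) / (2 * (a - 2b + c)) from the maximum.
    """
    # Search interior samples only, so both neighbors exist
    i = max(range(1, len(profile) - 1), key=lambda k: profile[k])
    a, b, c = profile[i - 1], profile[i], profile[i + 1]
    denom = a - 2 * b + c
    if denom == 0:  # no curvature: nothing to refine
        return float(i)
    return i + 0.5 * (a - c) / denom


print(subpixel_peak([0, 1, 3, 3, 1, 0]))  # 2.5, halfway between the two equal samples
```

Because each edge sample is refined independently, many such fits can run in parallel, one per GPU thread.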
Feb, 9
Benchmarks for Intel MIC Architecture
Intel Many Integrated Core (MIC) Architecture combines about 60 cores onto a single chip. Intel MIC, branded as Xeon Phi, offers a theoretical peak double-precision performance (GFLOPS) more than three times that of an Intel Xeon E5 processor. We carry out benchmarks for Intel MIC with a Monte Carlo simulation of the LIBOR Market Model. The results show […]
Feb, 9
Extending the SkelCL Skeleton Library for Stencil Computations on Multi-GPU Systems
The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually tuned code written with low-level approaches like OpenCL and CUDA, which makes it a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines a high level of programming abstraction […]
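A stencil computation updates every grid cell from a fixed neighborhood of cells. The minimal sketch below expresses that idea as a higher-order function in the spirit of a skeleton: the user supplies only the per-cell function, and the skeleton handles iteration and out-of-bounds reads. It is not SkelCL's actual API; `stencil_map`, its `neighbor` accessor, and the constant-boundary policy are assumptions for illustration.

```python
def stencil_map(grid, cell_fn, boundary=0.0):
    """Apply cell_fn at every cell of a 2-D grid and return a new grid.

    cell_fn receives an accessor neighbor(di, dj) that reads the cell at the
    given offset; reads outside the grid return the constant `boundary`.
    """
    h, w = len(grid), len(grid[0])

    def read(i, j):
        return grid[i][j] if 0 <= i < h and 0 <= j < w else boundary

    # Each output cell depends only on the input grid, so all cells are
    # independent and could be computed in parallel.
    return [[cell_fn(lambda di, dj: read(i + di, j + dj)) for j in range(w)]
            for i in range(h)]


def jacobi(neighbor):
    """Average of the four direct neighbors (up, down, left, right)."""
    return (neighbor(-1, 0) + neighbor(1, 0) + neighbor(0, -1) + neighbor(0, 1)) / 4


grid = [[0, 0, 0],
        [0, 4, 0],
        [0, 0, 0]]
print(stencil_map(grid, jacobi))  # [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

On multiple GPUs, such a grid is typically split into row blocks, and each block also needs a copy of its neighbors' boundary rows (a halo) before every step, since cells on a block edge read across it.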
Feb, 9
A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow
Graphics processing units (GPUs) are increasingly used for non-graphics computing. However, applications with divergent control flow incur performance degradation on current GPUs. These GPUs implement the SIMT execution model by serializing the execution of different control flow paths encountered by a warp. This serialization can mask thread-level parallelism among the scalar threads comprising a […]
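The serialization described above can be modeled in a few lines: the warp evaluates the branch condition per thread, issues the taken path with the other lanes masked off, then issues the not-taken path. This is a simplified software model of conventional SIMT execution for a single two-way branch, not the multi-path microarchitecture the paper proposes; `simt_if_else` and its pass counter are hypothetical.

```python
def simt_if_else(values, cond, then_fn, else_fn):
    """Model one warp executing `if cond(v): then_fn(v) else: else_fn(v)`.

    Returns the per-thread results and the number of serialized passes.
    """
    mask = [cond(v) for v in values]  # per-lane branch outcome
    results = [None] * len(values)
    passes = 0
    if any(mask):  # pass over the taken path: only lanes with mask=True active
        passes += 1
        for lane, v in enumerate(values):
            if mask[lane]:
                results[lane] = then_fn(v)
    if not all(mask):  # pass over the not-taken path: remaining lanes active
        passes += 1
        for lane, v in enumerate(values):
            if not mask[lane]:
                results[lane] = else_fn(v)
    return results, passes


results, passes = simt_if_else(list(range(8)), lambda x: x % 2 == 0,
                               lambda x: 2 * x, lambda x: -x)
print(results, passes)  # [0, -1, 4, -3, 8, -5, 12, -7] 2
```

When every lane agrees, only one pass is issued; when lanes disagree, the two paths run one after the other, so each pass leaves part of the warp idle.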
Feb, 9
GPU-based high-performance computing for radiation therapy
Recent developments in radiotherapy demand high computational power to solve challenging problems in a timely fashion in a clinical environment. The graphics processing unit (GPU), as an emerging high-performance computing platform, has been introduced to radiotherapy. It is particularly attractive due to its high computational power, small size, and low cost for facility deployment […]
Feb, 8
Fast 2-D Ultrasound Strain Imaging: The Benefits of Using a GPU
Deformation of tissue can be accurately estimated from radio-frequency ultrasound data using a 2-dimensional normalized cross-correlation (NCC)-based algorithm. This procedure, however, is computationally very time-consuming. A major time reduction can be achieved by parallelizing the numerous NCC computations. In this paper, two approaches to parallelization have been investigated: the OpenMP interface on a […]
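The core of NCC-based estimation is a search: for each window of the pre-deformation frame, find the displacement whose window in the post-deformation frame correlates best. The sketch below shows a plain exhaustive integer-shift search; the window size, search range, and the names `ncc` and `best_shift` are assumptions, and the paper's exact algorithm (including any subsample refinement) is not reproduced.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sample lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat window: correlation undefined


def best_shift(ref, cur, top, left, h, w, search):
    """Test every integer shift (dy, dx) in [-search, search]^2 and return the
    shift whose h x w window in `cur` best matches the window of `ref` at
    (top, left), together with its NCC score."""
    def window(img, r, c):
        return [img[r + i][c + j] for i in range(h) for j in range(w)]

    template = window(ref, top, left)
    best_score, best = -2.0, (0, 0)  # NCC always lies in [-1, 1]
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if 0 <= r and 0 <= c and r + h <= len(cur) and c + w <= len(cur[0]):
                score = ncc(template, window(cur, r, c))
                if score > best_score:
                    best_score, best = score, (dy, dx)
    return best, best_score


ref = [[0, 0, 0, 0],
       [0, 1, 2, 0],
       [0, 3, 5, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 1, 2],
       [0, 0, 3, 5]]  # the same pattern displaced by one row and one column
print(best_shift(ref, cur, top=1, left=1, h=2, w=2, search=1))  # ((1, 1), 1.0)
```

Every (window, shift) pair is scored independently, which is why the computation parallelizes well over OpenMP threads or GPU threads alike.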
Feb, 8
State-Based Gauss-Seidel Framework for Real-time 2D Ultrasound Image Sequence Denoising on GPUs
Ultrasound image sequences are mainly contaminated by multiplicative noise, but they are usually also contaminated by additive noise. Over the past few decades, several works have focused on removing noise from ultrasound images, such as the JY model [1] and the variational model, which were […]
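The abstract does not describe the denoising model, so the sketch below uses a simpler stand-in: quadratic (Tikhonov) smoothing for additive noise, which minimizes E(u) = sum_p (u_p - f_p)^2 + lam * sum over neighboring pixel pairs (p, q) of (u_p - u_q)^2. Setting dE/du_p = 0 gives u_p = (f_p + lam * sum_{q in N(p)} u_q) / (1 + lam * |N(p)|), and Gauss-Seidel applies this update in place, so each pixel immediately uses values already updated earlier in the same sweep. It does not model multiplicative noise or the JY model; `gauss_seidel_denoise`, `lam`, and `sweeps` are hypothetical.

```python
def gauss_seidel_denoise(f, lam=1.0, sweeps=50):
    """Approximately minimize sum (u - f)^2 + lam * sum over neighboring
    pixel pairs of (u_p - u_q)^2 using in-place Gauss-Seidel sweeps."""
    h, w = len(f), len(f[0])
    u = [list(map(float, row)) for row in f]  # start from the noisy image
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                total, count = 0.0, 0
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    a, b = i + di, j + dj
                    if 0 <= a < h and 0 <= b < w:
                        total += u[a][b]  # may already be updated this sweep
                        count += 1
                u[i][j] = (f[i][j] + lam * total) / (1.0 + lam * count)
    return u


noisy = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = gauss_seidel_denoise(noisy)
print(round(smoothed[1][1], 4))  # 2.5714
```

The isolated spike is spread over its neighbors: the exact minimizer has center value 18/7 (about 2.57) while the total intensity of 9 is preserved. The in-place dependency between neighboring pixels is what makes Gauss-Seidel harder to parallelize on a GPU than a Jacobi-style update; red-black ordering is one common workaround.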
Feb, 8
Fast 3D Graphics Rendering Technique with CUDA Parallel Processing
3D graphics rendering has been used to express realistic, three-dimensional, and emphasized effects in graphics. As 3D graphics rendering developed and became more prevalent, the need for faster data processing grew as well, leading to the development of the GPU (Graphics Processing Unit) and of shading languages for GPUs such as GLSL (OpenGL Shading […]
Feb, 8
Developmental Directions in Parallel Accelerators
Parallel accelerators such as massively cored graphics processing units or many-core coprocessors such as the Xeon Phi are becoming widespread and affordable on many systems, including blade servers and even desktops. The use of a single such accelerator is now quite common for many applications, but the use of multiple devices and hybrid combinations is still […]