Posts
Aug, 8
Compiler-based Data Prefetching and Streaming Non-temporal Store Generation for the Intel Xeon Phi Coprocessor
The Intel Xeon Phi coprocessor has software prefetching instructions to hide memory latencies and special store instructions to save bandwidth on streaming non-temporal store operations. In this work, we provide details on compiler-based generation of these instructions and evaluate their impact on the performance of the Intel Xeon Phi coprocessor using a wide range of […]
Aug, 8
Improving the GPU space of computation under triangular domain problems
There is a stage in the GPU computing pipeline where a grid of thread-blocks is mapped to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem regardless of its shape. Threads that fall inside the problem domain perform computations; the rest are discarded at runtime. For problems with […]
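To make the bounding-box waste concrete for the 2D case, the sketch below counts launched versus useful threads for a lower-triangular n x n domain (diagonal included) and shows one common alternative: mapping a linear thread index directly onto triangle coordinates so that no thread falls outside the domain. This is an illustrative example, not the paper's method; the function names `bounding_box_threads` and `linear_to_lower_tri` are hypothetical.

```python
import math

def bounding_box_threads(n):
    """Return (launched, useful) thread counts when a full n x n bounding box
    covers a lower-triangular domain that includes the diagonal."""
    launched = n * n
    useful = n * (n + 1) // 2
    return launched, useful

def linear_to_lower_tri(k):
    """Map a linear index k >= 0 to (i, j) with 0 <= j <= i, enumerating the
    lower triangle row by row: (0, 0), (1, 0), (1, 1), (2, 0), ..."""
    # Row i is the largest integer with i * (i + 1) / 2 <= k.
    i = (math.isqrt(8 * k + 1) - 1) // 2
    # Column is the offset of k from the first index of row i.
    j = k - i * (i + 1) // 2
    return i, j

# Usage: a 4 x 4 triangle needs only 10 of the 16 bounding-box threads.
n = 4
launched, useful = bounding_box_threads(n)
print(launched, useful)
print([linear_to_lower_tri(k) for k in range(useful)])
```

The wasted fraction is (n - 1) / (2n), which approaches one half as n grows. On a GPU the same per-thread arithmetic would typically use a floating-point square root, which needs care for large k; Python's exact `math.isqrt` sidesteps that issue here.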
Aug, 7
Exploring Microcontrollers in GPUs
Recent graphics processing units (GPUs) integrate wimpy microcontrollers on chip. They are often used to execute firmware code that configures the functional units of GPUs. This paper opens up the programming of these microcontrollers and explores how to utilize them for GPU resource management. Our prototype system provides a compiler suite for NVIDIA’s GPU microcontrollers […]
Aug, 7
Finite Difference Time-Domain Modelling of Metamaterials: GPU Implementation of Cylindrical Cloak
The finite-difference time-domain (FDTD) technique can be used to model metamaterials by treating them as dispersive materials. The Drude or Lorentz model can be incorporated into the standard FDTD algorithm to model negative permittivity and permeability. The FDTD algorithm is readily parallelisable and can take advantage of GPU acceleration to achieve speed-ups of 5x-50x depending on hardware […]
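As a minimal illustration of "incorporating the Drude model into standard FDTD", the sketch below runs a 1D Yee scheme (fields Ez and Hy) and models a Drude slab with an auxiliary polarisation current J that obeys dJ/dt + γJ = ωp² Ez, one common auxiliary-differential-equation approach. It is not the authors' 2D cylindrical-cloak implementation; the normalised units (ε0 = μ0 = c = Δx = 1), slab position, ωp, γ and the Gaussian source are all assumptions chosen for illustration.

```python
import numpy as np

def fdtd_1d_drude(n_cells=200, n_steps=300, courant=0.5,
                  slab=(120, 160), omega_p=0.3, gamma=0.01, src=50):
    """1D FDTD in normalised units (eps0 = mu0 = c = dx = 1).
    Cells inside `slab` contain a Drude medium modelled by a current J
    with dJ/dt + gamma * J = omega_p**2 * Ez. Returns Ez after n_steps."""
    dt = courant  # dt = S * dx / c with dx = c = 1
    ez = np.zeros(n_cells)        # Ez at integer nodes
    hy = np.zeros(n_cells - 1)    # Hy at half nodes i + 1/2
    jz = np.zeros(n_cells)        # Drude current, co-located with Ez

    # Plasma frequency squared is non-zero only inside the slab.
    wp2 = np.zeros(n_cells)
    wp2[slab[0]:slab[1]] = omega_p ** 2

    # Time-averaged (trapezoidal) discretisation of the damping term gives:
    # J^{n+1/2} = a * J^{n-1/2} + b * E^n
    a = (1 - gamma * dt / 2) / (1 + gamma * dt / 2)
    b = wp2 * dt / (1 + gamma * dt / 2)

    for n in range(n_steps):
        # H^{n+1/2} from E^n
        hy += dt * (ez[1:] - ez[:-1])
        # J^{n+1/2} from J^{n-1/2} and E^n
        jz = a * jz + b * ez
        # E^{n+1} from H^{n+1/2} and J^{n+1/2}; end nodes stay 0 (PEC walls)
        ez[1:-1] += dt * (hy[1:] - hy[:-1]) - dt * jz[1:-1]
        # Soft Gaussian source added to the field at one node
        ez[src] += np.exp(-((n - 40) / 12) ** 2)
    return ez

vacuum = fdtd_1d_drude(omega_p=0.0)   # J stays zero: plain Yee scheme
with_slab = fdtd_1d_drude()
```

Every update is an element-wise operation over the grid, which is why FDTD maps naturally onto one GPU thread per cell. With an e^{-iωt} time convention the Drude permittivity is ε_r(ω) = 1 - ωp²/(ω² + iγω), whose real part is negative for frequencies sufficiently below ωp; this is the negative-permittivity behaviour the abstract refers to.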
Aug, 7
Fast Morphological Image Processing on GPU using CUDA
Mathematical morphology is a tool for extracting image components that are useful in the representation and description of region shape. The mathematical morphology operations of dilation, erosion, opening, and closing are important building blocks of many other image processing algorithms. Data-parallel programming provides an opportunity for performance acceleration using highly […]
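For reference, the four operations named above have simple textbook definitions for grayscale images with a flat square structuring element: dilation takes the neighbourhood maximum, erosion the neighbourhood minimum, opening is erosion followed by dilation, and closing is dilation followed by erosion. The NumPy sketch below is a CPU illustration of those definitions, not the paper's CUDA kernels; the helper names and the edge-replicated border handling are assumptions.

```python
import numpy as np

def _shifted_windows(img, r):
    """Yield all (2r+1) x (2r+1) shifts of img, using edge-replicated borders."""
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            yield padded[dy:dy + h, dx:dx + w]

def dilate(img, r=1):
    """Grayscale dilation: maximum over a (2r+1) x (2r+1) neighbourhood."""
    return np.maximum.reduce(list(_shifted_windows(img, r)))

def erode(img, r=1):
    """Grayscale erosion: minimum over a (2r+1) x (2r+1) neighbourhood."""
    return np.minimum.reduce(list(_shifted_windows(img, r)))

def opening(img, r=1):
    """Erosion then dilation: removes bright features smaller than the element."""
    return dilate(erode(img, r), r)

def closing(img, r=1):
    """Dilation then erosion: fills dark gaps smaller than the element."""
    return erode(dilate(img, r), r)

# Usage: a single bright pixel grows to a 3 x 3 block under dilation
# and is removed entirely by opening.
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 1
print(dilate(img))
print(opening(img))
```

Each output pixel depends only on its own neighbourhood of the input, so a GPU version can assign one thread per pixel, which is the data parallelism the abstract points to.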
Aug, 7
GPU Accelerated Pattern Matching Algorithm for DNA Sequences to Detect Cancer using CUDA
Cancer is a severe disease that causes one in eight deaths worldwide. It can be cured if detected at the very first stage, when the cancer cells remain confined to their area of origin. In stage two, the cancer starts to spread; once it spreads to the muscles, it enters the third stage. It may cause organ failure. […]
Aug, 7
Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification
This paper presents a technique to fully automatically generate efficient and readable code for parallel processors. We base our approach on skeleton-based compilation and "algorithmic species", an algorithm classification of program code. We use a tool to automatically annotate C code with species information where possible. The annotated program code is subsequently fed into the […]
Aug, 6
2D Triangulation of Polygons on CUDA
General-purpose computing on Graphics Processor Units (GPGPU) brings massively parallel computing (hundreds of compute cores) to the desktop at a reasonable cost, but requires that algorithms be carefully designed to take advantage of this power. The present work explores the possibilities of CUDA (NVIDIA Compute Unified Device Architecture) using the GPGPU approach for 2D triangulation […]
Aug, 6
Portable Parallel Kernels for High-Speed Beamforming in Synthetic Aperture Ultrasound Imaging
In medical ultrasound, synthetic aperture (SA) imaging is widely considered a novel image formation technique that achieves better resolution than existing scanners. However, its intensive processing load is known to be a challenging factor. To address such a computational demand, this paper proposes a new parallel approach based on the design of […]
Aug, 6
Efficient Bayesian multi-view deconvolution
Light sheet fluorescence microscopy is able to image large specimens with high resolution by imaging the samples from multiple angles. Multi-view deconvolution can significantly improve the resolution and contrast of the images, but its application has been limited due to the large size of the datasets. Here we present a derivation of multi-view Bayesian deconvolution […]
Aug, 6
GPU Acceleration of Graph Matching, Clustering, and Partitioning
We consider sequential algorithms for hypergraph partitioning and GPU (i.e., fine-grained shared-memory parallel) algorithms for graph partitioning and clustering. Our investigation into sequential hypergraph partitioning is concerned with the efficient construction of high-quality matchings for hypergraph coarsening and optimisation with respect to general hypergraph partitioning quality metrics. We introduce the l*(l-1)-metric which exactly measures the […]
Aug, 6
GPU Programming in Functional Languages: A Comparison of Haskell GPU Embedded Domain Specific Languages
Graphics Processing Units (GPUs) are known to be excellent computation accelerators. However, their approach to data processing is very different from that of regular CPUs. This makes it harder for a regular developer to program these devices. In the past few years, several frameworks were introduced to simplify the programming of GPU devices. Accelerate and Obsidian are […]

