Posts
Feb, 9
Effectiveness of program transformations and compilers for directive-based GPU programming models
Accelerator devices such as General Purpose Graphics Computing Units (GPGPUs) play an important role in enhancing the performance of many contemporary scientific applications. However, programming GPUs using languages like C for CUDA or OpenCL requires a relatively high investment of time, and the resulting programs are often fine-tuned to perform well only on a particular device. […]
Feb, 9
Using Hybrid Shared and Distributed Caching for Mixed-Coherency GPU Workloads
Current GPU computing models support a mixture of coherent and incoherent classes of memory operations. Workloads using these models typically have working sets too large to fit in an economical SRAM structure. Still, GPU architectures have last-level caches to primarily fulfill two functions: eliminate redundant DRAM accesses servicing requests from different L1 caches to the […]
Feb, 9
GPU-based Monte Carlo radiotherapy dose calculation using phase-space sources
A novel phase-space source implementation has been designed for GPU-based Monte Carlo dose calculation engines. Due to the parallelized nature of GPU hardware, it is essential to simultaneously transport particles of the same type and similar energies, but separated spatially, to achieve high efficiency. We present three methods for phase-space implementation that have been […]
Feb, 8
Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers
The ease of programming offered by the CUDA programming model has attracted many programmers to try the platform for accelerating non-graphics applications. Cryptography, being no exception, has also seen its share of exploration efforts, especially for block ciphers. In this contribution we present a detailed walk-through of an effective mapping of HC-128 and HC-256 stream […]
Feb, 7
Efficient Wave Propagation in Discontinuous Media and Complex Geometry for Many-core Architectures
We present an accelerated numerical solver for the scalar wave equation using one and two GPUs. We consider complex geometry and study accuracy when performing the computation in both single and double precision. The method uses a high-order accurate approximation of the derivatives using summation-by-parts operators. The boundary conditions are imposed using the simultaneous approximation […]
Feb, 7
Betweenness Centrality on GPUs and Heterogeneous Architectures
The betweenness centrality metric has long been of interest in graph analysis and is used in various applications. Yet, it is one of the most computationally expensive kernels in graph mining. In this work, we investigate a set of techniques to make the betweenness computations faster on GPUs as well as on heterogeneous CPU/GPU architectures. Our techniques […]
Feb, 7
GPU Implementation of Iterative Solvers in Numerical Weather Predicting Models
Numerical weather prediction models often require solving a 3-D Helmholtz problem, derived from the governing equation of the dynamical core in the Met Office Unified Model, with preconditioned iterative solvers. This dissertation focuses on a GPU implementation of the preconditioned conjugate gradient (CG) iterative method. A given serial code has been ported to the GPU. […]
Feb, 6
Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis
This paper provides a unified framework for the interrelated topics of action spotting, the spatiotemporal detection and localization of human actions in video, and action recognition, the classification of a given video into one of several predefined categories. A novel compact local descriptor of video dynamics in the context of action spotting and recognition is […]
Feb, 6
Binary Interval Search: a scalable algorithm for counting interval intersections
MOTIVATION: The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they […]
Feb, 6
Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi
Intel Xeon Phi is a recently released high-performance coprocessor featuring 61 cores, each supporting 4 hardware threads, with 512-bit wide SIMD registers, achieving a peak theoretical performance of 1 Tflop/s in double precision. Many scientific applications involve operations on large sparse matrices, such as linear solvers, eigensolvers, and graph mining algorithms. The core of most […]
Feb, 6
An Implementation of Conflict-Free Offline Permutation on the GPU
The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of the shared memory access of GPUs. Bank conflicts should be avoided to maximize the bandwidth of shared memory access. Offline permutation of an array is the task of copying all elements in array $a$ into array $b$ […]
Feb, 6
Hardware Accelerated Molecular Docking: A Survey
Hardware acceleration is the general concept of applying specialized hardware to a given problem, instead of an ordinary CPU, in order to reduce processing time. General-purpose CPUs can be considered a fully general platform suitable for executing virtually any software or algorithm. Application-specific accelerators have a custom architecture that fits […]