Posts

Feb, 7

Efficient Wave Propagation in Discontinuous Media and Complex Geometry for Many-core Architectures

We present an accelerated numerical solver for the scalar wave equation using one and two GPUs. We consider complex geometry and study accuracy when performing the computation in both single and double precision. The method uses a high-order accurate approximation of the derivatives using summation-by-parts operators. The boundary conditions are imposed using the simultaneous approximation […]
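The abstract relies on summation-by-parts (SBP) operators. As a minimal sketch, assuming the simplest second-order case rather than the high-order operators the paper uses, an SBP first-derivative operator has the form $D = H^{-1} Q$, where $H$ is a diagonal norm and $Q + Q^T = B = \mathrm{diag}(-1, 0, \dots, 0, 1)$. The helper names `sbp_first_derivative`, `sbp_residual` and `apply_operator` are hypothetical, and the simultaneous approximation term (SAT) boundary treatment and the GPU implementation are not shown.

```python
def sbp_first_derivative(n, h):
    """Second-order SBP first-derivative operator D = H^{-1} Q on n >= 3 points with spacing h."""
    H = [h] * n                      # diagonal norm (trapezoidal quadrature weights)
    H[0] = H[-1] = h / 2
    D = [[0.0] * n for _ in range(n)]
    D[0][0], D[0][1] = -1.0 / h, 1.0 / h        # one-sided difference at the left boundary
    D[-1][-2], D[-1][-1] = -1.0 / h, 1.0 / h    # one-sided difference at the right boundary
    for i in range(1, n - 1):
        D[i][i - 1], D[i][i + 1] = -0.5 / h, 0.5 / h  # central difference in the interior
    return H, D


def sbp_residual(H, D):
    """Largest |entry| of Q + Q^T - B with Q = H D; it is zero when D has the SBP property."""
    n = len(H)
    Q = [[H[i] * D[i][j] for j in range(n)] for i in range(n)]
    B = [[0.0] * n for _ in range(n)]
    B[0][0], B[-1][-1] = -1.0, 1.0
    return max(abs(Q[i][j] + Q[j][i] - B[i][j]) for i in range(n) for j in range(n))


def apply_operator(D, u):
    """Matrix-vector product D u."""
    return [sum(d * x for d, x in zip(row, u)) for row in D]


H, D = sbp_first_derivative(n=6, h=0.2)
print(sbp_residual(H, D))                                # close to zero (rounding only)
print(apply_operator(D, [0.2 * i for i in range(6)]))    # close to 1.0 at every grid point
```

The SBP property mimics integration by parts discretely, which is what makes energy estimates, and hence stable SAT boundary conditions, possible.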
Feb, 7

Betweenness Centrality on GPUs and Heterogeneous Architectures

The betweenness centrality metric has long been of interest in graph analysis and is used in many applications. Yet it is one of the most computationally expensive kernels in graph mining. In this work, we investigate a set of techniques to make betweenness computations faster on GPUs as well as on heterogeneous CPU/GPU architectures. Our techniques […]
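The abstract does not describe the authors' GPU techniques, so as a point of reference only, here is a minimal sequential sketch of Brandes' algorithm, the standard way to compute betweenness centrality on an unweighted graph: one breadth-first search per source counts shortest paths, and a reverse sweep accumulates dependencies. The function name `betweenness` and the adjacency-dict input are assumptions for illustration.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted graph given as {node: [neighbours]}.

    Each ordered (source, target) pair is counted, so for an undirected graph
    every unordered pair contributes twice; halve the result if needed.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                          # vertices in order of non-decreasing distance
        preds = {v: [] for v in adj}        # shortest-path predecessors
        sigma = {v: 0 for v in adj}         # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # forward phase: BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                        # backward phase: accumulate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

Both phases are dominated by BFS-like traversals, which is why the forward and backward sweeps are the usual targets for GPU parallelization.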
Feb, 7

GPU Implementation of Iterative Solvers in Numerical Weather Predicting Models

Numerical weather prediction models often require solving a 3-D Helmholtz problem, derived from the governing equations of the dynamical core in the Met Office Unified Model, using preconditioned iterative solvers. This dissertation focuses on a GPU implementation of the preconditioned conjugate gradient (CG) iterative method. An existing serial code has been ported to the GPU. […]
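The abstract does not say which preconditioner is used, so this minimal CPU sketch of preconditioned CG assumes a Jacobi (diagonal) preconditioner and a small dense symmetric positive-definite matrix. The name `pcg` is hypothetical, and none of the GPU-specific work from the dissertation is represented.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive-definite A (list of rows)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^{-1} = diag(A)^{-1}
    x = [0.0] * n
    r = [float(v) for v in b]                      # r = b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]     # preconditioned residual z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                 # stop once the residual is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                    # step length along the search direction
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]  # new A-conjugate direction
        rz = rz_new
    return x


print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))   # approximately [1/11, 7/11]
```

Each iteration is built from a matrix-vector product, dot products and vector updates, which are the operations a GPU port has to accelerate.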
Feb, 6

Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis

This paper provides a unified framework for the interrelated topics of action spotting, the spatiotemporal detection and localization of human actions in video, and action recognition, the classification of a given video into one of several predefined categories. A novel compact local descriptor of video dynamics in the context of action spotting and recognition is […]
Feb, 6

Binary Interval Search: a scalable algorithm for counting interval intersections

MOTIVATION: The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they […]
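The abstract is cut off before the algorithm itself. As a minimal sketch of how binary search can count intersections without enumerating them, assuming closed intervals $[s, e]$: an interval misses the query $[q_s, q_e]$ exactly when it ends before $q_s$ or starts after $q_e$, and these two cases are disjoint, so the count is $n - \#\{e < q_s\} - \#\{s > q_e\}$. The class name `IntervalCounter` is hypothetical and this is not claimed to be the paper's exact method.

```python
from bisect import bisect_left, bisect_right


class IntervalCounter:
    """Counts closed intervals [s, e] that intersect a query, using two sorted arrays."""

    def __init__(self, intervals):
        self.starts = sorted(s for s, _ in intervals)
        self.ends = sorted(e for _, e in intervals)
        self.n = len(intervals)

    def count(self, qs, qe):
        ended_before = bisect_left(self.ends, qs)               # intervals with e < qs
        start_after = self.n - bisect_right(self.starts, qe)    # intervals with s > qe
        return self.n - ended_before - start_after


c = IntervalCounter([(1, 5), (3, 8), (10, 12)])
print(c.count(4, 9))    # (1, 5) and (3, 8) intersect [4, 9]
```

Each query costs two $O(\log n)$ searches after an $O(n \log n)$ sort, and queries are independent, which makes the approach easy to parallelize.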
Feb, 6

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Intel Xeon Phi is a recently released high-performance coprocessor that features 61 cores, each supporting 4 hardware threads, with 512-bit wide SIMD registers, achieving a peak theoretical performance of 1 Tflop/s in double precision. Many scientific applications, such as linear solvers, eigensolvers, and graph mining algorithms, involve operations on large sparse matrices. The core of most […]
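The abstract is truncated before it names the kernels or storage formats studied. As a minimal sequential reference, assuming the common compressed sparse row (CSR) format, sparse matrix-vector multiplication looks like this; the helper names `dense_to_csr` and `csr_spmv` are hypothetical, and no Xeon Phi vectorization or threading is shown.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of rows) to CSR arrays, skipping zero entries."""
    row_ptr, col_idx, values = [0], [], []
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))        # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    return row_ptr, col_idx, values


def csr_spmv(row_ptr, col_idx, values, x):
    """y = A x for A in CSR form; rows are independent, which parallel kernels exploit."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]   # indirect access to x through col_idx
        y[i] = acc
    return y


csr = dense_to_csr([[1, 0, 2], [0, 3, 0], [4, 0, 5]])
print(csr_spmv(*csr, [1, 1, 1]))
```

The indirect, irregular accesses to `x` through `col_idx` are what make this kernel hard to vectorize on wide SIMD units.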
Feb, 6

An Implementation of Conflict-Free Offline Permutation on the GPU

The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of the shared memory access of GPUs. Bank conflicts should be avoided to maximize the bandwidth of shared memory access. Offline permutation of an array is the task of copying all elements in array $a$ into array $b$ […]
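The abstract stops before the conflict-free algorithm, so this minimal sketch only illustrates the task and the cost being avoided. It assumes the permutation is given as an array `p` with `b[p[i]] = a[i]`, and, as in GPU shared memory, that address `i` lives in bank `i mod w` for `w` banks; a warp of `w` threads then needs as many memory cycles as the largest number of distinct addresses it touches in one bank. All function names are hypothetical, and the paper's conflict-free scheme is not reproduced.

```python
from collections import Counter


def offline_permute(a, p):
    """Copy a[i] to b[p[i]] for every i; p must be a permutation of range(len(a))."""
    if sorted(p) != list(range(len(a))):
        raise ValueError("p is not a permutation")
    b = [None] * len(a)
    for i, target in enumerate(p):
        b[target] = a[i]
    return b


def max_bank_congestion(addresses, w):
    """Largest number of distinct addresses falling in one bank (bank = address mod w)."""
    return max(Counter(addr % w for addr in set(addresses)).values())


def naive_write_congestion(p, w):
    """Per-warp congestion when thread i of warp j writes b[p[j * w + i]]."""
    return [max_bank_congestion(p[j:j + w], w) for j in range(0, len(p), w)]


w = 4
transpose = [c * w + r for r in range(w) for c in range(w)]   # transpose of a w x w array
print(naive_write_congestion(transpose, w))                   # every warp hits a single bank
```

A matrix transpose is the classic bad case: a naive copy serializes each warp's writes, which is the kind of conflict an offline permutation algorithm must schedule around.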
Feb, 6

Hardware Accelerated Molecular Docking: A Survey

Hardware acceleration is the general concept of applying specialized hardware, instead of an ordinary CPU, to a given problem in order to reduce processing time. General-purpose CPUs can be considered a fully general platform suitable for executing virtually any software or algorithm. Application-specific accelerators have a custom architecture that fits […]
Feb, 6

Parallel k-Means Image Segmentation Using Sort, Scan & Connected Components on a GPU

Image segmentation needs to run quickly and without supervision to speed up subsequent processes such as object recognition or other high-level tasks. General-purpose computing on the GPU is a powerful tool for efficient image processing and has been applied to the image segmentation problem. However, state-of-the-art approaches still perform parts of […]
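The paper combines k-means with sort, scan and connected-components steps on the GPU. As a minimal CPU sketch of only the clustering step, assuming grayscale pixel intensities and user-supplied initial centroids, Lloyd's k-means alternates an assignment step and an update step; the name `kmeans_1d` is hypothetical, and the connected-components labelling that turns clusters into regions is not shown.

```python
def kmeans_1d(pixels, init_centroids, max_iter=20):
    """Lloyd's k-means on scalar pixel intensities; returns (labels, centroids)."""
    centroids = [float(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: abs(v - centroids[k])) for v in pixels]
        # Update step: each centroid moves to the mean of its pixels (empty clusters stay put).
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [v for v, lab in zip(pixels, labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else c)
        if new_centroids == centroids:      # converged: assignments no longer move centroids
            break
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans_1d([10, 12, 11, 200, 210, 205], [0, 255])
print(labels, centroids)
```

The assignment step is independent per pixel, while the update step is a per-cluster reduction, which is where GPU primitives such as sort and scan come in.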
Feb, 6

Implementation of Fast Artificial Neural Network for Pattern Classification on Heterogeneous System

Neural networks have been part of an effort to emulate the learning process of the human nervous system. Graphics Processing Units (GPUs), found on graphics cards, have hundreds of processing cores and a highly parallel architecture. Because of this highly parallel architecture, GPUs are very well suited to parallel computations such as […]
Feb, 6

A Multi-GPU Sources Reconstruction Method for Imaging Applications

A profile reconstruction method using a surface inverse-currents technique implemented on the GPU is presented. The method makes use of the internal fields radiated by an equivalent currents distribution retrieved from scattered field information that is collected from multiple incident fields. Its main advantage over other inverse-source-based techniques is its use of a surface formulation […]
Feb, 5

Grex: An efficient MapReduce framework for graphics processing units

In this paper, we present a new MapReduce framework, called Grex, designed to leverage general purpose graphics processing units (GPUs) for parallel data processing. Grex provides several new features. First, it supports a parallel split method to tokenize input data of variable sizes, such as words in e-books or URLs in web documents, in parallel […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors