Posts

Jun, 23

Fast motion detection from airborne videos using graphics processing unit

In our previous work, we proposed a joint optical flow and principal component analysis (PCA) approach to improve the performance of optical flow based detection, where PCA is applied on the calculated two-dimensional optical flow image, and motion detection is accomplished by a metric derived from the two eigenvalues. To reduce the computational time when […]
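
The excerpt above says that detection relies on the two eigenvalues obtained by applying PCA to two-dimensional optical flow. As a rough illustration only, the sketch below computes those two eigenvalues for a small set of flow vectors. The function name `flow_eigenvalues`, the use of population covariance, and the choice of input window are assumptions made for this example; the excerpt does not give the paper's actual detection metric, so none is implemented here.

```python
import math


def flow_eigenvalues(flow):
    """Return (largest, smallest) eigenvalue of the 2x2 covariance of flow vectors.

    `flow` is a list of (u, v) optical flow vectors, e.g. from one image window.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(flow)
    mean_u = sum(u for u, _ in flow) / n
    mean_v = sum(v for _, v in flow) / n
    # Population covariance entries of the (u, v) samples.
    a = sum((u - mean_u) ** 2 for u, _ in flow) / n
    c = sum((v - mean_v) ** 2 for _, v in flow) / n
    b = sum((u - mean_u) * (v - mean_v) for u, v in flow) / n
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return half_trace + radius, half_trace - radius
```

Flow vectors that all lie along one direction give one dominant eigenvalue and one eigenvalue near zero, which is the kind of structure a metric built on the two eigenvalues could exploit.
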
Jun, 23

Hierarchical overlapped tiling

This paper introduces hierarchical overlapped tiling, a transformation that applies loop tiling and fusion to conventional loops. Overlapped tiling is a useful transformation to reduce communication overhead, but it may also generate a significant amount of redundant computation. Hierarchical overlapped tiling performs overlapped tiling hierarchically to balance communication overhead and redundant computation, and thus has […]
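
To make the trade-off between communication and redundant computation concrete, here is a minimal sketch of plain, single-level overlapped tiling on a 1D three-point stencil. It is not the hierarchical scheme from the paper, whose details are not in the excerpt. The stencil, the fixed end cells, and the names `stencil_step`, `reference`, and `overlapped_tiled` are assumptions chosen for illustration.

```python
def stencil_step(a):
    """One 3-point averaging step; the two end cells are held fixed."""
    out = list(a)
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out


def reference(a, steps):
    """Apply the stencil to the whole array, one time step at a time."""
    for _ in range(steps):
        a = stencil_step(a)
    return a


def overlapped_tiled(a, steps, tile):
    """Compute the same result tile by tile, with no exchange between tiles.

    Each tile reads a halo of `steps` extra cells per side and recomputes
    them locally. Halo cells become stale one cell per step, starting from
    the local edge, so after `steps` steps only the tile's own cells are
    still exact; those are the ones copied out.
    """
    n = len(a)
    result = [0.0] * n
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(0, start - steps)   # halo clipped at the true boundary
        hi = min(n, end + steps)
        local = a[lo:hi]
        for _ in range(steps):
            local = stencil_step(local)  # redundant work on halo cells
        result[start:end] = local[start - lo:end - lo]
    return result
```

The halo width grows with the number of fused time steps, so larger `steps` removes more synchronization but recomputes more cells per tile. That is the cost that a hierarchical variant aims to balance.
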
Jun, 23

GPU-accelerated Model Checking of Periodic Self-Suspending Real-Time Tasks

Efficient model checking is important in order to make this type of software verification useful for systems that are complex in their structure. If a system is too large or complex, then model checking simply does not scale, i.e., it could take too much time to verify the system. This is one strong argument for […]
Jun, 23

Bacon: A GPU Programming System With Just in Time Specialization

This paper describes Bacon, a data-parallel programming system targeting OpenCL-compatible graphics processors. This system is built upon the existing OpenCL standard in order to make it easier for programmers to write high performance kernels for GPU accelerated applications. The OpenCL C syntax is extended into a new language, Bacon C, intended to make development significantly […]
Jun, 23

Parallel Neural Network Training with OpenCL

This paper describes the parallelization of neural network training algorithms on heterogeneous architectures with graphics processing units (GPUs). The algorithms used for training are particle swarm optimization and backpropagation. Parallel versions of both methods are presented, and speedup results are given relative to the sequential version. The efficiency of parallel training is investigated in […]
Jun, 21

High performance implementation of hydrodynamic interactions and applications with the sub-cellular element method

An O(N^2) algorithm for computing hydrodynamic interaction (HI) in Brownian dynamics (BD) simulation has been implemented. CPU and GPU versions have been built, with the GPU version tuned for performance and reaching up to 40% of peak performance. The implementation was validated through simulations of diffusion polymers and comparisons of […]
Jun, 21

Evaluating the impact of reordering unstructured meshes on the performance of finite volume GPU solvers

In this work, we study the impact of renumbering the cells of unstructured triangular finite volume meshes on the performance of CUDA implementations of several finite volume schemes to simulate two-layer shallow water systems. We have used several numerical schemes with different demands of computational power whose CUDA implementations exploit the texture and L1 cache […]
Jun, 21

CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform

Motivation: New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed. Results: We present CUSHAW, a parallelized […]
Jun, 21

FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method

The Lagrangian vortex method offers an alternative numerical approach for direct numerical simulation of turbulence. The fact that it uses the fast multipole method (FMM)—a hierarchical algorithm for N-body problems with highly scalable parallel implementations—as numerical engine makes it a potentially good candidate for exascale systems. However, there have been few validation studies of Lagrangian […]
Jun, 21

High-precision molecular dynamics simulation of UO2-PuO2: Anion self-diffusion in UO2

Our series of articles is devoted to high-precision molecular dynamics simulation of mixed actinide-oxide (MOX) fuel in the approximation of rigid ions and pair interactions (RIPI) using high-performance graphics processors (GPUs). In this article we study self-diffusion mechanisms of oxygen anions in uranium dioxide (UO2) with ten recent and widely used sets of interatomic […]
Jun, 20

GPU Accelerated Greedy Algorithms for Compressed Sensing

For appropriate matrix ensembles, greedy algorithms have proven to be an efficient means of solving the combinatorial optimization problem associated with compressed sensing. This paper describes an implementation for graphics processing units (GPU) of hard thresholding, iterative hard thresholding, normalized iterative hard thresholding, hard thresholding pursuit, and a two stage thresholding algorithm based on compressive […]
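
Of the algorithms listed above, iterative hard thresholding is the simplest to write down: take a gradient step on the residual, then keep only the s largest-magnitude entries. The sketch below is a plain CPU version in pure Python for illustration only, not the GPU implementation the paper describes. The function names, the fixed step size, and the iteration count are assumptions.

```python
def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matvec_t(A, r):
    """Compute A^T r without forming the transpose."""
    return [sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def iht(A, y, s, step=1.0, iters=50):
    """Iterative hard thresholding: x <- H_s(x + step * A^T (y - A x)).

    `step` and `iters` are fixed here for simplicity (an assumption);
    normalized IHT instead adapts the step size each iteration.
    """
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [yi - ai for yi, ai in zip(y, matvec(A, x))]
        grad = matvec_t(A, residual)
        x = hard_threshold([xi + step * g for xi, g in zip(x, grad)], s)
    return x
```

The two matrix-vector products dominate the cost of each iteration, and they are dense, regular operations, which is a plausible reason this family of algorithms suits GPUs.
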
Jun, 20

Towards a GPU-based Implementation of Interaction Nets

We present ingpu, a GPU-based evaluator for interaction nets that heavily utilizes their potential for parallel evaluation. We discuss advantages and challenges of the ongoing implementation of ingpu and compare its performance to existing interaction nets evaluators.

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors