
Posts

Feb, 27

Face Detection on CUDA

Face detection finds applications in many fields today. However, a single-threaded CPU implementation of face detection consumes a lot of time and, despite various optimization techniques, performs poorly in real time. With the advent of General Purpose GPU (GPGPU) computing and growing support for parallel programming languages like CUDA, it has become possible […]
Feb, 27

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

In order to improve system performance efficiently, a number of systems are equipped with multi-core and many-core processors (such as GPUs). Due to their discrete memory, these heterogeneous architectures comprise a distributed system within a computer. A data-flow programming model is attractive in this setting for its ease of expressing concurrency. Programmers only need to […]
Feb, 27

Simulation of the hydrogen ground state in Stochastic Electrodynamics

Stochastic electrodynamics is a classical theory which assumes that the physical vacuum consists of classical stochastic fields with average energy $\frac{1}{2}\hbar\omega$ in each mode, i.e., the zero-point Planck spectrum. While this classical theory explains many quantum phenomena related to harmonic oscillator problems, hard results on nonlinear systems are still lacking. In this work the […]
Feb, 27

GPU accelerated image reconstruction in a two-strip J-PET tomograph

We present a fast GPU implementation of the image reconstruction routine for a novel two-strip PET detector that relies solely on time-of-flight measurements.
Feb, 27

Implementation of Smith-Waterman algorithm in OpenCL for GPUs

In this paper we present an implementation of the Smith-Waterman algorithm. The implementation is done in OpenCL and targets high-end GPUs. This implementation is capable of computing similarity indexes between reference and query sequences. The implementation is designed for the sequence alignment paths calculation. In addition, it is capable of handling very long reference sequences […]
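
As a rough illustration of the similarity score mentioned above, here is a minimal single-threaded sketch of the standard Smith-Waterman recurrence in Python. It is not the paper's OpenCL implementation: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration only.

```python
def smith_waterman_score(reference, query, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between two sequences.

    Assumed scoring scheme (not taken from the paper): linear gap penalty,
    fixed match/mismatch scores. Only two rows of the score matrix are kept,
    so memory grows with the query length, not the reference length.
    """
    prev = [0] * (len(query) + 1)  # scores for the previous reference symbol
    best = 0
    for r in reference:
        curr = [0]  # column 0 is always 0 in local alignment
        for j, q in enumerate(query, start=1):
            diag = prev[j - 1] + (match if r == q else mismatch)
            up = prev[j] + gap        # gap in the query
            left = curr[j - 1] + gap  # gap in the reference
            cell = max(0, diag, up, left)  # clamping at 0 makes it local
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another; GPU implementations commonly exploit this, though whether this paper does is not stated in the excerpt.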
Feb, 24

Very Fast Non-Dominated Sorting

A new and very efficient parallel algorithm for the Fast Non-dominated Sorting of Pareto fronts is proposed. By decreasing its computational complexity, the proposed method allows us to speed up the best Fast and Elitist Multi-Objective Genetic Algorithm to date (NSGA-II) by more than two orders of magnitude. Formal proofs […]
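
For context, the baseline that such work improves on can be sketched as follows. This is a minimal Python version of the classic sequential non-dominated sort used in NSGA-II, not the parallel algorithm proposed in the paper; the names `dominates` and `non_dominated_sort` are hypothetical, and all objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    Assumes every objective is minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_sort(points):
    """Group point indices into Pareto fronts (front 0 is non-dominated)."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]  # indices that point i dominates
    counts = [0] * n                       # how many points dominate point i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1

    fronts = []
    current = [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        following = []
        for i in current:
            for j in dominated_by[i]:
                counts[j] -= 1          # remove the current front's influence
                if counts[j] == 0:      # j is now non-dominated
                    following.append(j)
        current = following
    return fronts
```

For example, `non_dominated_sort([(1, 1), (2, 2), (1, 3), (3, 1), (3, 3)])` places the first point alone in front 0, the next three in front 1, and the last in front 2. The pairwise comparison loop is the quadratic step that faster or parallel variants aim to reduce.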
Feb, 24

Raster2Mesh: Rasterization based CVT meshing

In this paper, we propose to extend high quality Centroidal Voronoi Tessellation (CVT) remeshing techniques to the case of surfaces which are not defined by triangle meshes, such as implicit surfaces. Our key observation is that rasterization routines are usually available to visualize these alternative representations, most often as OpenGL shaders efficiently producing surface samples […]
Feb, 24

A Vision for GPU-accelerated Parallel Computation on Geo-Spatial Datasets

We summarize the need and present our vision for accelerating geo-spatial computations and analytics using a combination of shared and distributed memory parallel platforms, with general-purpose Graphics Processing Units (GPUs) with 100s to 1000s of processing cores in a single chip forming a key architecture to parallelize over. A GPU can yield one-to-two orders of […]
Feb, 24

A Virtual Machine Model for Accelerating Relational Database Joins using a General Purpose GPU

We demonstrate a speedup for database joins using a general purpose graphics processing unit (GPGPU). The technique is novel in that it operates on an SQL virtual machine model developed using CUDA. The implementation compiles an SQL statement to instructions of the virtual machine that are then executed in parallel on the GPU. We use […]
Feb, 24

Accelerating Lagrangian Particle Dispersion in the Atmosphere with OpenCL across Multiple Platforms

FLEXPART is a popular simulator that models the transport and diffusion of air pollutants, based on the Lagrangian approach. It is capable of regional and global simulation and supports both forward and backward runs. A complex model like this contains many calculations suitable for parallelisation. Recently, a GPU-accelerated version of the simulator (FLEXCPP) has been […]
Feb, 23

High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster

High performance computing of Meshless Time Domain Method (MTDM) on multi-GPU using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of the electromagnetic wave propagation phenomena. However, the numerical domain must be […]
Feb, 23

Document Image Binarization Using Image Segmentation Algorithm in Parallel Environment

Segmenting text from badly degraded document images is very hard due to the high intra-variation between the document background and the foreground text of different document images. Image-processing algorithms take longer to execute on a single-core processor. The Graphics Processing Unit (GPU) is becoming most popular due […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
