11108

Posts

Dec, 12

Development of Bayesian analysis program for extraction of polarisation observables at CLAS

At the mass of a proton, the strong force is not well understood. Various quark models exist, but it is important to determine which quark model(s) are most accurate. Experimentally, finding resonances predicted by some models and not others would give valuable insight into this fundamental interaction. Several labs around the world use photoproduction experiments […]
Dec, 12

Inter-block synchronization on a GPGPU

With the invention of multi-core processing unit technology, the graphics processing unit has evolved from single core graphic processing unit to multi-core programmable graphics processing units. Because of the GPUs’ architecture, people found that it is not only good at processing graphics related data, but also suitable for performing general purpose parallel computations. However, since […]
Dec, 12

Lessons learned from contrasting a BLAS kernel implementations

This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast CPU versus GPU implementation effort and complexity of an optimized BLAS routine, not considering performance. This work contributes with a sample procedure to compare BLAS kernel implementations, how to start […]
Dec, 12

Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems

The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However, real-world graphs are famously difficult to process efficiently. Not only they have a large memory footprint, but also most graph algorithms entail […]
Dec, 11

Multilayered Abstractions for Partial Differential Equations

How do we build maintainable, robust, and performance-portable scientific applications? This thesis argues that the answer to this software engineering question in the context of the finite element method is through the use of layers of Domain-Specific Languages (DSLs) to separate the various concerns in the engineering of such codes. Performance-portable software achieves high performance […]
Dec, 11

Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory

Graphical Processing Units(GPUs) are usually programmed to provide data-parallel acceleration to a host processor. Modern GPUs typically have an internal multi-processor (MP) structure that can be exploited in an unusual way to offer semi-independent task parallelism providing the MPs can operate within their own localised memory and apply data-parallelism to their own problem subset. We […]
Dec, 11

Runtime Support toward Transparent Memory Access in GPU-accelerated Heterogeneous Systems

GPU has become a popular parallel accelerator in modern heterogeneous systems for its great parallelism and superior energy efficiency. However, it also extremely complicates programing the memory system in such heterogeneous systems, due to the non-continuous memory spaces on CPU and GPU, and a two-level memory hierarchy on a GPU itself. The complexity of this […]
Dec, 11

A New Software Based GPU Framework

A software based GPU design, where most of the 3D pipeline is executed in software on shaders, with minimal support from custom hardware blocks, provides three benefits, it: (1) simplifies the GPU design, (2) turns 3D graphics into a general purpose application, and (3) opens the door for applying compiler optimization to the whole 3D […]
Dec, 11

A GPU-Accelerated Framework for Image Processing and Computer Vision

This paper presents and briefly describes the state of the art of accelerating image processing with graphics hardware (GPU) and discusses some of its caveats. Then it describes GpuCV, an open source multiplatform library for GPU-accelerated image processing and Computer Vision operators and applications. It is meant for computer vision scientist not familiar with GPU […]
Dec, 11

High Performance Poisson Equation Solver for Hybrid CPU/GPU Systems

We investigated the possible way for treatment of electrostatic interactions by solving numerically Poisson’s equation using Conjugate Gradient method and Stabilized BiConjugate Gradient method. The aim of the research was to test the execution time of prototype programs running on BLueGene/P and CPU/GPU system. The results show that the tested methods are applicable for electrostatics […]
Dec, 11

GPU Accelerated Parallel Iris Localization

Iris recognition is quite a computation intensive task with huge amounts of pixel processing. After the image acquisition of the eye, Iris recognition is basically divided into Iris localization, Feature Extraction and Matching steps. Each of these tasks involves a lot of processing. It thus becomes essential to improve the performance of each step to […]
Dec, 11

Evaluating tradeoff between recall and performance of GPU permutation index

Query-by-content, by means of similarity search, is a fundamental operation for applications that deal with multimedia data. For this kind of query it is meaningless to look for elements exactly equal to a given one as query. Instead, we need to measure the dissimilarity between the query object and each database object. This search problem […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: