
Posts

Mar, 5

Parallel Computing: The Elephant in the Room

Over the past few years, there has been a shift towards multi-core processors, driven partially by physical limitations. Mistaken assumptions about how effective and useful parallel systems can be have also motivated this change. In this paper, we seek to directly identify the barriers to parallel computation. The barriers are not, as conventional […]
Mar, 5

Inter-Block GPU Communication via Fast Barrier Synchronization

While GPGPU stands for general-purpose computation on graphics processing units, the lack of explicit support for inter-block communication on the GPU arguably hampers its broader adoption as a general-purpose computing device. Inter-block communication on the GPU occurs via global memory and then requires barrier synchronization across the blocks, i.e., inter-block GPU communication via barrier synchronization. […]
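As a rough illustration of the mechanism the abstract describes, a software GPU-wide barrier can be built from an atomic counter in global memory. The sketch below is a minimal, single-use version, not the paper's tuned lock-free or tree-based barriers; the counter name and the requirement that all blocks be resident simultaneously are assumptions.

// Minimal atomic-counter inter-block barrier (illustrative only; single-use,
// and it deadlocks unless all blocks of the grid are co-resident on the GPU).
#include <cuda_runtime.h>

__device__ unsigned int g_arrived = 0;   // blocks that have reached the barrier

__device__ void global_barrier(unsigned int num_blocks)
{
    __syncthreads();                       // all threads in this block arrive
    if (threadIdx.x == 0) {
        atomicAdd(&g_arrived, 1u);         // announce this block's arrival
        // spin until every block has arrived (atomicAdd of 0 reads the counter)
        while (atomicAdd(&g_arrived, 0u) < num_blocks) { }
    }
    __syncthreads();                       // release the rest of the block
}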
Mar, 5

Designing Efficient Many-Core Parallel Algorithms for All-Pairs Shortest-Paths Using CUDA

Finding the all-pairs shortest-paths on a large graph is a fundamental problem in many practical applications such as bioinformatics, internet node traffic and network routing. In this paper, we present the designs of two efficient parallel algorithms for many-core GPUs using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. By using […]
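The excerpt does not spell out the two designs; as a baseline illustration of GPU all-pairs shortest paths, the classic approach launches one Floyd-Warshall relaxation kernel per intermediate vertex. The unblocked sketch below is generic; the matrix layout and launch loop are assumptions, and the paper's algorithms go well beyond this.

// Naive Floyd-Warshall relaxation for APSP (illustrative baseline only).
// dist is an n*n row-major matrix of current shortest path costs.
__global__ void fw_relax(float* dist, int n, int k)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && j < n) {
        float via_k = dist[i * n + k] + dist[k * n + j];
        if (via_k < dist[i * n + j])
            dist[i * n + j] = via_k;       // relax path i -> k -> j
    }
}

// Host side: one kernel launch per intermediate vertex k.
// for (int k = 0; k < n; ++k)
//     fw_relax<<<grid, block>>>(d_dist, n, k);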
Mar, 4

A Stream Processor Cluster Architecture Model with the Hybrid Technology of MPI and CUDA

Nowadays, the compute capability of traditional cluster systems cannot keep up with the computing needs of practical applications, and aspects such as energy consumption and physical space have become a serious problem. As parallel computing hardware, however, the stream processor (SP) offers high floating-point performance. The NVIDIA GPU is a typical stream processor […]
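As a generic sketch of the MPI + CUDA hybrid model the excerpt refers to (not the paper's cluster architecture), MPI distributes work across nodes while each rank drives a local GPU; the rank-to-device mapping, chunk size, and kernel body are placeholders.

// Hybrid MPI + CUDA skeleton (illustrative; the kernel is a stand-in).
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void process_chunk(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;            // stand-in for the real stream kernel
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int devices = 0;
    cudaGetDeviceCount(&devices);
    cudaSetDevice(rank % devices);         // bind each MPI rank to a GPU

    const int n = 1 << 20;                 // per-rank chunk size (assumed)
    float* d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    // ... MPI_Scatter host data and cudaMemcpy it to d_data ...

    process_chunk<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    // ... cudaMemcpy results back and MPI_Gather them on rank 0 ...
    cudaFree(d_data);
    MPI_Finalize();
    return 0;
}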
Mar, 4

Formal Description and Optimization Based High-Performance Computing on CUDA

In recent years, with the development of GPUs, general-purpose computation on graphics processors has become a new field. Targeting GPU processing, this paper provides a formal description of the data-parallel mode, a detailed description of the CUDA programming model, and the principles of optimization. It shows by the […]
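For readers new to the data-parallel mode being formalized, the canonical CUDA illustration maps one thread to one output element; the vector addition below is the standard textbook example, not code from the paper.

// Standard CUDA vector addition: one thread computes one output element.
__global__ void vec_add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
// Launch example: vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);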
Mar, 4

Scene Recognition Acceleration Using CUDA and OpenMP

Scene recognition has become a remarkable field in the image processing area, and many methods have been proposed in recent years; among them, the idea of extracting the scene gist from global features has been shown to achieve higher retrieval accuracy than many other methods. However, the process of extracting the gist is heavily time-consuming and […]
Mar, 4

Towards a Software Transactional Memory for Graphics Processors

The introduction of general purpose computing on many-core graphics processor systems, and the general shift in the industry towards parallelism, has created a demand for ease of parallelization. Software transactional memory (STM) simplifies development of concurrent code by allowing the programmer to mark sections of code to be executed concurrently and atomically in an optimistic […]
Mar, 4

Some of the What?, Why?, How?, Who? and Where? of Graphics Processing Unit Computing for Bayesian Analysis

Over the last 20 years or so, a number of Bayesian researchers and groups have invested a good deal of time, effort and money in parallel computing for Bayesian analysis. The growth from “small research group” to “institutionally supported” cluster computational facilities has had a substantial impact on a number of areas of Bayesian analysis, […]
Mar, 4

Acceleration of Medical Image Registration using Graphics Processing Units in Computing Normalized Mutual Information

This paper presents a computational performance analysis of an accelerated medical image registration using Graphics Processing Units (GPUs). In our previous work, a multi-resolution approach using normalized mutual information (NMI) has proven to be useful in medical image registration. In this paper, we propose an acceleration of the NMI procedure using a GPU implementation because of […]
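The GPU-side bottleneck in NMI-based registration is typically the joint histogram from which the entropies are derived, with NMI(A,B) = (H(A) + H(B)) / H(A,B). The sketch below shows only that accumulation step; the bin count and quantization are assumptions, not the authors' implementation.

// Joint histogram accumulation for two intensity images (illustrative only).
#define BINS 64

__global__ void joint_histogram(const unsigned char* imgA,
                                const unsigned char* imgB,
                                unsigned int* hist,    // BINS*BINS joint bins
                                int num_voxels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_voxels) {
        int a = imgA[i] * BINS / 256;      // quantize intensities into bins
        int b = imgB[i] * BINS / 256;
        atomicAdd(&hist[a * BINS + b], 1u);
    }
}
// Marginal histograms for H(A) and H(B) follow by summing rows and columns,
// and the entropies are computed from the normalized bin probabilities.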
Mar, 4

Understanding GPU Programming for Statistical Computation: Studies in Massively Parallel Massive Mixtures

We describe advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via GPU (graphics processing unit) programming. The developments are partly motivated by computational challenges arising in increasingly prevalent biological studies using high-throughput flow cytometry methods, generating many, very large data sets and requiring increasingly high-dimensional mixture models with large numbers […]
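The massively parallel step in such mixture analyses is usually the per-observation evaluation of component densities; the sketch below assumes diagonal-covariance Gaussian components purely for illustration and is not the authors' code.

// One thread per observation: evaluate the log-density of each mixture
// component (diagonal-covariance Gaussians assumed for brevity).
__global__ void mixture_logdens(const float* x,      // n*d observations
                                const float* mu,     // k*d component means
                                const float* var,    // k*d component variances
                                const float* logw,   // k log mixture weights
                                float* out,          // n*k log-densities
                                int n, int d, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int j = 0; j < k; ++j) {
        float acc = logw[j];
        for (int t = 0; t < d; ++t) {
            float diff = x[i * d + t] - mu[j * d + t];
            acc += -0.5f * (logf(2.0f * 3.14159265f * var[j * d + t])
                            + diff * diff / var[j * d + t]);
        }
        out[i * k + j] = acc;              // log w_j + log N(x_i | mu_j, var_j)
    }
}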
Mar, 4

Architecture-Aware Optimization Targeting Multithreaded Stream Computing

Optimizing program execution targeted for Graphics Processing Units (GPUs) can be very challenging. Efficiently mapping serial code to a GPU or stream-processing platform is a time-consuming task and is greatly hampered by a lack of detail about the underlying hardware. Programmers are left to rely on trial and error to produce […]
Mar, 4

Redesigning combustion modeling algorithms for the Graphics Processing Unit (GPU): Chemical kinetic rate evaluation and ordinary differential equation integration

Detailed modeling of complex combustion kinetics remains challenging and often intractable, due to prohibitive computational costs incurred when solving the associated large kinetic mechanisms. The Graphics Processing Unit (GPU), originally designed for graphics rendering on computer and gaming systems, has recently emerged as a powerful, cost-effective supplement to the Central Processing Unit (CPU) for dramatically […]
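The key property such GPU formulations rely on is that the chemistry of each grid cell can be integrated independently; the sketch below maps one cell per thread with explicit Euler sub-steps. The rate function, mechanism size, and step control are placeholders; the paper's kinetic rate evaluation and integrators are far more elaborate.

// One thread integrates the chemical state of one cell with explicit Euler
// sub-steps (illustrative only; real kinetics need stiff, adaptive solvers).
#define NSPECIES 8   // assumed small mechanism size for illustration

__device__ void eval_rates(const float* y, float* dydt)
{
    // Placeholder for chemical kinetic rate evaluation; a real mechanism
    // would compute Arrhenius rates and species production terms here.
    for (int s = 0; s < NSPECIES; ++s)
        dydt[s] = -0.1f * y[s];
}

__global__ void integrate_cells(float* state,        // ncells * NSPECIES
                                int ncells, float dt, int substeps)
{
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell >= ncells) return;

    float y[NSPECIES], dydt[NSPECIES];
    for (int s = 0; s < NSPECIES; ++s)
        y[s] = state[cell * NSPECIES + s];

    float h = dt / substeps;
    for (int step = 0; step < substeps; ++step) {
        eval_rates(y, dydt);
        for (int s = 0; s < NSPECIES; ++s)
            y[s] += h * dydt[s];           // explicit Euler update
    }

    for (int s = 0; s < NSPECIES; ++s)
        state[cell * NSPECIES + s] = y[s];
}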
