
Posts

Jul, 20

Real Time Pixel Art Remasterization on GPUs

Over the years, several methods have been proposed to overcome the pixel art scaling problem. In this article we describe a novel approach, designed for massively parallel architectures, that can address this issue in real time. To achieve this we design a local, context-independent algorithm that enables an efficient parallel […]
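To illustrate what a local, context-independent pixel art scaler looks like, here is a minimal Python sketch of the well-known Scale2x (EPX) rule. This is a swapped-in technique, not the article's algorithm: each output 2x2 block depends only on one input pixel and its four direct neighbors, so every pixel could in principle be handled by an independent GPU thread. Images are assumed to be lists of rows of comparable pixel values, and the name `scale2x` is illustrative.

```python
def scale2x(img):
    """Return a 2x upscaled copy of img using the Scale2x (EPX) rule.

    img is a list of equal-length rows of comparable pixel values.
    """
    h, w = len(img), len(img[0])
    out = [[None] * (2 * w) for _ in range(2 * h)]

    def px(y, x):
        # Clamp coordinates so border pixels reuse their own edge values.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    for y in range(h):
        for x in range(w):
            e = img[y][x]
            up, left = px(y - 1, x), px(y, x - 1)
            right, down = px(y, x + 1), px(y + 1, x)
            if up != down and left != right:
                # Round a corner only where two adjacent neighbors agree.
                e0 = left if left == up else e
                e1 = right if up == right else e
                e2 = left if left == down else e
                e3 = right if down == right else e
            else:
                e0 = e1 = e2 = e3 = e
            out[2 * y][2 * x], out[2 * y][2 * x + 1] = e0, e1
            out[2 * y + 1][2 * x], out[2 * y + 1][2 * x + 1] = e2, e3
    return out
```

Because no output pixel depends on any other output pixel, the two loops map directly onto a 2D grid of independent work items.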
Jul, 20

An Efficient Deterministic Parallel Algorithm for Adaptive Multidimensional Numerical Integration on GPUs

Recent developments in Graphics Processing Units (GPUs) have opened new possibilities for highly efficient parallel computing in science and engineering. Their massively parallel architecture makes GPUs very effective for algorithms in which large blocks of data can be processed in parallel. Multidimensional integration has important applications in areas like computational physics, plasma physics, […]
Jul, 19

OpenCL API Extensions to achieve Multi-level Parallelism for Efficient Implementation of Strassen’s Matrix Multiplication on GPUs

Strassen’s matrix multiplication algorithm is an efficient and widely used practical algorithm for matrix multiplication. In its basic form, the algorithm consists of a series of recursive steps that decompose the matrices, the multiplication of the intermediate matrices, and another set of recursive steps that recompose the product matrix. Implementing the algorithm on a GPU requires it to be […]
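The three phases named in this abstract (decompose, multiply the intermediate matrices, recompose) can be sketched in plain Python. This is the textbook Strassen recursion, not the paper's OpenCL implementation; it assumes square matrices whose size is a power of two and recurses all the way down to 1x1 blocks, whereas practical codes switch to a conventional multiply below some cutoff size. The helper names are hypothetical.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Decompose M into its four quadrants (M11, M12, M21, M22)."""
    k = len(M) // 2
    return ([r[:k] for r in M[:k]], [r[k:] for r in M[:k]],
            [r[:k] for r in M[k:]], [r[k:] for r in M[k:]])


def strassen(A, B):
    """Multiply two n x n matrices, n a power of two, with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    # Phase 1: decompose both operands into quadrants.
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Phase 2: seven intermediate products instead of the naive eight.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Phase 3: recompose the product matrix from the intermediates.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(M1, M2), add(M3, M6))
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

The seven recursive calls at each level are independent of one another, which is the kind of task-level parallelism a GPU implementation has to expose alongside the data parallelism inside each matrix addition.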
Jul, 19

HAccRG: Hardware-Accelerated Data Race Detection in GPUs

Modern Graphics Processing Units (GPUs) are capable of supporting thousands of concurrent threads. However, they provide relatively few guarantees with respect to the coherence and consistency of the memory system. Thus, GPUs are prone to a multitude of concurrency bugs related to inconsistent memory states. Many such bugs manifest as some form of data races at […]
Jul, 19

Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures

Heterogeneous multicore architectures with a CPU and add-on GPUs or streaming processors are now widely used in computer systems. These GPUs provide substantially more computational capability and memory bandwidth than traditional multicores. Also, because they are highly programmable, they provide the computational performance needed for realistic graphics rendering. Applications with general computations can also be […]
Jul, 19

Parallel Image Segmentation Using Reduction-Sweeps On Multicore Processors and GPUs

In this paper we introduce the Reduction Sweep algorithm, a novel graph-based image segmentation algorithm that is designed for easy parallelization. It is based on a clustering approach focusing on local image characteristics. Each pixel is compared with its neighbors in an implicitly independent manner, and those deemed sufficiently similar according to a color criterion […]
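The abstract is truncated before the details of Reduction Sweep, so the following is only a generic sketch of the idea it describes: compare each pixel with its right and lower neighbors, and merge those that pass a color criterion into one segment using a union-find structure. The criterion used here (maximum per-channel difference no greater than `threshold`) and all names are assumptions for illustration, not the paper's method. Each neighbor comparison is independent of the others, which is what makes this style of segmentation easy to parallelize; the merging is done sequentially here for simplicity.

```python
def segment(img, threshold):
    """Label 4-connected regions of similar color.

    img is a list of rows of equal-length color tuples, e.g. (r, g, b).
    Two neighbors are merged when no channel differs by more than threshold.
    Returns a label grid; each label is the smallest flat index (y * width + x)
    of any pixel in that region.
    """
    h, w = len(img), len(img[0])
    parent = list(range(h * w))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Attach the larger root under the smaller one, so every root
            # stays the minimum index of its set.
            parent[max(ri, rj)] = min(ri, rj)

    def similar(p, q):
        return max(abs(a - b) for a, b in zip(p, q)) <= threshold

    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w and similar(img[y][x], img[y][x + 1]):
                union(i, i + 1)
            if y + 1 < h and similar(img[y][x], img[y + 1][x]):
                union(i, i + w)

    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

Because merging is transitive, a gradual gradient can end up in a single segment even if its endpoints differ by much more than `threshold`; criteria that compare against region statistics avoid this at the cost of locality.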
Jul, 19

On Benchmarking the Matrix Multiplication Algorithm using OpenMP, MPI and CUDA Programming Languages

Parallel programming languages represent a common theme in the evolution of high performance computing (HPC) systems. There are several parallel programming languages that are directly associated with different HPC systems. In this paper, we compare the performance of three commonly used parallel programming languages, namely: OpenMP, MPI and CUDA. Our performance evaluation of these languages […]
Jul, 17

A Software-Based Self Test of CUDA Fermi GPUs

Nowadays, Graphics Processing Units (GPUs) have become increasingly popular due to their high computational power and low prices. This makes them particularly suitable for high-performance computing applications, like data processing and financial computation. In these fields, highly efficient test methodologies are mandatory. One of the most effective ways to detect and localize hardware faults in […]
Jul, 17

Parallelization the Job-shop Problem on Distributed and Shared Memory Architectures

The paper presents a parallel algorithm for solving the scheduling problem. The algorithm is implemented on distributed-memory multicomputers, with each machine using a CPU–GPU shared-memory architecture, so that the work is completed as quickly as possible. The algorithm is based on a branching approach to search. The […]
Jul, 17

Starchart: Hardware and Software Optimization Using Recursive Partitioning Regression Trees

Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While some auto-tuning support has been proposed, it is often narrow […]
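Starchart builds on recursive partitioning regression trees. As a rough illustration of that model family (not Starchart's code), the sketch below grows a small regression tree by repeatedly choosing the feature and threshold that minimize the summed squared error of the two children, and stops when no split helps. Features would be tuning parameters such as a hypothetical thread-block size, targets a measured cost; the depth and sample limits are arbitrary assumptions.

```python
from statistics import fmean


def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split(X, y):
    """Return (cost, feature, threshold) of the lowest-SSE binary split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    return best


def build(X, y, depth=0, max_depth=3, min_samples=2):
    """Recursively partition (X, y); leaves store the mean target value."""
    split = None
    if depth < max_depth and len(y) >= min_samples:
        split = best_split(X, y)
    if split is None or split[0] >= sse(y):
        return {"leaf": fmean(y)}  # no split reduces the error
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    Xl = [row for row, g in zip(X, go_left) if g]
    yl = [v for v, g in zip(y, go_left) if g]
    Xr = [row for row, g in zip(X, go_left) if not g]
    yr = [v for v, g in zip(y, go_left) if not g]
    return {
        "feature": f,
        "threshold": t,
        "left": build(Xl, yl, depth + 1, max_depth, min_samples),
        "right": build(Xr, yr, depth + 1, max_depth, min_samples),
    }


def predict(tree, x):
    """Follow threshold tests from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]
```

One reason such trees suit design-space exploration is that they are easy to read: each internal node names the parameter and cut-off that most reduced the prediction error at that point.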
Jul, 17

Early Experiences With The OpenMP Accelerator Model

A recent trend in mainstream computer nodes is the combined use of general-purpose multicore processors and specialized accelerators such as GPUs and DSPs in order to achieve better performance and to reduce power consumption. To support this trend, the OpenMP Language Committee has approved a set of extensions to OpenMP (referred to as the OpenMP […]
Jul, 17

Parallel heterogeneous Branch and Bound algorithms for multi-core and multi-GPU environments

Branch and Bound (B&B) algorithms are attractive for solving combinatorial optimization problems (COPs) to optimality by exploring a tree-based search space. Nevertheless, they are highly time-intensive when dealing with large problem instances (e.g. Taillard’s FSP benchmarks), even when using grid computing [Mezmaz et al., IEEE IPDPS’2007]. Massively parallel computing supplied through today’s heterogeneous (GPU-enhanced multicore) platforms […]
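For a concrete picture of B&B exploring a tree of partial solutions, here is a small sequential Python sketch for the permutation flow-shop problem, the problem family of Taillard's FSP benchmarks. It is not the authors' multi-core/multi-GPU algorithm; the lower bound (remaining machine load plus the smallest remaining tail) is a simple textbook bound chosen for brevity, and `p[j][m]` is assumed to hold the processing time of job `j` on machine `m`.

```python
def flowshop_bnb(p):
    """Exact permutation flow shop by depth-first branch and bound.

    p[j][m] is the processing time of job j on machine m.
    Returns (best makespan, job order).
    """
    n, machines = len(p), len(p[0])
    best_makespan = float("inf")
    best_order = None

    def advance(front, job):
        # front[m] = time machine m becomes free; append job and recompute.
        new = []
        ready = 0
        for m in range(machines):
            ready = max(ready, front[m]) + p[job][m]
            new.append(ready)
        return new

    def lower_bound(front, remaining):
        # Machine m must still process every remaining job, and the last of
        # them must then pass machines m+1.. (at least the smallest tail).
        lb = front[-1]
        for m in range(machines):
            load = sum(p[j][m] for j in remaining)
            tail = min(sum(p[j][m + 1:]) for j in remaining)
            lb = max(lb, front[m] + load + tail)
        return lb

    def dfs(front, order, remaining):
        nonlocal best_makespan, best_order
        if not remaining:
            if front[-1] < best_makespan:
                best_makespan, best_order = front[-1], list(order)
            return
        if lower_bound(front, remaining) >= best_makespan:
            return  # prune: this subtree cannot beat the incumbent
        for j in sorted(remaining):  # branch: choose the next job
            order.append(j)
            dfs(advance(front, j), order, remaining - {j})
            order.pop()

    dfs([0] * machines, [], frozenset(range(n)))
    return best_makespan, best_order
```

In a parallel setting, the subtrees rooted at different first jobs can be explored independently, provided the incumbent makespan is shared so that every worker prunes against the best value found so far.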


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
