11865

Posts

Apr, 7

GPU-Accelerated Face Detection Algorithm

This work is an overview of a preliminary experience in developing high-performance face detection accelerated by GPU co-processors. The objective is to illustrate the advantages and difficulties encountered while utilizing the GPU technology to perform face detection. Moreover the introduced implementation is a much faster than currently existing techniques. Previous techniques for speeding up face […]
Apr, 7

A Study on Efficient Application Mapping on Parallel Computing Accelerators

Since the invention of electronic computers, their performance has been constantly advanced. The recent progress of micro processors in performance has been mainly achieved by increasing the number of cores on a device, instead of increasing working frequency. In addition, because of increasing of density of semiconductors, not only computational performance but also density of […]
Apr, 7

Parallel processing for SAR image generation in CUDA – GPGPU platform

High resolution imagery from synthetic aperture radar (SAR) video data requires numerical computations of the order of gigaflops (GFLOP). The computational burden increases with the image size and the amount of input raw video signals. General purpose graphic processor units (GPGPU) can play a pivotal role in parallel processing the raw video data to generate […]
Apr, 7

Acceleration of a Full-scale Industrial CFD Application with OP2

Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls Royce plc. It consists of over 300 parallel loops with a code base exceeding 50K lines and is capable of performing complex simulations over highly detailed unstructured mesh geometries. Unlike simpler structured-mesh applications, which feature high speed-ups when accelerated by […]
Apr, 7

Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-sigma formats on NVIDIA GPUs

Numerical methods in sparse linear algebra typically rely on a fast and efficient matrix vector product, as this usually is the backbone of iterative algorithms for solving eigenvalue problems or linear systems. Against the background of a large diversity in the characteristics of high performance computer architectures, it is a challenge to derive a cross-platform […]
Apr, 6

Optimizing Krylov Subspace Solvers on Graphics Processing Units

Krylov subspace solvers are often the method of choice when solving sparse linear systems iteratively. At the same time, hardware accelerators such as graphics processing units (GPUs) continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a […]
Apr, 6

A Low-Power Hybrid CPU-GPU Sort

This thesis analyses the energy efficiency of a low-power CPU-GPU hybrid architecture. We evaluate the NVIDIA Ion architecture, which couples an Intel Atom low power processor with an integrated GPU that has an order of magnitude fewer processors compared to traditional discrete GPUs. We attempt to create a system that balances computation and I/O capabilities […]
Apr, 6

High Performance Computing for Large Graphs of Internet Applications using GPU

The high speed CPU based routers currently in use could not handle the massive data required for real-time multimedia communication. Graphics processing units (GPUs) offer an appreciable alternative due to high computation power which results from their parallel execution units. This paper presents the implementation of the Dijkstra’s link state IP routing algorithm using GPU. […]
Apr, 6

GPU Accelerated Fractal Image Compression for Medical Imaging in Parallel Computing Platform

In this paper, we implemented both sequential and parallel version of fractal image compression algorithms using CUDA (Compute Unified Device Architecture) programming model for parallelizing the program in Graphics Processing Unit for medical images, as they are highly similar within the image itself. There are several improvement in the implementation of the algorithm as well. […]
Apr, 6

Parallel Support Vector Machines in Practice

In this paper, we evaluate the performance of various parallel optimization methods for Kernel Support Vector Machines on multicore CPUs and GPUs. In particular, we provide the first comparison of algorithms with explicit and implicit parallelization. Most existing parallel implementations for multi-core or GPU architectures are based on explicit parallelization of Sequential Minimal Optimization (SMO) […]
Apr, 4

CUDA programs for GPU computing of Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models

We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported […]
Apr, 4

An efficient GPU acceptance-rejection algorithm for the selection of the next reaction to occur for Stochastic Simulation Algorithms

MOTIVATION: The Stochastic Simulation Algorithm (SSA) has largely diffused in the field of systems biology. This approach needs many realizations for establishing statistical results on the system under study. It is very computationnally demanding, and with the advent of large models this burden is increasing. Hence parallel implementation of SSA are needed to address these […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: