Posts

Jul, 22

AES Encryption Implementation on CUDA GPU and Its Analysis

GPUs offer good performance for their low price and are well suited to applications with a high level of parallelism. Support for integer and logical instructions on the latest generation of GPUs makes it easier to implement cipher algorithms with those same instructions. However, decisions such as parallel processing granularity or memory […]
Jul, 22

Large Scale Simulations of the Euler Equations on GPU Clusters

The paper investigates the scalability of a parallel Euler solver, using the Vijayasundaram method, on a GPU cluster with 32 Nvidia Geforce GTX 295 boards. The aim of this research is to enable large scale fluid dynamics simulations with up to one billion elements. We investigate communication protocols for the GPU cluster to compensate for […]
Jul, 22

Mapping the Arnold web with a GPU-supercomputer

The Arnold diffusion constitutes a dynamical phenomenon which may occur in the phase space of a non-integrable Hamiltonian system whenever the number of the system degrees of freedom is $M \geq 3$. The diffusion is mediated by a web-like structure of resonance channels, which penetrates the phase space and allows the system to explore the […]
Jul, 22

Accelerating Radio Astronomy Cross-Correlation with Graphics Processing Units

We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from "Large-N" arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on Nvidia’s Fermi architecture, sustaining […]
Jul, 22

A Comparison of xPU Platforms Exemplified with Ray Tracing Algorithms

Over the years, faster hardware with higher clock rates has been the usual way to improve computing times in computer graphics. Aside from highly costly parallel solutions affordable only by big industries, such as the movie industry, there was no alternative available to desktop users. Nevertheless, this scenario is dramatically changing with […]
Jul, 22

A History-Based Performance Prediction Model with Profile Data Classification for Automatic Task Allocation in Heterogeneous Computing Systems

In this paper, we propose a runtime performance prediction model for automatic selection of accelerators to execute kernels in OpenCL. The proposed method is a history-based approach that uses profile data for performance prediction. The profile data are classified into some groups, from each of which its own performance model is derived. As the execution […]
Jul, 22

Hybrid OpenCL: Enhancing OpenCL for Distributed Processing

We have been developing Hybrid OpenCL, which enables the utilization of OpenCL devices by connecting them over the network. Hybrid OpenCL opens a gate to scale up OpenCL environments. By using Hybrid OpenCL, applications written in OpenCL can be easily ported to high performance cluster computers; thus, Hybrid OpenCL can provide more various distributed and […]
Jul, 22

A self-organization based optical flow estimator with GPU implementation (thesis)

This work describes a parallelizable optical flow field estimator based upon a modified batch version of the Self-Organizing Map (SOM). This estimator handles the ill-posedness in gradient-based motion estimation via a novel combination of regression and self-organization. The aperture problem is treated using an algebraic framework that partitions motion estimates obtained from regression into two […]
Jul, 22

A self-organization based optical flow estimator with GPU implementation

This work describes a parallelizable optical flow field estimator based upon a modified batch version of the Self-Organizing Map (SOM). This estimator handles the ill-posedness in gradient-based motion estimation via a novel combination of regression and self-organization. The aperture problem is treated using an algebraic framework that partitions motion estimates obtained from regression into two […]
Jul, 22

Parallelizing the Cellular Potts Model on graphics processing units

The Cellular Potts Model (CPM) is a lattice-based modeling technique used for simulating cellular structures in computational biology. The computational complexity of the model means that current serial implementations restrict the size of simulation to a level well below biological relevance. Parallelization on computing clusters enables scaling the size of the simulation but marginally […]
Jul, 22

A refactoring tool to extract GPU kernels

Significant performance gains can be achieved by using hardware architectures that integrate GPUs with conventional CPUs to form a hybrid and highly parallel computational engine. However, programming these novel architectures is tedious and error-prone, reducing their ease of acceptance in an even wider range of computationally intensive applications. In this paper we discuss a […]
Jul, 22

Molecular Dynamics Simulations Using Graphics Processing Units

It is increasingly easy to develop software that exploits Graphics Processing Units (GPUs). The molecular dynamics simulation community has embraced this recent opportunity. Herein, we outline the current approaches that exploit this technology. In the context of biomolecular simulations, we discuss some of the algorithms that have been implemented and some of the aspects that […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
