Posts

Oct, 24

A Hybrid Software Framework for the GPU Acceleration of Multi-Threaded Monte Carlo Applications

Monte Carlo simulations are extensively used in a wide range of application areas. Although their basic framework is simple, they can be extremely computationally intensive. In this paper we present a software framework that partitions a generic Monte Carlo simulation into two asynchronous parts: (a) a threaded, GPU-accelerated pseudo-random number generator (or producer), and (b) a […]
Oct, 24

APTCC: Auto Parallelizing Translator From C To CUDA

This paper proposes APTCC (Auto Parallelizing Translator from C to CUDA), a translator from C code to CUDA C without any directives. CUDA C is a programming language for general-purpose GPU (GPGPU) computing. CUDA C requires a special programming style that differs from that of C. Although there are several pieces of research to reduce this difficulty, […]
Oct, 24

Using OpenCL for Implementing Simple Parallel Graph Algorithms

For the typical graph algorithms encountered most frequently in practice (such as those introduced in typical entry-level algorithms courses: graph searching/traversals, shortest-path problems, strongly connected components and minimum spanning trees) we want to consider practical non-sequential platforms, such as the emerging, cost-effective General-Purpose computation on Graphics Processing Units (GPGPU). In this paper […]
Oct, 24

Democratizing General Purpose GPU Programming through OpenCL and Scala

General Purpose GPU programming has the potential to increase the speed with which many computations can be done. We show a number of examples of such improvements, investigate how one can benchmark different implementations of GPGPU programming paradigms and how one can measure the productivity of programmers. Finally we implement and describe a simple toolkit […]
Oct, 24

The MOSIX Virtual OpenCL (VCL) Cluster Platform

Heterogeneous computing systems can dramatically increase the performance of parallel applications on clusters. Currently, applications that utilize GPU and APU devices run their device-specific code only on devices of the same computer where the application runs. This paper presents the MOSIX Virtual OpenCL (VCL) cluster platform that can run unmodified OpenCL applications transparently on clusters […]
Oct, 24

Optimizing a Near-duplicate Document Detection System with SIMD Technologies

Although considerable effort has been devoted to duplicate document detection (DDD) and its applications, there has been very limited study of the optimization of its time-consuming functions. An experimental analysis conducted on a million grant proposal documents from nsfc.gov.cn shows that even by using the clustering and the sampling methods, the speed of […]
Oct, 23

A Parallel Ray Tracing Architecture Suitable for Application-Specific Hardware and GPGPU Implementations

The Ray Tracing rendering algorithm can produce high-fidelity images of 3-D scenes, including shadow effects, as well as reflections and transparencies. This is currently done at a processing speed of at most 30 frames per second. Therefore, actual implementations of the algorithm are not yet suitable for interactive real-time rendering, which is required in games […]
Oct, 23

Design and Implementation of Centrally-Coordinated Peer-to-Peer Live-streaming

In this thesis, we explore the use of a centrally-coordinated peer-to-peer overlay as a possible solution to the live streaming problem. Our contribution lies in showing that such an approach is indeed feasible given that a number of key challenges are met. The motivation behind exploring an alternative design is that, although a number of approaches […]
Oct, 23

Analyzing Soft-Error Vulnerability on GPGPU Microarchitecture

General-purpose computation on graphics processing units (GPGPU) is becoming increasingly popular due to its high computational throughput for data-parallel applications. Modern GPU architectures have limited capability for error detection and tolerance since they were originally designed for graphics processing. However, rigorous execution correctness is required for general-purpose applications. This makes reliability a growing […]
Oct, 23

Power and Performance Studies of the Explicit Multi-Threading (XMT) Architecture

Power and thermal constraints have gained critical importance in the design of microprocessors over the past decade. Chipmakers failed to keep power at bay while sustaining the performance growth of serial computers at the rate expected by consumers. As an alternative, they turned to fitting an increasing number of simpler cores on a single die. While […]
Oct, 23

Computer Finite-Difference Time-Domain Simulation of Electromagnetic Wave Propagation using GPUs

The Finite-Difference Time-Domain (FDTD) solution of Maxwell’s equations, a grid-based differential time-domain numerical modeling method, is an approach for the direct modelling of the penetration of structures by continuous plane waves. Although FDTD techniques are considered to be relatively easy to understand and to implement in software, such modelling methods require a high level of […]
Oct, 23

Massively Parallel Algorithms for CFD Simulation and Optimization on Heterogeneous Many-Core Architectures

In this dissertation we provide new numerical algorithms for use in conjunction with simulation-based design codes. These algorithms are designed for, and best suited to run on, emerging heterogeneous computing architectures which contain a combination of traditional multi-core processors and new programmable many-core graphics processing units (GPUs). We have developed the following numerical algorithms (i) […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
