high performance computing on graphics processing units: hgpu.org

Posts

Oct, 29

GPU based particle system

GPGPU (general-purpose computing on graphics processing units) is common in modern computer games for heavy simulation workloads such as game physics and particle systems. GPU programming is used not only in games but also in scientific research, for example in heavy calculations on molecular structures and protein folding. The reason why you […]

OpenCL

Oct, 29

Using OpenCL for image analysis

This thesis investigates the suitability of OpenCL for accelerating image analysis operations from a developer’s perspective. To this end, four representative problems are evaluated: morphological operations, convolution, watershedding, and Markov random field-based texture segmentation. The selected problems pose different implementation issues in terms of locality of the operations and load versus computation. The thesis […]

OpenCL

Oct, 29

Using GPUs to Accelerate Installed Antenna Performance Simulations

Savant is an asymptotic ray-tracing CEM tool used to predict the performance of antennas installed on electrically large platforms, including far-field antenna patterns, near-field distributions, and antenna-to-antenna coupling. Savant is based on the shooting and bouncing rays (SBR) formulation. While asymptotic solvers like Savant have significantly smaller computational and memory requirements for electrically large problems […]

CUDA

Oct, 29

An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application

Currently, there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs), for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver […]

OpenCL

Oct, 28

An Adaptive Framework for Managing Heterogeneous Many-Core Clusters

The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as […]

CUDA

Oct, 28

Compiling Stream Applications for Heterogeneous Architectures

Heterogeneous processing systems have become the industry standard in almost every segment of the computing market from servers to mobile systems. In addition to employing shared/distributed memory processors, the current trend is to use hardware components such as field programmable gate arrays (FPGAs), single instruction multiple data (SIMD) engines and graphics processing units (GPUs) in […]

CUDA

Oct, 28

Architectural Exploration and Scheduling Methods for Coarse Grained Reconfigurable Arrays

Coarse Grained Reconfigurable Arrays (CGRAs) have emerged in recent years as promising candidates for realizing efficient reconfigurable platforms. CGRAs feature high computational density, a flexible routing interconnect, and rapid reconfiguration, characteristics that make them well suited to speeding up the execution of computational kernels. A number of designs embodying the CGRA concept have been proposed in the literature, most of […]

Oct, 28

Architecture-based Performance Evaluation of Genetic Algorithms on Multi/Many-core Systems

A Genetic Algorithm (GA) is a heuristic to find exact or approximate solutions to optimization and search problems within an acceptable time. We discuss GAs from an architectural perspective, offering a general analysis of GAs on multi-core CPUs and on GPUs, with solution quality considered. We describe widely-used parallel GA schemes based on Master-Slave, Island […]

CUDA

Oct, 28

Matrix inversion speed up with CUDA

In this project, several mathematical algorithms are developed to obtain a matrix inversion method that combines CUDA’s parallel architecture with MATLAB and is faster than MATLAB’s built-in matrix inverse function. This matrix inversion method is intended to be used for image reconstruction as a faster alternative to iterative methods with a comparable […]

CUDA

Oct, 28

Parallel Random Numbers: As Easy as 1, 2, 3

Most pseudorandom number generators (PRNGs) scale poorly to massively parallel high-performance computation because they are designed as sequentially dependent state transformations. We demonstrate that independent, keyed transformations of counters produce a large alternative class of PRNGs with excellent statistical properties (long period, no discernable structure or correlation). These counter-based PRNGs are ideally suited to modern […]

CUDA • OpenCL
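
The counter-based idea in the excerpt above can be illustrated with a minimal Python sketch. It is not one of the generators from the post: the mixing function is a SplitMix64-style finalizer swapped in for illustration, and the names `mix64`, `counter_rng`, and `uniform` are hypothetical. No claim is made about its statistical quality. The point is only that each output depends on a (key, counter) pair rather than on the previous output, so any thread can compute any element of any stream independently.

```python
MASK64 = (1 << 64) - 1  # keep all arithmetic in 64 bits


def mix64(x: int) -> int:
    """SplitMix64-style finalizer: a bijective 64-bit mixing step (illustrative)."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


def counter_rng(key: int, counter: int) -> int:
    """Return the 64-bit output at position `counter` of the stream chosen by `key`.

    There is no hidden state: the result is a pure function of (key, counter).
    """
    x = mix64((counter & MASK64) ^ mix64(key))
    return mix64((x + key) & MASK64)


def uniform(key: int, counter: int) -> float:
    """Map the top 53 bits of the output to a float in [0, 1)."""
    return (counter_rng(key, counter) >> 11) * (1.0 / (1 << 53))


if __name__ == "__main__":
    # Each simulated "thread" uses its own key and indexes the stream directly,
    # with no need to step through earlier values first.
    for thread_id in range(3):
        print(thread_id, [round(uniform(thread_id, i), 4) for i in range(4)])
```

Because the function is stateless, the value at counter 1000 can be computed without producing the 999 values before it, which is the property that makes this style of generator attractive on GPUs.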

Oct, 28

Programming Massively Parallel Processors with CUDA (audio course)

Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number […]

CUDA • OpenCL

Oct, 28

Implementing Domain-Specific Languages for Heterogeneous Parallel Computing

Domain-specific languages offer a solution to the performance and the productivity issues in heterogeneous computing systems. The Delite compiler framework simplifies the process of building embedded parallel DSLs. DSL developers can implement domain-specific operations by extending the DSL framework, which provides static optimizations and code generation for heterogeneous hardware. The Delite runtime automatically schedules and […]

CUDA