high performance computing on graphics processing units: hgpu.org

Posts

Oct, 29

Using OpenCL for image analysis

This thesis investigates the suitability of OpenCL for accelerating image analysis operations from a developer's perspective. To this end, four representative problems are evaluated: morphological operations, convolution, watershedding, and Markov random field-based texture segmentation. The selected problems pose different implementation issues in terms of locality of the operations and load versus computation. The thesis […]

OpenCL

Oct, 29

Using GPUs to Accelerate Installed Antenna Performance Simulations

Savant is an asymptotic ray-tracing CEM tool used to predict the performance of antennas installed on electrically large platforms, including far-field antenna patterns, near-field distributions, and antenna-to-antenna coupling. Savant is based on the shooting and bouncing rays (SBR) formulation. While asymptotic solvers like Savant have significantly smaller computational and memory requirements for electrically large problems […]

CUDA

Oct, 29

An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application

Currently there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs), for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver […]

OpenCL

Oct, 28

An Adaptive Framework for Managing Heterogeneous Many-Core Clusters

The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as […]

CUDA

Oct, 28

Compiling Stream Applications for Heterogeneous Architectures

Heterogeneous processing systems have become the industry standard in almost every segment of the computing market from servers to mobile systems. In addition to employing shared/distributed memory processors, the current trend is to use hardware components such as field programmable gate arrays (FPGAs), single instruction multiple data (SIMD) engines and graphics processing units (GPUs) in […]

CUDA

Oct, 28

Architectural Exploration and Scheduling Methods for Coarse Grained Reconfigurable Arrays

Coarse Grained Reconfigurable Arrays (CGRAs) have emerged in recent years as promising candidates to realize efficient reconfigurable platforms. CGRAs feature high computational density, flexible routing interconnect and rapid reconfiguration, characteristics that make them well-suited to speed up execution of computational kernels. A number of designs embodying the CGRA concept have been proposed in the literature, most of […]

Oct, 28

Architecture-based Performance Evaluation of Genetic Algorithms on Multi/Many-core Systems

A Genetic Algorithm (GA) is a heuristic to find exact or approximate solutions to optimization and search problems within an acceptable time. We discuss GAs from an architectural perspective, offering a general analysis of GAs on multi-core CPUs and on GPUs, with solution quality considered. We describe widely-used parallel GA schemes based on Master-Slave, Island […]

CUDA

Oct, 28

Matrix inversion speed up with CUDA

In this project several mathematical algorithms are developed to obtain a matrix inversion method that combines CUDA's parallel architecture with MATLAB and is actually faster than MATLAB's built-in inverse matrix function. This matrix inversion method is intended to be used for image reconstruction as a faster alternative to iterative methods with a comparable […]

CUDA

Oct, 28

Parallel Random Numbers: As Easy as 1, 2, 3

Most pseudorandom number generators (PRNGs) scale poorly to massively parallel high-performance computation because they are designed as sequentially dependent state transformations. We demonstrate that independent, keyed transformations of counters produce a large alternative class of PRNGs with excellent statistical properties (long period, no discernable structure or correlation). These counter-based PRNGs are ideally suited to modern […]

CUDA • OpenCL
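The counter-based idea in the "Parallel Random Numbers" abstract above can be illustrated with a minimal sketch. It is not one of the paper's generators: the mixing function here is a SplitMix64-style bit mixer chosen only as a simple stand-in, and the names `mix64`, `counter_random`, and `stream` are hypothetical. The point it shows is that each output depends only on a (key, counter) pair, so any worker can compute any element without shared sequential state.

```python
MASK64 = (1 << 64) - 1  # keep all arithmetic in 64 bits


def mix64(z: int) -> int:
    """SplitMix64-style finalizer, used here as a stand-in keyed mixer."""
    z &= MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def counter_random(key: int, counter: int) -> float:
    """Return a float in [0, 1) that depends only on (key, counter)."""
    # Spread consecutive counters apart, then combine with the mixed key.
    z = mix64(key) ^ ((counter * 0x9E3779B97F4A7C15) & MASK64)
    bits = mix64(z)
    # Use the top 53 bits so the result is an exactly representable double.
    return (bits >> 11) / float(1 << 53)


def stream(key: int, start: int, n: int) -> list:
    """Generate n values starting at counter `start`; no state is carried."""
    return [counter_random(key, start + i) for i in range(n)]


if __name__ == "__main__":
    # Two "workers" each produce half of the sequence independently.
    whole = stream(key=7, start=0, n=8)
    halves = stream(key=7, start=0, n=4) + stream(key=7, start=4, n=4)
    print(whole == halves)
```

Because there is no state to advance, splitting the counter range across threads or devices yields the same values as generating them in one pass. The statistical quality of this stand-in mixer has not been evaluated here and should not be assumed to match the generators the paper describes.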

Oct, 28

Programming Massively Parallel Processors with CUDA (audio course)

Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number […]

CUDA • OpenCL

Oct, 28

Implementing Domain-Specific Languages for Heterogeneous Parallel Computing

Domain-specific languages offer a solution to the performance and the productivity issues in heterogeneous computing systems. The Delite compiler framework simplifies the process of building embedded parallel DSLs. DSL developers can implement domain-specific operations by extending the DSL framework, which provides static optimizations and code generation for heterogeneous hardware. The Delite runtime automatically schedules and […]

CUDA

Oct, 28

Dax: Data Analysis at Extreme

Experts agree that the exascale machine will comprise processors that contain many cores, which in turn will necessitate a much higher degree of concurrency. Software will require a minimum of 1,000 times more concurrency. Most parallel analysis and visualization algorithms today work by partitioning data and running mostly serial algorithms concurrently on each data […]

