high performance computing on graphics processing units: hgpu.org

Posts

Oct, 30

Exploring Many-Core Design Templates for FPGAs and ASICs

We present a highly productive approach to hardware design based on a many-core microarchitectural template used to implement compute-bound applications expressed in a high-level data-parallel language such as OpenCL. The template is customized on a per-application basis via a range of high-level parameters such as the interconnect topology or processing element architecture. The key benefits of […]

OpenCL

Oct, 30

Improving Energy Efficiency of GPU based General-Purpose Scientific Computing through Automated Selection of Near Optimal Configurations

Modern GPUs have been rapidly and increasingly used as a powerful engine for a variety of general-purpose computing applications due to their enormous parallelism and throughput capabilities. However, GPU power consumption still remains high since more and more transistors are integrated into its chip. Until now, how to increase and optimize energy efficiency (e.g., performance-per-Watt […]

CUDA

Oct, 30

Leveraging Binary Translation for Heterogeneous Profiling

Heterogeneous systems, such as those including a graphics processor for general computation, are becoming increasingly common. While this increases the potential computing power that can be leveraged, it also increases the complexity of the system. This in turn increases the complexity of understanding behavior of the system, which is important when developing new software as […]

OpenCL

Oct, 30

GPU Computations in Heterogeneous Grid Environments

This thesis describes how the performance of job management systems on heterogeneous computing grids can be increased with Graphics Processing Units (GPU). The focus lies on describing what is required to extend the grid to support the Open Computing Language (OpenCL) and how an OpenCL application can be implemented for the heterogeneous grid. Additionally, already […]

OpenCL

Oct, 30

Dynamic Scheduling of Parallel Code for Heterogeneous Systems

A typical consumer desktop computer has a multi-core CPU with at least two and possibly up to eight processing elements over four processors, and a multi-core GPU with up to 512 processing elements. Both the CPU and the GPU are capable of running parallel code, and this project demonstrates a method for dynamically deciding whether […]

OpenCL

Oct, 30

Development and evaluation of a GPU-optimized N-body term for the simulation of biomolecules

Advancements in massively parallel sampling of the conformational space of biomolecules enable, for example, protein structure prediction, in-silico drug development and cell signaling. Despite the existence of highly distributed protein simulation architectures like POEM@HOME, there was no abundant computational resource strong both in serial strength and in parallel sampling. In this study we investigate the […]

OpenCL

Oct, 30

CPU and GPU Co-processing for Sound

When using voice communications, one of the problematic phenomena that can occur is participants hearing an echo of their own voice. Acoustic echo cancellation (AEC) is used to remove this echo, but can be computationally demanding. The recent OpenCL standard allows high-level programs to be run on both multi-core CPUs, as well as Graphics Processing Units […]

OpenCL

Oct, 29

Jit4OpenCL: a compiler from Python to OpenCL

Heterogeneous computing platforms that use GPUs and CPUs in tandem for computation have become an important choice to build low-cost high-performance computing platforms. The computing ability of modern GPUs surpasses what CPUs can offer for certain classes of applications. GPUs can deliver several Tera-Flops in peak performance. However, programmers must adopt a more complicated […]

OpenCL

Oct, 29

GPU based particle system

GPGPU (general-purpose computing on graphics processing units) is quite common in today’s computer games when doing heavy simulation calculations like game physics or particle systems. GPU programming is not only used in games but also in scientific research when doing heavy calculations on molecular structures, protein folding, etc. The reason why you […]

OpenCL

Oct, 29

Using OpenCL for image analysis

This thesis investigates the suitability of OpenCL for acceleration of image analysis operations from a developer's perspective. To achieve this, four representative problems are evaluated: morphological operations, convolution, watershedding and Markov random field-based texture segmentation. The selected problems offer different implementation issues in terms of locality of the operations and load versus computation. The thesis […]

OpenCL

Oct, 29

Using GPUs to Accelerate Installed Antenna Performance Simulations

Savant is an asymptotic ray-tracing CEM tool used to predict the performance of antennas installed on electrically large platforms, including far-field antenna patterns, near-field distributions, and antenna-to-antenna coupling. Savant is based on the shooting and bouncing rays (SBR) formulation. While asymptotic solvers like Savant have significantly smaller computational and memory requirements for electrically large problems […]

CUDA

Oct, 29

An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application

Currently there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs) for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver […]

OpenCL

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
