
Posts

Sep 21

Multicore performance optimization using partner cores

As the push for parallelism continues to increase the number of cores on a chip, system design has become incredibly complex; optimizing for performance and power efficiency is now nearly impossible for the application programmer. To assist the programmer, a variety of techniques for optimizing performance and power at runtime have been developed, but many […]
Sep 21

Mint: realizing CUDA performance in 3D stencil methods with annotated C

We present Mint, a programming model that enables the non-expert to enjoy the performance benefits of hand-coded CUDA without becoming entangled in the details. Mint targets stencil methods, which are an important class of scientific applications. We have implemented the Mint programming model with a source-to-source translator that generates optimized CUDA C from traditional […]
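Mint's generated code is not reproduced in this listing, but the class of computation it targets is easy to picture. The following is a minimal, hand-written CUDA sketch of a 7-point 3D stencil kernel, the kind of loop nest a Mint annotation covers; the array names, halo convention, and coefficients are illustrative assumptions, not Mint output.

// Hand-written 7-point 3D stencil kernel (illustrative sketch only, not
// Mint-generated code). Assumes a row-major nx*ny*nz grid with a one-cell
// halo; `in`, `out`, `c0`, and `c1` are hypothetical names.
__global__ void stencil7(const float* in, float* out,
                         int nx, int ny, int nz, float c0, float c1)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // +1 skips the halo
    int j = blockIdx.y * blockDim.y + threadIdx.y + 1;
    int k = blockIdx.z * blockDim.z + threadIdx.z + 1;
    if (i >= nx - 1 || j >= ny - 1 || k >= nz - 1) return;

    int idx = (k * ny + j) * nx + i;                    // linearized 3D index
    out[idx] = c0 * in[idx]
             + c1 * (in[idx - 1]       + in[idx + 1]         // x neighbors
                   + in[idx - nx]      + in[idx + nx]        // y neighbors
                   + in[idx - nx * ny] + in[idx + nx * ny]); // z neighbors
}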
Sep 20

Scalable heterogeneous parallelism for atmospheric modeling and simulation

Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Effective use of parallelism in these new chipsets constitutes the challenge facing a new generation of large-scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the […]
Sep 20

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors

MATLAB is an array language, initially popular for rapid prototyping but now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that significantly affect the program’s execution time. Today’s computer systems have tremendous […]
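The paper's compiler output is not shown in this listing; as a rough illustration of the split it exploits, consider a MATLAB array expression such as C = A .* B + s sitting inside a scalar convergence loop. A hypothetical hand-written CUDA equivalent maps the data-parallel array operation onto the GPU while the scalar control flow stays on the CPU; all names below are assumptions for illustration, not generated code.

// Element-wise kernel for the array expression C = A .* B + s.
__global__ void elemwise(const float* A, const float* B, float* C,
                         float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] * B[i] + s;            // data-parallel region on the GPU
}

// Host-side scalar region: control flow decides how often to launch.
void run(const float* dA, const float* dB, float* dC, int n, int iters)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iters; ++it)     // scalar control flow on the CPU
        elemwise<<<blocks, threads>>>(dA, dB, dC, 2.0f, n);
    cudaDeviceSynchronize();
}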
Sep 20

Automatic abstraction and fault tolerance in cortical microarchitectures

Recent advances in the neuroscientific understanding of the brain are bringing about a tantalizing opportunity for building synthetic machines that perform computation in ways that differ radically from traditional von Neumann machines. These brain-like architectures, which are premised on our understanding of how the human neocortex computes, are highly fault-tolerant, averaging results over large numbers […]
Sep 20

SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading

Large register files are common in highly multi-threaded architectures such as GPUs. This paper presents a hybrid memory design that tightly integrates embedded DRAM into SRAM cells, with the primary application of reducing the area and power consumption of multi-threaded register files. In the hybrid memory, each SRAM cell is augmented with multiple DRAM cells so […]
Sep 20

Brief announcement: better speedups for parallel max-flow

We present a parallel solution to the Maximum-Flow (Max-Flow) problem, suitable for a modern many-core architecture. We show that by starting from a PRAM algorithm, following an established "programmer’s workflow" and targeting XMT, a PRAM-inspired many-core architecture, we achieve significantly higher speed-ups than previous approaches. Comparison with the fastest known serial max-flow implementation on a […]
Sep 20

Hermes: an integrated CPU/GPU microarchitecture for IP routing

With constantly increasing Internet traffic and fast-changing network protocols, future routers have to simultaneously satisfy requirements for throughput, QoS, flexibility, and scalability. In this work, we propose a novel integrated CPU/GPU microarchitecture, Hermes, for QoS-aware high-speed routing. We also develop a new thread scheduling mechanism, which significantly improves all QoS metrics.
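The Hermes pipeline and its scheduler are not reproduced here, but the basic appeal of a GPU for routing is packet-level parallelism. The sketch below assigns one CUDA thread per packet and performs a longest-prefix-match lookup over a small linear prefix table; the table layout and names are assumptions for illustration, not the Hermes design.

// One thread per packet: longest-prefix match over a small linear table.
// Contiguous netmasks compare correctly as unsigned ints (longer prefix = larger mask).
struct Prefix { unsigned int addr; unsigned int mask; int port; };

__global__ void lpm_lookup(const unsigned int* dst_ips, int num_pkts,
                           const Prefix* table, int table_len, int* out_port)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_pkts) return;

    unsigned int ip = dst_ips[p];
    int best_port = -1;                    // -1 = no matching route
    unsigned int best_mask = 0;
    for (int t = 0; t < table_len; ++t) {
        if ((ip & table[t].mask) == table[t].addr && table[t].mask >= best_mask) {
            best_mask = table[t].mask;     // longer prefix wins
            best_port = table[t].port;
        }
    }
    out_port[p] = best_port;
}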
Sep 20

CuMAPz: a tool to analyze memory access patterns in CUDA

The CUDA programming model provides a simple interface for programming GPUs, but tuning GPGPU applications for high performance is still quite challenging. Programmers need to consider several architectural details, and small changes in source code, especially in memory access patterns, can affect performance significantly. This paper presents CuMAPz, a tool to compare the memory performance of […]
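A small example shows the kind of pattern such a tool has to reason about. The two kernels below copy the same row-major n*n matrix: in the first, consecutive threads in a warp touch consecutive addresses (coalesced); in the second, they are a full row apart (strided), which typically wastes most of the available memory bandwidth. This is a generic illustration, not CuMAPz output.

// Coalesced copy: threads in a warp vary `col`, so addresses are consecutive.
__global__ void copy_coalesced(const float* in, float* out, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        out[row * n + col] = in[row * n + col];
}

// Strided copy: threads in a warp vary `row`, so addresses are n floats apart.
__global__ void copy_strided(const float* in, float* out, int n)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        out[row * n + col] = in[row * n + col];
}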
Sep 20

EFFEX: an embedded processor for computer vision based feature extraction

The deployment of computer vision algorithms in mobile applications is growing at a rapid pace. A primary component of the computer vision software pipeline is feature extraction, which identifies and encodes relevant image features. We present an embedded heterogeneous multicore design named EFFEX that incorporates novel functional units and memory architecture support, making it capable […]
Sep 20

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still at an early stage, and the challenge of making the GPU a true shared resource […]
Sep 20

Acceleration of genetic algorithms for sudoku solution on many-core processors

In this paper, we use the problem of solving Sudoku puzzles to demonstrate that practical processing times can be achieved by applying many-core processors to parallel genetic computation. To increase accuracy, we propose a genetic operation that takes building-block linkage into account. As a parallel processing model for […]
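The paper's linkage-aware genetic operator is not reproduced in this listing; as a sketch of how the evaluation step parallelizes, the CUDA kernel below scores one candidate 9x9 board per thread by counting the digits missing from each row, column, and 3x3 box (a penalty of 0 means a valid solution). The data layout and names are illustrative assumptions.

// One thread scores one candidate board. Boards are stored row-major,
// 81 bytes each, with cell values 1..9.
__global__ void sudoku_fitness(const unsigned char* boards, int num_boards,
                               int* fitness)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= num_boards) return;
    const unsigned char* g = boards + b * 81;

    int penalty = 0;
    for (int u = 0; u < 9; ++u) {
        int row_seen = 0, col_seen = 0, box_seen = 0;
        for (int v = 0; v < 9; ++v) {
            row_seen |= 1 << g[u * 9 + v];                        // row u
            col_seen |= 1 << g[v * 9 + u];                        // column u
            int r = (u / 3) * 3 + v / 3, c = (u % 3) * 3 + v % 3;
            box_seen |= 1 << g[r * 9 + c];                        // box u
        }
        penalty += 9 - __popc(row_seen & 0x3FE);   // digits 1..9 missing in row u
        penalty += 9 - __popc(col_seen & 0x3FE);
        penalty += 9 - __popc(box_seen & 0x3FE);
    }
    fitness[b] = penalty;                          // 0 = every unit has all digits
}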
