high performance computing on graphics processing units: hgpu.org

Posts

Nov, 20

A GPU-based framework for efficient image processing

This thesis tries to answer how to design a framework for image processing on the GPU, supporting the common environments OpenGL GLSL, OpenCL and CUDA. A generalized view of GPU image processing is presented. The framework is called gpuip and is implemented in C++ but also wrapped with Python bindings. The framework is cross-platform and works […]

CUDA • OpenCL • OpenGL

Nov, 20

Using CUDA architecture for computer simulations of thermomechanical phenomena

This paper presents a simulation of the casting solidification process performed on graphics processors compatible with the NVIDIA CUDA architecture. To implement the solidification simulation in parallel, it was necessary to modify the numerical model. The new approach shown in this paper allows the process of matrix building to be […]

CUDA

Nov, 20

Automatic Performance Tuning of Pipeline Patterns for Heterogeneous Parallel Architectures

Heterogeneous parallel architectures combining conventional multicore CPUs with GPUs and other types of accelerators promise significant performance gains compared to homogeneous systems. However, exploiting the full potential of such systems is becoming more and more challenging, often forcing programmers to combine different programming models and parallelization strategies. A promising approach to coping with the increased […]

OpenCL

Nov, 20

CL2QCD – Lattice QCD based on OpenCL

We present the Lattice QCD application CL2QCD, which is based on OpenCL and can be utilized to run on Graphic Processing Units as well as on common CPUs. We focus on implementation details as well as performance results of selected features. CL2QCD has been successfully applied in LQCD studies at finite temperature and density and […]

OpenCL

Nov, 20

Using Graphics Processing Units to solve the classical N-body problem in physics and astrophysics

Graphics Processing Units (GPUs) can speed up the numerical solution of various problems in astrophysics, including the dynamical evolution of stellar systems; the performance gain can be more than a factor of 100 compared to using a Central Processing Unit only. In this work I describe some strategies to speed up the classical N-body problem using […]

CUDA • OpenCL
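
As a rough illustration of the classical N-body problem mentioned in the entry above, the following sketch computes gravitational accelerations by direct summation over all pairs. It is not taken from the paper: the function name `accelerations`, the `softening` parameter, and the unit choice `G = 1.0` are assumptions made for this example. The work for each body `i` is independent of the others, which is the property GPU implementations typically exploit.

```python
def accelerations(positions, masses, G=1.0, softening=0.0):
    """Direct-summation gravitational accelerations (O(N^2) pair loop).

    positions: list of [x, y, z] coordinates, one per body
    masses:    list of masses, one per body
    softening: hypothetical smoothing length that avoids division by zero
               when two bodies come very close (assumption, not from the paper)
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        # Each i is independent; on a GPU this loop would map to threads.
        for j in range(n):
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + softening ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc


if __name__ == "__main__":
    # Two unit masses one unit apart attract each other along the x axis.
    print(accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0]))
```

The quadratic pair loop is the simplest formulation; tree codes and other approximations trade accuracy for fewer interactions and are not shown here.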

Nov, 20

International Conference on Engineering Mathematics and Physics, ICEMP 2015

Publication: Submitted papers can be selected and published into one of the following Journals: Advanced Materials Research (ISSN: 1022-6680) Indexed by Elsevier: SCOPUS and Ei Compendex (CPX), Cambridge Scientific Abstracts (CSA), Chemical Abstracts (CA), Google and Google Scholar, ISI (ISTP, CPCI, Web of Science), Institution of Electrical Engineers (IEE), etc. International Journal of Applied Physics […]

Nov, 20

OPNET: An Integrated Design Paradigm for Simulations

In recent years, a lot of progress has been made in the field of networks and communications; and also in design of simulators. In this paper, we survey and review prominent fields where OPNET has been applied and compare it with other existing simulators. Our work helps beginners and researchers alike in estimating the useful […]

Nov, 20

A Study of Successive Over-relaxation Method Parallelization Over Modern HPC Languages

Successive over-relaxation (SOR) is a computationally intensive, yet extremely important iterative solver for solving linear systems. Due to recent trends of exponential growth in the amount of data generated and increasing problem sizes, serial platforms have proved to be insufficient in providing the required computational power. In this paper, we present parallel implementations of red-black […]
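
For readers unfamiliar with SOR, here is a minimal serial sketch of the basic iteration for a linear system Ax = b. It is a generic textbook form, not the parallel variant the paper studies; the function name `sor_solve`, the relaxation factor `omega = 1.25`, and the stopping rule are assumptions chosen for this example.

```python
def sor_solve(A, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Solve A x = b with successive over-relaxation (serial sketch).

    A is a dense square matrix given as a list of rows. Convergence is
    guaranteed for symmetric positive definite A with 0 < omega < 2.
    Returns the solution estimate and the number of sweeps performed.
    """
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Uses already-updated x[j] for j < i, as in Gauss-Seidel.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            new_xi = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x, sweep
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    solution, sweeps = sor_solve(A, b)
    print(solution, sweeps)
```

Because each update depends on values computed earlier in the same sweep, this form is inherently sequential, which is why parallel versions reorder the unknowns.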

Nov, 19

FPGA: An Efficient And Promising Platform For Real-Time Image Processing Applications

Digital image processing (DIP) is an ever-growing area with a variety of applications including medicine, video surveillance, and many more. To implement the upcoming sophisticated DIP algorithms and to process the large amount of data captured from sources such as satellites or medical instruments, intelligent high-speed real-time systems have become imperative. Image processing algorithms […]

Nov, 18

Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through […]

CUDA
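
The conjugate gradient solver named in the entry above can be sketched in a few lines for a small dense symmetric positive definite system. This is a plain textbook version for illustration only, not the Lattice QCD solver from the paper; the helper names `dot` and `matvec` and the tolerance are assumptions of this example.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (textbook CG)."""
    x = [0.0] * len(b)
    r = list(b)           # residual b - A x for the initial guess x = 0
    p = list(r)           # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is conjugate to the previous ones with respect to A.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Each iteration is dominated by one matrix-vector product and a few vector reductions, which are the operations accelerator implementations concentrate on.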

Nov, 18

Hybrid CPU-GPU Pipeline Framework

The pipeline pattern for parallel programs is utilized in a wide array of scientific applications designed for execution on hybrid CPU-GPU architectures. However, there is a dearth of tools and libraries to support implementation of pipeline parallelism for hybrid architectures. We present the Hybrid Pipeline Framework (HyPi) that is intended to fill this gap. HyPi […]

CUDA

Nov, 18

Processing Hard Sphere Collisions on a GPU Using OpenCL

Physically accurate hard sphere collisions are inherently sequential, as the order in which collisions occur can have a significant impact on the resulting system. This makes processing hard sphere collisions on parallel hardware challenging. We present an approach to solving this problem that can be implemented in OpenCL and runs on current hardware. This approach […]

OpenCL

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)