high performance computing on graphics processing units: hgpu.org

Posts

Dec, 19

Acceleration of grammatical evolution using graphics processing units: computational intelligence on consumer games and graphics hardware

Several papers show that symbolic regression is suitable for data analysis and prediction in financial markets. Grammatical Evolution (GE), a grammar-based form of Genetic Programming (GP), has been successfully applied in solving various tasks including symbolic regression. However, often the computational effort to calculate the fitness of a solution in GP can limit the area […]

CUDA

Dec, 19

Parallel programming with inductive synthesis

We show that program synthesis can generate GPU algorithms as well as their optimized implementations. Using the scan kernel as a case study, we describe our evolving synthesis techniques. Relying on our synthesizer, we can parallelize a serial problem by transforming it into a scan operation, synthesize a SIMD scan algorithm, and optimize it to […]

CUDA

Dec, 18

A Translation Framework for Executing the Sequential Binary Code on CPU/GPU Based Architectures

The method of using DBT (dynamic binary translation) to execute a source ISA's binary code on target platforms has been hampered by high overhead for many years. The GPU, as a many-core processor, has tremendous computational power. Employing the GPU as a coprocessor to execute the hot spots of binary code in parallel holds great promise of […]

CUDA

Dec, 18

Automatic code generation and tuning for stencil kernels on modern shared memory architectures

In this paper, we present Patus, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. Patus, which stands for "Parallel Autotuned Stencils," generates a compute kernel from a specification of the stencil operation and a strategy which describes the parallelization and […]

CUDA

Dec, 18

GPU-based simulation of 3D blood flow in abdominal aorta using OpenFOAM

The simulation of blood flow in the cardiac system has the potential to become an attractive diagnostic tool for many cardiovascular diseases, such as in the case of aneurysm. This potential could be reached if the simulations were to be completed in hours rather than days and without resorting to the use of expensive supercomputers. […]

CUDA

Dec, 18

Collision for 75-step SHA-1: Intensive Parallelization with GPU

We present a brief report on the collision search for the reduced SHA-1. With a few improvements to our previous work, directed at efficient parallelization on a GPU cluster, we managed to construct a new collision for 75-step reduced SHA-1 hash function.

OpenCL

Dec, 18

Leveraging Parallelism with CUDA and OpenCL

Graphics processing units (GPUs), originally designed for computing and manipulating pixels, have become general-purpose processors capable of executing in excess of a trillion calculations per second. Taking advantage of the GPU’s compute power and commodity popularity, the field of computing systems is exhibiting a trend toward heterogeneous platforms consisting of a central processor integrated with graphics hardware. […]

CUDA • OpenCL

Dec, 18

Efficient Computational Methods for Uncertainty Quantification of Large Systems

The quest to design environment-friendly and sustainable engineering systems has witnessed more and more fervent efforts in recent years. With the growth of affordable large-capacity computing resources, predictive, science-based computational models have become instrumental in this pursuit. The present work develops efficient computational methods for the uncertainty analysis of large dynamical and mechanical systems with […]

CUDA

Dec, 18

Real-Time Implementation of a Full Hyperspectral Unmixing Chain on Graphics Processing Units

Hyperspectral unmixing is a very important task for remotely sensed hyperspectral data exploitation. It amounts to estimating the abundance of pure spectral signatures (called endmembers) in each mixed pixel of the original hyperspectral image, where mixed pixels arise due to insufficient spatial resolution and other phenomena. The full spectral unmixing chain comprises three main steps: […]

CUDA

Dec, 18

Efficient data structures for piecewise-smooth video processing

A number of useful image and video processing techniques, ranging from low level operations such as denoising and detail enhancement to higher level methods such as object manipulation and special effects, rely on piecewise-smooth functions computed from the input data. In this thesis, we present two computationally efficient data structures for representing piecewise-smooth visual information […]

Dec, 18

The MOSIX Cluster Operating System for High-Performance Computing on Linux Clusters, Multi-Clusters, GPU Clusters and Clouds

MOSIX is a cluster operating system targeted for High-Performance Computing (HPC) on Linux platforms, including clusters, multi-clusters, GPU clusters and Clouds. The unique features of MOSIX provide users and applications with the impression of running on a single computer with multiple processors, without changing the interface and the run-time environment of their respective login nodes. […]

OpenCL

Dec, 18

Parallel paradigms in optimal structural design

Modern-day processors are not getting any faster. Due to the power consumption limit of frequency scaling, parallel processing is increasingly being used to decrease computation time. In this thesis, several parallel paradigms are used to improve the performance of commonly serial SAO programs. Four novelties are discussed: First, replacing double precision solvers with single precision […]

CUDA • OpenCL

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container
