high performance computing on graphics processing units: hgpu.org

Posts

Apr, 12

Model Coupling between the Weather Research and Forecasting Model and the DPRI Large Eddy Simulator for Urban Flows on GPU-accelerated Multicore Systems

In this report we present a novel approach to model coupling for shared-memory multicore systems hosting OpenCL-compliant accelerators, which we call The Glasgow Model Coupling Framework (GMCF). We discuss the implementation of a prototype of GMCF and its application to coupling the Weather Research and Forecasting Model and an OpenCL-accelerated version of the Large Eddy […]

OpenCL

Apr, 9

Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications

This paper proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The CUDA-to-CUDA transformation collectively replaces the user-written kernels by auto-generated kernels optimized for data reuse. The transformation is based on two basic operations, kernel fusion and fission, and relies on a series of automated steps: gathering metadata, generating […]

CUDA
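The fusion idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's framework: plain Python functions stand in for CUDA kernels, the two kernels are element-wise rather than stencil-based, and the names `scale_kernel`, `add_kernel`, `unfused` and `fused` are hypothetical. It only shows why merging two kernels can avoid writing and re-reading an intermediate array.

```python
def scale_kernel(a, alpha):
    # First "kernel": writes a full intermediate array.
    return [alpha * x for x in a]


def add_kernel(a, b):
    # Second "kernel": reads the intermediate array back in.
    return [x + y for x, y in zip(a, b)]


def unfused(a, b, alpha):
    # Two passes over memory; tmp plays the role of a buffer in global memory.
    tmp = scale_kernel(a, alpha)
    return add_kernel(tmp, b)


def fused(a, b, alpha):
    # One pass: each intermediate value is consumed as soon as it is produced,
    # so no intermediate array is materialized.
    return [alpha * x + y for x, y in zip(a, b)]
```

Both versions return the same result; the fused one simply touches each input element once. Fission is the reverse operation, splitting one kernel into several.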

Apr, 8

Finite element numerical integration for first order approximations on multi-core architectures

The paper presents investigations on the implementation and performance of the finite element numerical integration algorithm for first order approximations and three processor architectures popular in scientific computing: classical CPU, Intel Xeon Phi and NVIDIA Kepler GPU. A unifying programming model and a portable OpenCL implementation are considered for all architectures. Variations of the algorithm due […]

OpenCL

Apr, 8

GPU Accelerated Strong and Branching Bisimilarity Checking

Bisimilarity checking is an important operation to perform explicit-state model checking when the state space of a model under verification has already been generated. It can be applied in various ways: reduction of a state space w.r.t. a particular flavour of bisimilarity, or checking that two given state spaces are bisimilar. Bisimilarity checking is a […]

CUDA

Apr, 8

Enhancing Fluid Modeling with Turbulence and Acceleration

In this dissertation, we propose solutions to four important and challenging topics in enhancing fluid modeling with turbulence and acceleration: distance field representation of obstacles in fluid, adaptive and controllable turbulence enhancement, Langevin Particles, and GPU acceleration in fluid modeling. All these fields aim at creating realistic and fast fluid fields which are […]

CUDA

Apr, 8

Benchmarking the cost of thread divergence in CUDA

All modern processors include a set of vector instructions. While this gives a tremendous boost to performance, it requires vectorized code that can take advantage of such instructions. As an ideal vectorization is hard to achieve in practice, one has to decide when different instructions may be applied to different elements of the […]

CUDA
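As a rough illustration of why divergence has a cost, the sketch below uses a toy model in which a group of lanes executes in lockstep, so a branch costs the sum of every path that at least one lane takes. The function name `warp_branch_cost` and the cost units are assumptions made for this example; it is not the benchmark from the paper, and real hardware behaviour is more nuanced.

```python
def warp_branch_cost(predicates, cost_then, cost_else):
    """Toy cost of an if/else executed by one group of lockstep lanes.

    predicates: one boolean per lane (True means the lane takes 'then').
    cost_then, cost_else: assumed cost of each path, in arbitrary units.
    """
    cost = 0
    if any(predicates):      # at least one lane takes the 'then' path
        cost += cost_then
    if not all(predicates):  # at least one lane takes the 'else' path
        cost += cost_else
    return cost
```

Under this model a uniform group pays for one path only, while a single disagreeing lane makes the whole group pay for both.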

Apr, 8

Early Experiences Running the 3D Stencil Jacobi Method in Intel Xeon Phi

Iterative stencil computations are an important pattern of computation in different computational fields such as physics or chemistry simulations. A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. As the demand for more and more compute power is growing rapidly in different fields of research, […]
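The update rule described in the abstract can be sketched for d = 3 as follows. This is a minimal pure-Python illustration, assuming a 7-point stencil that averages each interior point with its six face neighbors and leaves boundary points fixed; the function name `jacobi_sweep_3d` is hypothetical and the code is not taken from the paper.

```python
def jacobi_sweep_3d(grid):
    """Return a new grid after one 7-point Jacobi sweep.

    grid is a nested list indexed as grid[i][j][k]. Boundary points are
    copied unchanged; each interior point becomes the average of itself
    and its six face neighbors, all read from the old grid.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    new = [[[grid[i][j][k] for k in range(nz)]
            for j in range(ny)]
           for i in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                new[i][j][k] = (
                    grid[i][j][k]
                    + grid[i - 1][j][k] + grid[i + 1][j][k]
                    + grid[i][j - 1][k] + grid[i][j + 1][k]
                    + grid[i][j][k - 1] + grid[i][j][k + 1]
                ) / 7.0
    return new
```

Because every update reads only the old grid, all interior points can be computed independently, which is what makes the pattern attractive for many-core hardware.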

Apr, 8

3rd International Conference on Control, Robotics and Cybernetics (ICCRC), 2015

ICCRC 2015: 3rd International Conference on Control, Robotics and Cybernetics
Berlin, Germany, August 13-14, 2015
http://www.iccrc.org/
Submission Deadline: 2015-06-05
Publication: All accepted papers of ICCRC 2015 will be published by International Journal of Mechanical Engineering and Robotics Research (IJMERR) (ISSN: 2278-0149), which will be indexed by Copernicus, ProQuest (USA), Open J-Gate, Indian Science and Google Scholar. […]

Apr, 8

8th International Conference on Advanced Computer Theory and Engineering (ICACTE), 2015

ICACTE 2015: 8th International Conference on Advanced Computer Theory and Engineering
Berlin, Germany, August 13-14, 2015
http://www.icacte.org/
Submission Deadline: 2015-06-05
Publication: Submitted papers can be selected and published in one of the following journals:
* WIT Transactions on Engineering Sciences (ISSN: 1743-3533), indexed by EI Compendex and ISI
* International Journal of Computer Theory and Engineering […]

Apr, 8

7th International Conference on Education Technology and Computer (ICETC), 2015

ICETC 2015: 7th International Conference on Education Technology and Computer
Berlin, Germany, August 13-14, 2015
http://www.icetc.org/
Submission Deadline: 2015-06-05
Publication:
* International Journal of Information and Education Technology (IJIET), ISSN: 2010-3689. Abstracting/Indexing: EI (INSPEC, IET), Cabell’s Directories, DOAJ, Electronic Journals Library, Engineering & Technology Digital Library, EBSCO, Google Scholar, Crossref and ProQuest
* Lecture Notes on Information […]

Apr, 7

clRNG: A Random Number API with Multiple Streams for OpenCL

We present clRNG, a library for uniform random number generation in OpenCL. Streams of random numbers act as virtual random number generators. They can be created on the host computer in unlimited numbers, and then used either on the host or on other computing devices by work items to generate random numbers. Each stream also […]

OpenCL
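The notion of streams acting as virtual random number generators can be sketched as below. This does not use or reproduce the clRNG API: Python's standard `random.Random` stands in for a stream, the helper names `create_streams` and `work_item` are hypothetical, and seeding each stream with a different integer is a simplifying assumption that does not by itself guarantee non-overlapping sequences.

```python
import random


def create_streams(count, base_seed=12345):
    # "Host" side: create one independent generator object per stream.
    # Assumption: distinct integer seeds stand in for properly spaced streams.
    return [random.Random(base_seed + i) for i in range(count)]


def work_item(stream, n):
    # "Device" side: each work item draws uniform numbers in [0, 1)
    # from its own stream, without touching any shared generator state.
    return [stream.random() for _ in range(n)]
```

Giving each work item its own stream means results do not depend on the order in which work items run, and re-creating the streams with the same base seed reproduces the same numbers.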

Apr, 7

State Lattice-based Motion Planning for Autonomous On-Road Driving

Since the DARPA Urban Challenge 2007 (DUC), the development of autonomous vehicles has attracted increasing attention from both academic institutes and the automotive industry. It is believed that sufficiently sophisticated and reliable autonomous vehicles would redefine mobility. The motion planner and sensor simulation presented in this thesis are intended to contribute to this prospect. The task […]

CUDA, OpenGL
