high performance computing on graphics processing units: hgpu.org

Posts

Sep, 29

Large-Scale High-Lundquist Number Reduced MHD Simulations of the Solar Corona Using GPU Accelerated Machines

We have recently carried out a computational campaign to investigate a model of coronal heating in three-dimensions using reduced magnetohydrodynamics (RMHD). Our code is built on a conventional scheme using the pseudo-spectral method, and is parallelized using MPI. The current investigation requires very long time integrations using high Lundquist numbers, where the formation of very […]

CUDA

Sep, 29

Possible planet-forming regions on submillimetre images

Submillimetre images of transition discs are expected to reflect the distribution of the optically thin dust. Former observations of three transition discs, LkHa330, SR21N, and HD135344B, at submillimetre wavelengths revealed images which cannot be modelled by a simple axisymmetric disc. We show that a large-scale anticyclonic vortex that develops where the viscosity has a large […]

CUDA

Sep, 28

Highly Scalable Multi Objective Test Suite Minimisation Using Graphics Cards

Despite claims of "embarrassing parallelism" for many optimisation algorithms, there has been very little work on exploiting parallelism as a route for SBSE scalability. This is an important oversight because scalability is so often a critical success factor for Software Engineering work. This paper shows how relatively inexpensive General Purpose computing on Graphical Processing Units […]

OpenCL

Sep, 28

Design space exploration towards a realtime and energy-aware GPGPU-based analysis of biosensor data

In this paper, novel objectives for the design space exploration of GPGPU applications are presented. The design space exploration takes the combination of energy efficiency and realtime requirements into account. This is completely different to the commonest high performance computing objective, which is to accelerate an application as much as possible. As a proof-of-concept, a […]

OpenCL

Sep, 28

Global optimization model on power efficiency of GPU and multicore processing element for SIMD computing with CUDA

Estimating and analyzing the power-consuming features of a program on a hardware platform is important for energy-aware High Performance Computing (HPC) optimization: it can help to handle critical design constraints at the software level and to choose a preferable algorithm in order to reach the best energy performance. Optimizing the power efficiency of CUDA program […]

CUDA

Sep, 28

2PARMA: Parallel Paradigms and Run-time Management Techniques for Many-Core Architectures

The 2PARMA project focuses on the development of parallel programming models and run-time resource management techniques to exploit the features of many-core processor architectures. The main goals of the 2PARMA project are: definition of a parallel programming model combining component-based and single-instruction multiple-thread approaches, instruction set virtualisation based on portable byte-code, run-time resource management policies […]

OpenCL

Sep, 28

Parallelizing fuzzy rule generation using GPGPU

This article proposes a method to parallelize the process of generating fuzzy if-then rules for pattern classification problems in order to reduce the computational time. The proposed method makes use of general-purpose computation on graphics processing units (GPGPU), implemented in parallel with the Compute Unified Device Architecture (CUDA), a development environment. CUDA contains a library to […]

CUDA

Sep, 28

Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer

In this paper we present the programming of the Linpack benchmark on the TianHe-1 system, the first petascale supercomputer system of China and the largest GPU-accelerated heterogeneous system ever attempted. A hybrid programming model consisting of MPI, OpenMP and streaming computing is described to explore the task parallel, thread parallel and data parallel of the […]

Sep, 28

XML3D: interactive 3D graphics for the web

Web technologies provide the basis to distribute digital information worldwide and in realtime but they have also established the Web as a ubiquitous application platform. The Web evolved from simple text data to include advanced layout, images, audio, and recently streaming video. Today, as our digital environment becomes increasingly three-dimensional (e.g. 3D cinema, 3D video, […]

OpenGL

Sep, 28

Spark: modular, composable shaders for graphics hardware

In creating complex real-time shaders, programmers should be able to decompose code into independent, localized modules of their choosing. Current real-time shading languages, however, enforce a fixed decomposition into per-pipeline-stage procedures. Program concerns at other scales — including those that cross-cut multiple pipeline stages — cannot be expressed as reusable modules. We present a shading […]

Sep, 28

VoxelPipe: a programmable pipeline for 3D voxelization

We present a highly flexible and efficient software pipeline for programmable triangle voxelization. The pipeline, entirely written in CUDA, supports both fully conservative and thin voxelizations, multiple boolean, floating point, vector-typed render targets, user-defined vertex and fragment shaders, and a bucketing mode which can be used to generate 3D A-buffers containing the entire list of […]

CUDA

Sep, 28

Thread Block Compaction for Efficient SIMT Control Flow

Manycore accelerators such as graphics processor units (GPUs) organize processing units into single-instruction, multiple-data "cores" to improve throughput per unit hardware cost. Programming models for these accelerators encourage applications to run kernels with large groups of parallel scalar threads. The hardware groups these threads into warps/wavefronts and executes them in lockstep, dubbed single-instruction, multiple-thread (SIMT) […]

CUDA


