high performance computing on graphics processing units: hgpu.org

Posts

Apr, 9

Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures

Multi-core architectures comprising several GPUs have become mainstream in the field of High-Performance Computing. However, obtaining the maximum performance of such heterogeneous machines is challenging, as it requires carefully offloading computations and managing data movements between the different processing units. The most promising and successful approaches so far rely on task-based runtimes that abstract […]

CUDA

Apr, 9

3D Hydrodynamic Simulation of Classical Nova Explosions

The purpose of this project is to develop a computer model to investigate the formation and life cycle of classical novae. A nova occurs in an orbiting binary system consisting of a white dwarf and a companion star. Over time, the white dwarf pulls hydrogen gas from the companion star, and this gas gathers on the surface of the white dwarf (the […]

OpenCL

Apr, 9

Task-based FMM for heterogeneous architectures

High performance FMM is crucial for the numerical simulation of many physical problems. In a previous study, we have shown that task-based FMM provides the flexibility required to process a wide spectrum of particle distributions efficiently on multicore architectures. In this paper, we now show how such an approach can be extended to fully exploit […]

CUDA

Apr, 9

A two-level task scheduler on Multiple DSP system for OpenCL

This paper addresses the problem that multi-DSP systems do not support OpenCL programming. With the proposed compiler, runtime, and kernel scheduler, an OpenCL application becomes portable not only across CPUs and GPUs but also across embedded multi-DSP systems. First, the LLVM compiler was adopted for source-to-source translation, in which the translated source […]

OpenCL

Apr, 9

Bayesian Sparse Unsupervised Learning for Probit Models of Binary Data

We present a unified approach to unsupervised Bayesian learning of factor models for binary data with binary and spike-and-slab latent factors. We introduce a non-negative constraint in the spike-and-slab prior that eliminates the usual sign ambiguity present in factor models and lowers the generalization error on the datasets tested here. For the generative models we […]

CUDA
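
The abstract's point about sign ambiguity can be illustrated with a short sketch. It assumes a one-factor probit model, p(x_d = 1 | z) = Φ(w_d z), and, as an assumption not stated in the excerpt, a rectified-Gaussian slab |N(0, 1)| for the non-negative spike-and-slab prior. The helper names `probit`, `sample_nonneg_spike_slab`, and `likelihoods` are hypothetical; this is not the authors' inference code.

```python
import math
import random

def probit(a):
    """Standard normal CDF Phi(a), used as the probit link."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def sample_nonneg_spike_slab(n, pi, rng):
    """Draw n latent factors: 0 with prob 1 - pi (spike), else |N(0, 1)|.

    The rectified-Gaussian slab is an assumption made for illustration.
    """
    return [abs(rng.gauss(0.0, 1.0)) if rng.random() < pi else 0.0 for _ in range(n)]

def likelihoods(w, z):
    """p(x_d = 1 | z) = Phi(w_d * z) for a single latent factor z."""
    return [probit(w_d * z) for w_d in w]

rng = random.Random(0)
w = [0.8, -1.5, 2.0]  # hypothetical factor loadings
z = 1.3               # hypothetical latent factor value

# Sign ambiguity: flipping both the loadings and the factor leaves
# every likelihood unchanged.
original = likelihoods(w, z)
flipped = likelihoods([-w_d for w_d in w], -z)
print(all(abs(a - b) < 1e-12 for a, b in zip(original, flipped)))  # True

# Under the non-negative prior every draw is >= 0, so the flipped
# solution with z < 0 has zero prior probability.
samples = sample_nonneg_spike_slab(10000, pi=0.3, rng=rng)
print(min(samples) >= 0.0)  # True
print(sum(s == 0.0 for s in samples) / len(samples))  # roughly 0.7 (spike mass)
```

Once the prior puts all of its mass on z ≥ 0, the mirrored solution (-w, -z) is no longer admissible, which is how a non-negative constraint removes the sign ambiguity the abstract describes.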

Apr, 9

Reducing the Disk IO Bandwidth Bottleneck through Fast Floating Point Compression using Accelerators

Compute-intensive tasks in high-end high performance computing (HPC) systems often generate large amounts of data, especially floating-point data, that need to be transmitted over the network. Although computation speeds are very high, the overall performance of these applications is affected by the data transfer overhead. Moreover, as data sets are growing in size rapidly, bandwidth […]

CUDA

Apr, 9

Literature Review: Parallel Computing on linear equations of linear elastic FEM simulation with CUDA

Scientific computation is the field of study that uses computers to implement mathematical models of physical phenomena, such as FEM-based deformation measurement in virtual reality. Scientific and engineering problems that would be almost impossible to solve by hand can be handled properly on a computer. A numerical algorithm calculating for different fields […]

CUDA

Apr, 9

Exploring the power of GPUs for training Deep Belief Networks

One of the major current research trends is the evolution of heterogeneous parallel computing. GPGPU computing is widely used, and several applications have been designed to exploit the massive parallelism that GPGPUs offer. While GPUs have long been widely used in computer vision for image processing, little has been done […]

CUDA

Apr, 7

A New Non-Blocking Approach on GPU Dynamical Memory Management

Dynamic memory allocation is a very important and basic technique implemented on modern computer architectures. In massively parallel processor (MPP) architectures such as Graphics Processing Units (GPUs), many threads may send allocation or deallocation requests to the system at the same time, which can cause synchronization problems or race conditions. In this […]

CUDA

Apr, 7

A New Digital Repository for Hyperspectral Imagery with Unmixing-Based Retrieval Functionality Implemented on GPUs

Over the last few years, hyperspectral image data have been collected for a large number of locations over the world, using a variety of instruments for Earth observation. In addition, several new hyperspectral missions will become operational in the near future. Despite the increasing availability and large volume of hyperspectral data in many applications, there […]

CUDA

Apr, 7

State of the Art Report on Real-time Rendering with Hardware Tessellation

For a long time, GPUs have primarily been optimized to render more and more triangles with increasingly flexible shading. However, scene data itself has typically been generated on the CPU and then uploaded to GPU memory. Therefore, widely used techniques that generate geometry at render time on demand for the rendering of smooth and displaced […]

Apr, 7

Detection of a faint fast-moving near-Earth asteroid using synthetic tracking technique

We report the detection of a faint near-Earth asteroid (NEA), made using our synthetic tracking technique and the CHIMERA instrument on the Palomar 200-inch telescope. This asteroid, with an apparent magnitude of 23, was moving at 5.97 degrees per day and was detected at a signal-to-noise ratio (SNR) of 15 using 30 sec of […]

CUDA
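
The numbers in the abstract show why a conventional long exposure struggles here: 5.97 degrees/day × 3600 arcsec/degree ÷ 86400 s/day ≈ 0.249 arcsec/s, so over 30 s the asteroid moves about 7.5 arcsec and its light is smeared along a trail. Synthetic tracking, as commonly described, instead records many short frames and shifts and adds them along trial velocities, so that at the correct velocity the source lands on the same pixel in every frame. The 1-D pure-Python sketch below illustrates that shift-and-add search; the frame model, integer-pixel velocities, Gaussian noise, and all function names (`make_frames`, `shift_and_add`, `search_velocities`) are assumptions for illustration, not the CHIMERA pipeline or the authors' GPU code.

```python
import random

def make_frames(n_frames, width, x0, v, flux, noise, rng):
    """Simulate 1-D frames: a point source at pixel x0 + v*t plus Gaussian noise."""
    frames = []
    for t in range(n_frames):
        frame = [rng.gauss(0.0, noise) for _ in range(width)]
        x = x0 + v * t
        if 0 <= x < width:
            frame[x] += flux
        frames.append(frame)
    return frames

def shift_and_add(frames, v):
    """Co-add frames after shifting frame t back by v*t pixels (trial velocity v)."""
    width = len(frames[0])
    stack = [0.0] * width
    for t, frame in enumerate(frames):
        shift = v * t
        for x in range(width):
            src = x + shift
            if 0 <= src < width:
                stack[x] += frame[src]
    return stack

def search_velocities(frames, velocities):
    """Return (best_v, best_x, peak) maximizing the co-added peak over trial velocities."""
    best = None
    for v in velocities:
        stack = shift_and_add(frames, v)
        peak = max(stack)
        x = stack.index(peak)
        if best is None or peak > best[2]:
            best = (v, x, peak)
    return best

# Hypothetical example: 30 faint frames (flux only 2x the per-pixel noise).
rng = random.Random(42)
frames = make_frames(n_frames=30, width=200, x0=20, v=3, flux=2.0, noise=1.0, rng=rng)
best_v, best_x, peak = search_velocities(frames, range(0, 7))
print(best_v, best_x, round(peak, 1))
```

At the true velocity the co-added peak is about 30 frames × flux 2 = 60, while the stacked noise has a standard deviation of about √30 ≈ 5.5, so the search should recover v = 3 and x0 = 20. Each trial velocity is evaluated independently, which is what makes this kind of search a natural fit for massively parallel hardware such as GPUs.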

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
