high performance computing on graphics processing units: hgpu.org

Posts

Jun, 26

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation

Convolutional Neural Networks (CNNs) can be shifted across 2D images or 3D videos to segment them. They have a fixed input size and typically perceive only small local contexts of the pixels to be classified as foreground or background. In contrast, Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in […]

CUDA

Jun, 26

Concurrent Solutions to Linear Systems using Hybrid CPU/GPU Nodes

We investigate parallel solutions to linear systems, with the global illumination problem in computer graphics as the application focus. An existing serial CPU implementation using the radiosity method is given as the performance baseline, where a scene and corresponding form-factor coefficients are provided. The initial computational radiosity solver uses the basic Jacobi method with […]

CUDA • OpenGL
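
The abstract above names the basic Jacobi method as the starting point of its radiosity solver. As a rough illustration of that method only (not the paper's implementation), here is a minimal pure-Python sketch. The function name `jacobi`, the fixed iteration count, and the 3x3 system are assumptions made for this example; the system is chosen to be diagonally dominant so that the iteration converges.

```python
def jacobi(a, b, iterations=50):
    """Approximate x in a * x = b with plain Jacobi sweeps.

    a is a square matrix given as a list of rows, b is a list.
    Hypothetical helper for illustration; no convergence check is done.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Each new component depends only on the previous iterate,
        # so the n updates are independent and could run in parallel.
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x


# Assumed example system (diagonally dominant); its exact solution is [1, 1, 1].
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [1.0, 2.0, 6.0]]
B = [6.0, 8.0, 9.0]
solution = jacobi(A, B)
```

Because every component of the new iterate reads only the old one, the inner update maps naturally onto one GPU thread per unknown, which is presumably why Jacobi is a common baseline for this kind of work.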

Jun, 26

Composability of parallel codes on heterogeneous architectures

To face the ever more demanding requirements in terms of accuracy and speed of scientific simulations, the High Performance community is constantly increasing the demands in terms of parallelism, thus adding tremendous value to parallel libraries strongly optimized for highly complex architectures. Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great […]

CUDA

Jun, 26

Block Time Step Storage Scheme for Astrophysical N-body Simulations

Astrophysical research in recent decades has made significant progress thanks to the availability of various N-body simulation techniques. With the rapid development of high-performance computing technologies, modern simulations have been able to take advantage of the computing power of massively parallel clusters with more than 10^5 GPU cores. While unprecedented accuracy and dynamical scales have been achieved, […]

CUDA

Jun, 26

Ebb: A DSL for Physical Simulation on CPUs and GPUs

Designing programming environments for physical simulation is challenging because simulations rely on diverse algorithms and geometric domains. These challenges are compounded when we try to run efficiently on heterogeneous parallel architectures. We present Ebb, a domain-specific language (DSL) for simulation, that runs efficiently on both CPUs and GPUs. Unlike previous DSLs, Ebb uses a three-layer […]

CUDA • OpenGL

Jun, 24

Toward a Multi-level Parallel Framework on GPU Cluster with PetSC-CUDA for PDE-based Optical Flow Computation

In this work we present a multi-level parallel framework for Optical Flow computation on a GPU cluster equipped with a scientific computing middleware (the PETSc library). Starting from a flow-driven isotropic method, which models the optical flow problem through a parabolic partial differential equation (PDE), we have designed a parallel algorithm and its software […]

CUDA

Jun, 24

Alpha-Beta Divergences Discover Micro and Macro Structures in Data

Although recent work in non-linear dimensionality reduction investigates multiple choices of divergence measure during optimization (Yang et al., 2013; Bunte et al., 2012), little work discusses the direct effects that divergence measures have on visualization. We study this relationship, theoretically and through an empirical analysis over 10 datasets. Our work shows how the alpha and […]

CUDA

Jun, 24

Toward GPU Accelerated Data Stream Processing

In recent years, the need for continuous processing and analysis of data streams has increased rapidly. To achieve high throughput rates, stream applications make use of operator parallelization, batching strategies and distribution. Another possibility is to utilize co-processor capabilities per operator. Further, the database community noticed that a column-oriented architecture is essential for efficient co-processing, since the data transfer […]

OpenCL

Jun, 24

DCT-JPEG Image Coding Based on GPU

This paper proposes a GPU-based parallel algorithm for JPEG coding. Most image compression systems suffer from efficiency problems, and the real-time performance of wireless multimedia sensor networks (WMSN) used for image compression and transmission is also an issue that needs to be solved, so in this paper parallel computation is used in JPEG […]

CUDA
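
For readers unfamiliar with the transform named in the title, the following is a minimal sketch of the standard orthonormal 2D DCT-II on a single 8x8 block, the block size used by baseline JPEG. It is a plain serial reference, not the paper's GPU algorithm; the function name `dct_8x8` and the constant test block are assumptions made for this example.

```python
import math

N = 8  # baseline JPEG transforms 8x8 blocks


def dct_8x8(block):
    """Orthonormal 2D DCT-II of one 8x8 block (list of 8 rows of 8 floats).

    Hypothetical reference helper: direct O(N^4) evaluation, no fast algorithm.
    """
    def scale(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# Assumed input: a constant block, so all energy lands in the DC coefficient.
flat = [[10.0] * N for _ in range(N)]
coeffs = dct_8x8(flat)
```

Each output coefficient, and each 8x8 block of an image, is computed independently of the others, which is what makes this stage a natural candidate for parallel execution.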

Jun, 24

GPU-Friendly Local Regression for Voice Conversion

Voice conversion is the task of transforming a source speaker’s voice so that it sounds like a target speaker’s voice. We present a GPU-friendly local regression model for voice conversion that is capable of converting speech in real-time and achieves state-of-the-art accuracy on this task. Our model uses a new approximation for computing local regression […]

CUDA

Jun, 23

4th International Conference on Network and Computing Technology (ICNCT), 2015

Topics: • Optical Communications and Networking • Wireless Communications and Networking • Multimedia Networking • Signal Processing for Communications • Networking Algorithms and Performance Evaluation • Wireless Sensor Networks • Communication and Information Theory • Network Security • Cognitive Radio Networks • Internet Applications • Protocols and Algorithms • Coding Theory • 3G & 4G […]

Jun, 23

2nd International Conference on Knowledge and Software Engineering (ICKSE), 2015

Topics: • Agent architectures, ontologies, languages and protocols • Multi-agent systems • Agent-based learning and knowledge discovery • Interface agents • Agent-based auctions and marketplaces • Artificial life and societies • Secure mobile and multi-agent systems • Mobile agents • Mobile Commerce Technology and Application Systems • Component-Based Software Engineering • Automated Software Specification • Automated Software Design and Synthesis • Computer-Supported Cooperative Work • Embedded and Ubiquitous Software […]

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs


* * *


Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
