high performance computing on graphics processing units: hgpu.org

Posts

Jan, 7

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of mobile devices alone now nearly equal to the population of the Earth, embedded systems have truly become ubiquitous. These trends, however, have also made managing their power consumption extremely challenging. In recent years, several techniques […]

Jan, 5

A New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems

Recently, graphics processors (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix vector multiplication (SPMV) operations are commonly used in finite element analysis, a new SPMV algorithm and several variations are […]

CUDA
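The excerpt does not describe the new algorithm itself, so the sketch below shows only the baseline operation it targets: sparse matrix-vector multiplication y = Ax with A stored in compressed sparse row (CSR) format. In the simplest GPU mapping, each iteration of the outer loop (one row) becomes one thread. The function and array names (`csr_spmv`, `row_ptr`, `col_idx`, `values`) are illustrative assumptions, not taken from the paper.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):  # on a GPU: one thread (or warp) per row
        acc = 0.0
        # The nonzeros of row i are values[row_ptr[i]:row_ptr[i + 1]]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example: tridiagonal A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Finite element matrices tend to have a similar number of nonzeros per row, which is one reason row-oriented layouts like this are a common starting point for FEM-specific SpMV variants.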

Jan, 5

Subdivision Surface Evaluation as Sparse Matrix-Vector Multiplication

We present an interpretation of subdivision surface evaluation in the language of linear algebra. Specifically, the vector of surface points can be computed by left-multiplying the vector of control points by a sparse subdivision matrix. This "matrix-driven" interpretation applies to any level of subdivision, holds for many common subdivision schemes (including Catmull-Clark and Loop), supports […]

CUDA

•

OpenCL
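To make the "matrix-driven" idea concrete, here is a minimal sketch of its 1D curve analog, assuming uniform cubic B-spline subdivision of a closed control polygon (Catmull-Clark reduces to this rule on regular regions of a surface). One subdivision step is a sparse matrix S whose rows hold the stencil weights, and the refined points are S times the control-point vector. The names `cubic_bspline_subdivision_matrix` and `apply_sparse` are hypothetical.

```python
def cubic_bspline_subdivision_matrix(n):
    """Sparse rows, each a list of (column, weight), for one step of uniform
    cubic B-spline subdivision of a closed polygon with n control points.
    Output order: vertex point, edge point, vertex point, edge point, ..."""
    rows = []
    for i in range(n):
        prev, nxt = (i - 1) % n, (i + 1) % n
        rows.append([(prev, 1 / 8), (i, 6 / 8), (nxt, 1 / 8)])  # vertex point
        rows.append([(i, 1 / 2), (nxt, 1 / 2)])                 # edge point
    return rows


def apply_sparse(rows, points):
    """Left-multiply the vector of control points by the sparse matrix."""
    dim = len(points[0])
    return [tuple(sum(w * points[c][d] for c, w in row) for d in range(dim))
            for row in rows]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
S = cubic_bspline_subdivision_matrix(len(square))
level1 = apply_sparse(S, square)  # 8 refined points
level2 = apply_sparse(cubic_bspline_subdivision_matrix(len(level1)), level1)
```

Because S depends only on connectivity, in this sketch it can be built once and reapplied whenever the control points move, so each re-evaluation is a single sparse matrix-vector product.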

Jan, 5

Hierarchical DAG Scheduling for Hybrid Distributed Systems

Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Despite significant advances in the programming interfaces to such hybrid architectures, traditional programming paradigms struggle to map the resulting multi-dimensional heterogeneity and to express algorithmic parallelism, resulting in sub-optimal effective performance. Task-based programming paradigms have the capability to alleviate […]

CUDA
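Task-based runtimes represent an application as a DAG of tasks with dependency edges. As a generic illustration (not the paper's hierarchical scheduler, whose details are not in the excerpt), the sketch below uses Kahn's algorithm to group tasks into levels: every task in a level depends only on tasks in earlier levels, so tasks within one level may run concurrently on any available device. The function name `dag_levels` and the task names are hypothetical.

```python
def dag_levels(tasks, deps):
    """Group tasks into levels so that each task's dependencies lie in
    earlier levels. deps maps task -> list of tasks it depends on."""
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for t, parents in deps.items():
        for p in parents:
            indegree[t] += 1
            children[p].append(t)
    ready = [t for t in tasks if indegree[t] == 0]
    levels = []
    while ready:
        levels.append(ready)
        nxt = []
        for t in ready:  # completing t may release its children
            for c in children[t]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    nxt.append(c)
        ready = nxt
    if sum(len(level) for level in levels) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return levels


# B and C both depend on A; D depends on B and C.
levels = dag_levels(["A", "B", "C", "D"], {"B": ["A"], "C": ["A"], "D": ["B", "C"]})
```

A real scheduler would also weigh task costs and data-transfer costs per device; this sketch only exposes the available parallelism.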

Jan, 5

Methods and Metrics for Fair Server Assessment under Real-Time Financial Workloads

Energy efficiency has been a daunting challenge for datacenters. The financial industry operates some of the largest datacenters in the world. With rising energy costs and the growth of the financial services sector, emerging financial analytics workloads may incur extremely high operational costs to meet their latency targets. Microservers have recently emerged as an alternative to high-end […]

Jan, 5

GPU: Power vs Performance

GPUs are widely used to meet the ever-increasing demands of high performance computing. High-end GPUs are among the largest consumers of power in a computer. Power dissipation has always been a major concern for computer architects. Driven by power-efficiency demands, modern CPUs have moved towards multicore architectures. GPUs are already […]

CUDA

Jan, 2

Real-Time Incompressible Fluid Simulation on the GPU

We present a parallel framework for simulating incompressible fluids with predictive-corrective incompressible Smoothed Particle Hydrodynamics (PCISPH) on the GPU in real time. To this end, we propose an efficient GPU streaming pipeline to map the entire computational task onto the GPU, fully exploiting the massive computational power of state-of-the-art GPUs. In PCISPH-based simulations, neighbor search […]

CUDA
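The excerpt stops before describing the paper's neighbor search, so the sketch below shows a common uniform-grid approach for SPH, stated here as an assumption: with a cell size equal to the smoothing length h, every neighbor of a particle lies in its own cell or one of the 26 adjacent cells, so only those cells need to be scanned. The names `build_grid` and `neighbors` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(positions, h):
    """Bucket particle indices by the integer coordinates of their cell."""
    grid = defaultdict(list)
    for idx, p in enumerate(positions):
        grid[tuple(math.floor(c / h) for c in p)].append(idx)
    return grid


def neighbors(i, positions, grid, h):
    """Indices of particles within distance h of particle i (excluding i)."""
    pi = positions[i]
    cell = tuple(math.floor(c / h) for c in pi)
    found = []
    for offset in product((-1, 0, 1), repeat=3):  # 27 candidate cells
        key = tuple(c + o for c, o in zip(cell, offset))
        for j in grid.get(key, []):
            if j != i and math.dist(pi, positions[j]) <= h:
                found.append(j)
    return sorted(found)


pts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.9, 0.0), (2.0, 0.0, 0.0)]
grid = build_grid(pts, 1.0)
```

GPU implementations typically replace the dictionary with particles sorted by cell index plus per-cell start offsets, so each thread can scan its 27 cells in contiguous memory.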

Jan, 2

Customization of OpenCL Applications for Efficient Task Mapping under Heterogeneous Platform Constraints

When targeting an OpenCL application to platforms with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes […]

OpenCL

Jan, 2

Performance comparison of Lattice Boltzmann fluid flow simulation using OpenCL and CUDA frameworks

This paper presents a performance comparison between the CUDA and OpenCL parallel programming frameworks, using a lid-driven cavity flow simulation with the Lattice Boltzmann method as the example. CUDA is a parallel programming model developed by NVIDIA for leveraging the computing capabilities of its products. OpenCL is an open, royalty-free standard developed by the Khronos Group for parallel programming of heterogeneous devices […]

CUDA

•

OpenCL
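Both ports in such a comparison run the same per-node arithmetic. As a minimal sketch, assuming the common D2Q9 lattice with the BGK collision operator in lattice units (the excerpt does not name the exact model), the equilibrium distribution is f_i^eq = w_i ρ (1 + 3 e_i·u + 4.5 (e_i·u)^2 − 1.5 u·u), and collision relaxes each f_i toward it with relaxation time τ. The names `E`, `W`, `equilibrium` and `bgk_collide` are hypothetical.

```python
# D2Q9 lattice velocities e_i and weights w_i (lattice units, c_s^2 = 1/3)
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for density rho, velocity u."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def bgk_collide(f, tau):
    """Single-node BGK collision: relax f toward equilibrium with time tau."""
    rho = sum(f)
    ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```

A full solver alternates this collision with a streaming step that moves each f_i to the neighboring node along e_i; the cavity case adds bounce-back walls and a moving lid. Collision conserves mass and momentum at each node, which the checks below confirm numerically.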

Jan, 2

GPU-based acceleration of free energy calculations in solid state physics

Obtaining a thermodynamically accurate phase diagram through numerical calculations is a computationally expensive problem that is crucially important to understanding the complex phenomena of solid state physics, such as superconductivity. In this work we show how this type of analysis can be significantly accelerated through the use of modern GPUs. We illustrate this with a […]

CUDA

Jan, 2

Disjunctive Normal Networks

Artificial neural networks are powerful pattern classifiers; however, they have been surpassed in accuracy by methods such as support vector machines and random forests, which are also easier to use and faster to train. Backpropagation, which is used to train artificial neural networks, suffers from the herd effect problem, which leads to long training times […]

CUDA

Dec, 30

Characterization of OpenCL on a Scalable FPGA Architecture

The recent release of Altera’s SDK for OpenCL has greatly eased the development of FPGA-based systems. Research has shown performance improvements brought by OpenCL on a single FPGA device. However, to meet the objectives of high performance computing, OpenCL needs to be evaluated on multiple FPGAs. This work proposes a scalable FPGA architecture for […]

OpenCL

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?