high performance computing on graphics processing units: hgpu.org

Analytical model

Efficient heterogeneous matrix profile on a CPU + High Performance FPGA with integrated HBM

Jose Carlos Romero, Angeles Navarro, Antonio Vilches, Andrés Rodríguez, Francisco Corbera, Rafael Asenjo

Tags: Algorithms, Analytical model, Computer science, FPGA, Heterogeneous systems, OpenCL

June 27, 2021 by hgpu

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures

Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang

Tags: Analytical model, Computer science, CUDA, Heterogeneous systems, Intel Xeon Phi, Machine learning, nVidia, nVidia GeForce GTX 1080 Ti, OpenCL, OpenMP, Package

March 22, 2020 by hgpu

On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU

Ulysse Beaugnon, Basile Clément, Nicolas Tollenaere, Albert Cohen

Tags: Analytical model, Computer science, CUDA, Linear Algebra, nVidia, nVidia Quadro K4000, Programming Languages

April 14, 2019 by hgpu

Sparse Winograd Convolutional neural networks on small-scale systolic arrays

Feng Shi, Haochen Li, Yuhe Gao, Benjamin Kuschner, Song-Chun Zhu

Tags: Analytical model, Computer science, Deep learning, FPGA, Neural networks

October 13, 2018 by hgpu

A Comparison of GPU Execution Time Prediction using Machine Learning and Analytical Modeling

Marcos Amaris, Raphael Y. de Camargo, Mohamed Dyab, Alfredo Goldman, Denis Trystram

Tags: Analytical model, Benchmarking, Computer science, Heterogeneous systems, Machine learning, nVidia, nVidia GeForce GTX 680, nVidia GeForce GTX 970, nVidia GeForce GTX 980, nVidia GeForce GTX Titan, nVidia GeForce GTX Titan Black, nVidia GeForce GTX Titan X, nVidia Quadro K5200, Package, Performance, Tesla K20, Tesla K40

December 6, 2016 by hgpu

Bridging the Semantic Gaps of GPU Acceleration for Scaleout CNN-based Big Data Processing: Think Big, See Small

Mingcong Song, Yang Hu, Yunlong Xu, Chao Li, Huixiang Chen, Jingling Yuan, Tao Li

Tags: Analytical model, big data, Computer science, CUDA, Deep learning, Distributed computing, Neural networks, nVidia, Tesla K20

September 22, 2016 by hgpu

MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures

Tobias Gysi, Tobias Grosser, Torsten Hoefler

Tags: Analytical model, CUDA, nVidia, Optimization, Package, Stencil computation, Tesla K20

June 3, 2016 by gysit

GPL: A GPU-based Pipelined Query Processing Engine

Johns Paul, Jiong He, Bingsheng He

Tags: Analytical model, Computer science, Databases, nVidia, OpenCL

April 24, 2016 by hgpu

Efficient Simulation Techniques for Large-Scale Applications

Jen-Cheng Huang

Tags: Analytical model, Computer science, CUDA, nVidia, nVidia Quadro 6000, Performance, Thesis

September 26, 2015 by hgpu

Throughput-Oriented Analytical Models for Performance Estimation on Programmable Hardware Accelerators

Junjie Lai

Tags: Analytical model, Benchmarking, Computer science, CUDA, nVidia, nVidia GeForce GTX 580, nVidia GeForce GTX 680, Thesis

August 28, 2013 by hgpu

Studying the core-cusp problem in cold dark matter halos using N-body simulations on GPU clusters

Go Ogiya, Masao Mori, Yohei Miki, Taisuke Boku, Naohito Nakasato

Tags: Analytical model, Astrophysics, Cosmology, CUDA, GPU cluster, Gravitation, N-body simulation, nVidia, Physics, Tesla M2090

August 17, 2013 by hgpu

The Yin and Yang of Processing Data Warehousing Queries on GPU Devices

Yuan Yuan, Rubao Lee, Xiaodong Zhang

Tags: Analytical model, ATI, ATI Radeon HD 7970, Computer science, CUDA, Databases, nVidia, nVidia GeForce GTX 480, nVidia GeForce GTX 580, nVidia GeForce GTX 680, OpenCL, Package

August 14, 2013 by hgpu

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication

exa-AMD: Exascale Accelerated Materials Discovery

Accelerated discovery and design of Fe-Co-Zr magnets with tunable magnetic anisotropy through machine learning and parallel computing

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

No More Shading Languages: Compiling C++ to Vulkan Shaders

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs