high performance computing on graphics processing units: hgpu.org

Mass Estimation from Images using Deep Neural Network and Sparse Ground Truth

Muhammad K.A. Hamdan, Diane Rover, John Just

Tags: Computer science, Deep learning, Machine learning, Neural networks, Package

August 18, 2019 by hgpu

A Deep Learning Approach for Automatic Code Optimization in the Tiramisu Compiler

Mohammed Henni, Ilhem Isra Mekki

Tags: Code generation, Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, nVidia DGX-1

August 11, 2019 by hgpu

Incremental Bounded Model Checking of Artificial Neural Networks in CUDA

Luiz H. Sena, Iury V. Bessa, Mikhail R. Gadelha, Lucas C. Cordeiro, Edjard Mota

Tags: Computer science, CUDA, Neural networks, nVidia, Package

August 5, 2019 by hgpu

A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing

Mario P. Vestias

Tags: CNN, Computer science, Deep learning, FPGA, Neural networks

August 5, 2019 by hgpu

A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices

Yuexuan Tu, Saad Sadiq, Yudong Tao, Mei-Ling Shyu, Shu-Ching Chen

Tags: Computer science, Deep learning, Energy-efficient computing, FPGA, Neural networks, nVidia, nVidia Jetson TX2

July 28, 2019 by hgpu

Benchmarking TPU, GPU, and CPU Platforms for Deep Learning

Yu (Emma) Wang, Gu-Yeon Wei, David Brooks

Tags: Benchmarking, Cloud, Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, Performance, Tesla V100, TPU

July 28, 2019 by hgpu

MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

Daniel Nichols, Nathalie-Sofia Tomov, Frank Betancourt, Stanimire Tomov, Kwai Wong, Jack Dongarra

Tags: Algorithms, Computer science, CUDA, Heterogeneous systems, HPC, Linear Algebra, Machine learning, Neural networks, nVidia, nVidia GeForce GTX 1050Ti, Package

July 26, 2019 by hgpu

GRN: Gated Relation Network to Enhance Convolutional Neural Network for Named Entity Recognition

Hui Chen, Zijia Lin, Guiguang Ding, Jianguang Lou, Yusen Zhang, Borje Karlsson

Tags: Computer science, Neural networks, NLP, nVidia, Tesla P100

July 21, 2019 by hgpu

Profiling based Out-of-core Hybrid Method for Large Neural Networks

Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo

Tags: Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, Performance, Tesla V100

July 14, 2019 by hgpu

PANNA: Properties from Artificial Neural Network Architectures

Ruggero Lot, Franco Pellegrini, Yusuf Shaidu, Emine Kucukbenli

Tags: Deep learning, Machine learning, Molecular dynamics, Neural networks, Package, Physics, TensorFlow

July 10, 2019 by hgpu

Novel Methodologies for Predictable CPU-To-GPU Command Offloading

Roberto Cavicchioli, Nicola Capodieci, Marco Solieri, Marko Bertogna

Tags: Computer science, CUDA, Heterogeneous systems, Neural networks, nVidia, Package, Vulkan

July 7, 2019 by hgpu

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

Leyuan Wang, Zhi Chen, Yizhi Liu, Yao Wang, Lianmin Zheng, Mu Li, Yida Wang

Tags: ARM, CNN, Compilers, Computer science, Deep learning, Machine learning, Neural networks, nVidia, nVidia Jetson Nano, OpenCL, Package

July 7, 2019 by hgpu
