high performance computing on graphics processing units: hgpu.org

hgpu.org » AI

Domain-Specific Code Language Models: Unraveling the Potential for HPC Codes and Tasks

Tal Kadosh, Niranjan Hasabnis, Vy A. Vo, Nadav Schneider, Neva Krien, Mihai Capota, Abdul Wasay, Nesreen Ahmed, Ted Willke, Guy Tamir, Yuval Pinter, Timothy Mattson, Gal Oren

Tags: AI, Code generation, Computer science, Heterogeneous systems, HPC, nVidia, nVidia A40, OpenMP, Package, Python

January 7, 2024 by hgpu

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

Fabrizio Ferrandi, Serena Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Alessio Burrello, Francesco Barchi, Luca Benini, Luciano Lavagno, Teodoro Urso, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Nicola Petra, Davide De Caro, Gennaro Di Meo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Francesco Silvestri, Paolo Palazzari, Stefania Perri

Tags: AI, Artificial intelligence, Computer science, CUDA, Deep learning, Design space exploration, Hardware Architecture, Heterogeneous systems, Machine learning, Neural networks, nVidia, nVidia H100, OpenCL, survey

December 3, 2023 by hgpu

Solving MaxSAT with Matrix Multiplication

David Warde-Farley, Vinod Nair, Yujia Li, Ivan Lobov, Felix Gimeno, Simon Osindero

Tags: AI, Artificial intelligence, Computer science, Matrix multiplication, Neural networks, TPU

November 12, 2023 by hgpu

Compressed Real Numbers for AI: a case-study using a RISC-V CPU

Federico Rossi, Marco Cococcioni, Roger Ferrer Ibàñez, Jesùs Labarta, Filippo Mantovani, Marc Casas, Emanuele Ruffaldi, Sergio Saponara

Tags: AI, Compression, Computer science, Machine learning, Neural networks

September 24, 2023 by hgpu

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

Hyun Park, Parth Patel, Roland Haas, E. A. Huerta

Tags: AI, Algorithms, Biology, Biomolecules, Biophysics, nVidia, nVidia A100, nVidia A40, Package

August 20, 2023 by hgpu

Fast Knowledge Graph Completion using Graphics Processing Units

Chun-Hee Lee, Dong-oh Kang, Hwa Jeon Song

Tags: AI, Algorithms, Computer science, CUDA, Databases, Graph theory, nVidia, nVidia A100

July 30, 2023 by hgpu

Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks

Mingyu Liang, Wenyin Fu, Louis Feng, Zhongyi Lin, Pavani Panakanti, Shengbao Zheng, Srinivas Sridharan, Christina Delimitrou

Tags: AI, Benchmarking, Code generation, Computer science, CUDA, nVidia, Package, Performance, PyTorch, Tesla A100, Tesla V100

July 16, 2023 by hgpu

Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation

William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter

Tags: AI, Artificial intelligence, Benchmarking, Code generation, Computer science, CUDA, Fortran, HPC, Julia, nVidia, OpenACC, OpenMP, Package, Python

July 2, 2023 by hgpu

SciAI4Industry – Solving PDEs for industry-scale problems with deep learning

Philipp A. Witte, Russell J. Hewett, Kumar Saurabh, AmirHossein Sojoodi, Ranveer Chandra

Tags: AI, Cloud, Computational Physics, Computer science, Deep learning, Differential equations, Neural networks, nVidia, nVidia A100, nVidia DGX-A100, Partial differential equations, PDEs

November 27, 2022 by hgpu

User’s needs influencing HPC technologies

E. Athanasaki, N. Meyer, M. Cestari, A. Tuncer Durak, A. Tekin, P. Gschwandtner

Tags: AI, Artificial intelligence, Cloud, Computer science, CUDA, HIP, HPC, nVidia, OpenCL, Security, SYCL

May 29, 2022 by hgpu

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks

Nicolas Weber

Tags: AI, Artificial intelligence, Computer science, CUDA, Machine learning, Neural networks, nVidia, nVidia GeForce RTX 2080, Performance

May 29, 2022 by hgpu

perf4sight: A toolflow to model CNN training performance on Edge GPUs

Aditya Rajagopal, Christos-Savvas Bouganis

Tags: AI, Computer science, CUDA, Neural networks, nVidia, nVidia Jetson TX2, Package, Performance, Tesla P40

August 22, 2021 by hgpu

CUDAnalyst (CUDA + Analyst)

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

CodegenBench

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Agentic Code Optimization via Compiler-LLM Cooperation

Device Virtual Machine (DVM)

DVM: Real-Time Kernel Generation for Dynamic AI Models

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors
