hgpu.org » Intel Gaudi-2
Yunjae Lee, Juntaek Lim, Jehyeon Bang, Eunyeong Cho, Huijong Jeong, Taesu Kim, Hyungjun Kim, Joonhyung Lee, Jinseop Im, Ranggi Hwang, Se Jung Kwon, Dongsoo Lee, Minsoo Rhu
Tags: AI, Benchmarking, Computer science, CUDA, Intel, Intel Gaudi-2, nVidia, nVidia A100, Performance
January 6, 2025 by hgpu



