hgpu.org » Apple M2 Pro
Dahua Feng, Zhiming Xu, Rongxiang Wang, Felix Xiaozhu Lin
Tags: AI, Apple M2 Max, Apple M2 Pro, Apple M2 Ultra, Computer science, CUDA, Linear Algebra, LLM, Machine learning, nVidia, nVidia GeForce RTX 4090, nVidia GeForce RTX 2080 Ti, nVidia Quadro RTX 4000, nVidia RTX A6000, Performance, PyTorch
February 3, 2025 by hgpu



