hgpu.org » Apple M2 Pro
Dahua Feng, Zhiming Xu, Rongxiang Wang, Felix Xiaozhu Lin
Tags: AI, Apple M2 Max, Apple M2 Pro, Apple M2 Ultra, Computer science, CUDA, Linear Algebra, LLM, Machine learning, nVidia, nVidia GeForce RTX 4090, nVidia GeForce RTX 2080 Ti, nVidia Quadro RTX 4000, nVidia RTX A6000, Performance, PyTorch
February 3, 2025 by hgpu