Dahua Feng, Zhiming Xu, Rongxiang Wang, Felix Xiaozhu Lin
Tags: AI, Apple M2 Max, Apple M2 Pro, Apple M2 Ultra, Computer science, CUDA, Linear Algebra, LLM, Machine learning, nVidia, nVidia GeForce RTX 4090, nVidia GeForce RTX 2080 Ti, nVidia Quadro RTX 4000, nVidia RTX A6000, Performance, PyTorch
February 3, 2025 by hgpu