Dahua Feng, Zhiming Xu, Rongxiang Wang, Felix Xiaozhu Lin
Tags: AI, Apple M2 Max, Apple M2 Pro, Apple M2 Ultra, Computer science, CUDA, Linear Algebra, LLM, Machine learning, nVidia, nVidia GeForce RTX 4090, nVidia GeForce RTX 2080 Ti, nVidia Quadro RTX 4000, nVidia RTX A6000, Performance, PyTorch
February 3, 2025 by hgpu