hgpu.org » Tesla K40
Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda
Tags: Benchmarking, Caffe, Computer science, CUBLAS, CUDA, Deep learning, Intel Xeon Phi, Machine learning, nVidia, Tesla K40, Tesla K80, Tesla P100
December 24, 2017 by hgpu



