hgpu.org » Tesla K40
Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda
Tags: Benchmarking, Caffe, Computer science, CUBLAS, CUDA, Deep learning, Intel Xeon Phi, Machine learning, nVidia, Tesla K40, Tesla K80, Tesla P100
December 24, 2017 by hgpu