hgpu.org » Local memory
Jianbin Fang, Henk Sips, Ana Lucia Varbanescu
Tags: ATI, ATI Radeon HD 7970, Computer science, Intel Xeon Phi, Local memory, nVidia, OpenCL, Performance, Portability, Tesla C1060, Tesla C2050, Tesla K20
July 29, 2014 by jfang
Jianbin Fang, Henk Sips, Pekka Jaaskelainen, Ana Lucia Varbanescu
Tags: ATI, ATI Radeon HD 7970, Intel Xeon Phi, Local memory, nVidia, OpenCL, Reverse Engineering, Tesla C2050, Tesla K20
June 16, 2014 by jfang