hgpu.org » Local memory
Jianbin Fang, Henk Sips, Ana Lucia Varbanescu
Tags: ATI, ATI Radeon HD 7970, Computer science, Intel Xeon Phi, Local memory, nVidia, OpenCL, Performance, Portability, Tesla C1060, Tesla C2050, Tesla K20
July 29, 2014 by jfang
Jianbin Fang, Henk Sips, Pekka Jaaskelainen, Ana Lucia Varbanescu
Tags: ATI, ATI Radeon HD 7970, Intel Xeon Phi, Local memory, nVidia, OpenCL, Reverse Engineering, Tesla C2050, Tesla K20
June 16, 2014 by jfang




