high performance computing on graphics processing units: hgpu.org

Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs

Jacob Lambert, Seyong Lee, Jungwon Kim, Jeffrey S. Vetter, Allen D. Malony

Tags: Computer science, FPGA, nVidia, OpenACC, OpenCL, Tesla K40

July 1, 2018 by hgpu

RAPIDNN: In-Memory Deep Neural Network Acceleration Framework

Mohsen Imani, Mohammad Samragh, Yeseong Kim, Saransh Gupta, Farinaz Koushanfar, Tajana Rosing

Tags: AMD Radeon HD 7970, ATI, CNN, Computer science, Deep learning, FPGA, Hardware Architecture, Neural networks, OpenCL

June 20, 2018 by hgpu

Combining Multiple Optimised FPGA-based Pulsar Search Modules Using OpenCL

Haomiao Wang, Prabu Thiagaraj, Oliver Sinnen

Tags: Astrophysics, FPGA, OpenCL, Signal processing

June 17, 2018 by hgpu

Acceleration of k-Nearest Neighbor and SRAD Algorithms Using Intel FPGA SDK for OpenCL

Liyuan Liu

Tags: Algorithms, Computer science, FPGA, Machine learning, Nearest neighbour, OpenCL, Sorting, Thesis

June 17, 2018 by hgpu

Efficient Large-scale Approximate Nearest Neighbor Search on OpenCL FPGA

Jialiang Zhang, Soroosh Khoram, Jing Li

Tags: Computer science, FPGA, Nearest neighbour, nVidia, nVidia GeForce GTX Titan XP, OpenCL

June 13, 2018 by hgpu

A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm

Y. Huang, J. Shen, Z. Wang, M. Wen, C. Zhang

Tags: Computational Complexity, Computer science, Computer vision, Deep learning, FPGA, HLS, Neural networks

June 9, 2018 by hgpu

Design of FPGA-Based Accelerator for Convolutional Neural Network under Heterogeneous Computing Framework with OpenCL

Li Luo, Yakun Wu, Fei Qiao, Yi Yang, Qi Wei, Xiaobo Zhou, Yongkai Fan, Shuzheng Xu, Xinjun Liu, Huazhong Yang

Tags: Algorithms, CNN, Computer science, Deep learning, FPGA, Heterogeneous systems, Neural networks, OpenCL

June 2, 2018 by hgpu

FPGA-based Acceleration of FT Convolution for Pulsar Search Using OpenCL

Haomiao Wang, Prabu Thiagaraj, Oliver Sinnen

Tags: AMD Radeon R7 370, Astrophysics, ATI, FPGA, OpenCL

June 2, 2018 by hgpu

Transformations of High-Level Synthesis Codes for High-Performance Computing

Johannes de Fine Licht, Simon Meierhans, Torsten Hoefler

Tags: Computer science, FPGA, HLS, OpenCL

May 26, 2018 by hgpu

Parallel Programming for FPGAs

Ryan Kastner, Janarbek Matai, Stephen Neuendorffer

Tags: Book, Computer science, FPGA, HLS

May 12, 2018 by hgpu

Efficient Hardware Acceleration on SoC-FPGA with OpenCL

Susmitha Gogineni

Tags: Computer science, Design space exploration, FPGA, Heterogeneous systems, HLS, OpenCL, Thesis

May 12, 2018 by hgpu

Tiramisu: A Code Optimization Framework for High Performance Systems

Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Patricia Suriana, Shoaib Kamil, Saman Amarasinghe

Tags: Algorithms, Code generation, Computer science, CUDA, Distributed computing, FPGA, Heterogeneous systems, Linear Algebra, LLVM, MPI, nVidia, OpenMPI, Tesla K40

May 5, 2018 by hgpu
