high performance computing on graphics processing units: hgpu.org

Embedded high-performance computing

Real-Time High-Performance Computing for Embedded Control Systems

Alejandro Josué Calderón Torres

Tags: Code generation, Computer science, Embedded high-performance computing, nVidia, nVidia Jetson AGX Xavier, nVidia Jetson Nano, nVidia Jetson TX2, OpenCL, Tesla T4, Tesla V100, Thesis

August 7, 2022 by hgpu

CitiusSynapse: A Deep Learning Framework for Embedded Systems

Seungtae Hong, Hyunwoo Cho, Jeong-Si Kim

Tags: Computer science, Deep learning, Embedded high-performance computing, OpenCL

December 12, 2021 by hgpu

A Co-Design Framework with OpenCL Support for Low-Energy Wide SIMD Processor

Dongrui She, Yifan He, Luc Waeijen, Henk Corporaal

Tags: Code generation, Compilers, Computer science, Embedded high-performance computing, OpenCL

September 24, 2015 by hgpu

Viability of Feature Detection on Sony Xperia Z3 using OpenCL

Max Danielsson, Thomas Sievert

Tags: Android, Computer science, Computer vision, Embedded high-performance computing, nVidia, nVidia GeForce GTX 660, OpenCL, Package, Thesis

August 24, 2015 by hgpu

Experiences in Speeding Up Computer Vision Applications on Mobile Computing Platforms

Luna Backes, Alejandro Rico, Bjorn Franke

Tags: ARM, Computer science, Computer vision, Embedded high-performance computing, OpenCL

July 28, 2015 by hgpu

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Sparsh Mittal

Tags: Embedded high-performance computing, Energy-efficient computing, FPGA, GPU, Power-efficient computing

January 7, 2015 by sparsh0mittal

Automated Software Testing of Memory Performance in Embedded GPUs

Sudipta Chattopadhyay, Petru Eles, Zebo Peng

Tags: Computer science, CUDA, Embedded high-performance computing, GPGPU-sim, Memory, nVidia, Performance

September 19, 2014 by hgpu

Pattern Matching in OpenCL: GPU vs CPU Energy Consumption on Two Mobile Chipsets

Elena Aragon, Juan M. Jimenez, Arian Maghazeh, Jim Rasmusson, Unmesh D. Bordoloi

Tags: Algorithms, ARM, Computer science, Embedded high-performance computing, OpenCL, Pattern Search

September 11, 2014 by hgpu

An OpenCL Runtime and Scheduler for Embedded Multicore DSP Parallel Systems

Li Tian, Fugen Zhou, Cai Meng

Tags: Computer science, DSP, Embedded high-performance computing, nVidia, OpenCL, Task scheduling

May 18, 2014 by hgpu

Accelerating Java on Embedded GPU

Iype P. Joseph

Tags: Computer science, Embedded high-performance computing, Java, OpenCL, Thesis

March 15, 2014 by hgpu

High-Performance Energy-Efficient Multicore Embedded Computing

Arslan Munir, Sanjay Ranka, Ann Gordon-Ross

Tags: Computer science, Embedded high-performance computing, Energy-efficient computing, Review

April 6, 2012 by hgpu

Evaluation of an accelerator architecture for Speckle Reducing Anisotropic Diffusion

Siddharth Nilakantan, Srikanth Annangi, Nikhil Gulati, Karthik Sangaiah, Mark Hempstead

Tags: Algorithms, Computer science, CUDA, Embedded high-performance computing, nVidia, nVidia GeForce 8800 GTX, OpenMP, Performance, Ultrasound

November 12, 2011 by hgpu
