high performance computing on graphics processing units: hgpu.org

Real-Time High-Performance Computing for Embedded Control Systems

Alejandro Josué Calderón Torres

Tags: Code generation, Computer science, Embedded high-performance computing, nVidia, nVidia Jetson AGX Xavier, nVidia Jetson Nano, nVidia Jetson TX2, OpenCL, Tesla T4, Tesla V100, Thesis

August 7, 2022 by hgpu

CitiusSynapse: A Deep Learning Framework for Embedded Systems

Seungtae Hong, Hyunwoo Cho, Jeong-Si Kim

Tags: Computer science, Deep learning, Embedded high-performance computing, OpenCL

December 12, 2021 by hgpu

A Co-Design Framework with OpenCL Support for Low-Energy Wide SIMD Processor

Dongrui She, Yifan He, Luc Waeijen, Henk Corporaal

Tags: Code generation, Compilers, Computer science, Embedded high-performance computing, OpenCL

September 24, 2015 by hgpu

Viability of Feature Detection on Sony Xperia Z3 using OpenCL

Max Danielsson, Thomas Sievert

Tags: Android, Computer science, Computer vision, Embedded high-performance computing, nVidia, nVidia GeForce GTX 660, OpenCL, Package, Thesis

August 24, 2015 by hgpu

Experiences in Speeding Up Computer Vision Applications on Mobile Computing Platforms

Luna Backes, Alejandro Rico, Bjorn Franke

Tags: ARM, Computer science, Computer vision, Embedded high-performance computing, OpenCL

July 28, 2015 by hgpu

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Sparsh Mittal

Tags: Embedded high-performance computing, Energy-efficient computing, FPGA, GPU, Power-efficient computing

January 7, 2015 by sparsh0mittal

Automated Software Testing of Memory Performance in Embedded GPUs

Sudipta Chattopadhyay, Petru Eles, Zebo Peng

Tags: Computer science, CUDA, Embedded high-performance computing, GPGPU-sim, Memory, nVidia, Performance

September 19, 2014 by hgpu

Pattern Matching in OpenCL: GPU vs CPU Energy Consumption on Two Mobile Chipsets

Elena Aragon, Juan M. Jimenez, Arian Maghazeh, Jim Rasmusson, Unmesh D. Bordoloi

Tags: Algorithms, ARM, Computer science, Embedded high-performance computing, OpenCL, Pattern Search

September 11, 2014 by hgpu

An OpenCL Runtime and Scheduler for Embedded Multicore DSP Parallel Systems

Li Tian, Fugen Zhou, Cai Meng

Tags: Computer science, DSP, Embedded high-performance computing, nVidia, OpenCL, Task scheduling

May 18, 2014 by hgpu

Accelerating Java on Embedded GPU

Iype P. Joseph

Tags: Computer science, Embedded high-performance computing, Java, OpenCL, Thesis

March 15, 2014 by hgpu

High-Performance Energy-Efficient Multicore Embedded Computing

Arslan Munir, Sanjay Ranka, Ann Gordon-Ross

Tags: Computer science, Embedded high-performance computing, Energy-efficient computing, Review

April 6, 2012 by hgpu

Evaluation of an accelerator architecture for Speckle Reducing Anisotropic Diffusion

Siddharth Nilakantan, Srikanth Annangi, Nikhil Gulati, Karthik Sangaiah, Mark Hempstead

Tags: Algorithms, Computer science, CUDA, Embedded high-performance computing, nVidia, nVidia GeForce 8800 GTX, OpenMP, Performance, Ultrasound

November 12, 2011 by hgpu

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

OpenMC Monte Carlo Code

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Polygeist: C/C++ frontend for MLIR

Retargeting and Respecializing GPU Workloads for Performance Portability

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)