Programming

Speeding up lattice sieve with Xeon Phi coprocessor

Anja Becker, Dusan Kostic

Tags: Algorithms, Computer science, Intel Xeon Phi, Security

July 2, 2017 by hgpu

Synthesis of Embedded Software using Dataflow Schedule Graphs

Abhay Raina

Tags: Computer science, DSP, nVidia, nVidia GeForce GTX 680, OpenCL, Signal processing, Thesis

July 2, 2017 by hgpu

Deep neural networks for direct, featureless learning through observation: the case of 2d spin models

K. Mills, I. Tamblyn

Tags: Condensed matter, CUDA, Deep learning, Ising model, Materials Science, Neural networks, nVidia, Phase transition, Physics, Tesla K40

July 2, 2017 by hgpu

DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications

Loc N. Huynh, Youngki Lee, Rajesh Krishna Balan

Source codes

Tags: Algorithms, Computer science, Deep learning, OpenCL, Package, Vulkan

June 25, 2017 by hgpu

Scalar collapse in AdS with an OpenCL open source code

Steven L. Liebling, Gaurav Khanna

Source codes

Tags: AMD Radeon R9 295X2, ATI, General Relativity and Quantum Cosmology, Heterogeneous systems, High Energy Physics - Theory, nVidia, OpenCL, Package, Physics, Tesla K40

June 25, 2017 by hgpu

ART vs. NDK vs. GPU acceleration: A study of performance of image processing algorithms on Android

Andreas Palsson

Tags: Algorithms, Android, Image processing, Java, OpenCL, RenderScript, Thesis

June 25, 2017 by hgpu

High-Performance Out-of-core Block Randomized Singular Value Decomposition on GPU

Yuechao Lu, Fumihiko Ino, Yasuyuki Matsushita

Tags: Algorithms, Computer science, Computer vision, CUDA, Linear Algebra, Machine learning, nVidia, Tesla P100

June 25, 2017 by hgpu

On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization

Jose A. Belloch, Jose M. Badia, Francisco D. Igual, Maximo Cobos, Enrique S. Quintana-Orti

Tags: Algorithms, ARM, nVidia, nVidia GeForce GTX 1080, OpenCL, Signal processing, Tesla K20

June 21, 2017 by hgpu

Panda: A Compiler Framework for Concurrent CPU-GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Mohammed Sourouri, Scott B. Baden, Xing Cai

Tags: Code generation, Compilers, Computer science, CUDA, GPU cluster, Heterogeneous systems, MPI, nVidia, OpenMP, Tesla K20

June 21, 2017 by hgpu

Rgtsvm: Support Vector Machines on a GPU in R

Zhong Wang, Tinyi Chu, Lauren A Choate, Charles G Danko

Source codes

Tags: Computer science, CUDA, Machine learning, nVidia, Package, R, Tesla K20

June 21, 2017 by hgpu

Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras

Keunwoo Choi, Deokjin Joo, Juho Kim

Source codes

Tags: Computer science, CUDA, Deep learning, Keras, Neural networks, nVidia, nVidia GeForce GTX 1080, nVidia GeForce GTX Titan X, Package, Python, Signal processing, Tesla K80, Tesla M60

June 21, 2017 by hgpu

Efficient OpenCL-based concurrent tasks offloading on accelerators

A.J. Lazaro-Munoz, J.M. Gonzalez-Linares, J. Gomez-Luna, N. Guil

Tags: AMD Radeon R9, ATI, Benchmarking, Computer science, Heterogeneous systems, Intel Xeon Phi, nVidia, OpenCL, Task scheduling, Tesla K20

June 17, 2017 by hgpu
