high performance computing on graphics processing units: hgpu.org

An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures

Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda

Tags: Benchmarking, Caffe, Computer science, CUBLAS, CUDA, Deep learning, Intel Xeon Phi, Machine learning, nVidia, Tesla K40, Tesla K80, Tesla P100

December 24, 2017 by hgpu
