Home-made Diffusion Model from Scratch to Hatch
National Tsing Hua University
arXiv:2509.06068 [cs.CV], 7 Sep 2025
@misc{yeh2025homemadediffusionmodelscratch,
  title={Home-made Diffusion Model from Scratch to Hatch},
  author={Shih-Ying Yeh},
  year={2025},
  eprint={2509.06068},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.06068}
}
We introduce the Home-made Diffusion Model (HDM), an efficient yet powerful text-to-image diffusion model optimized for training (and inference) on consumer-grade hardware. HDM achieves competitive 1024×1024 generation quality while maintaining a remarkably low training cost of $535-620 using four RTX 5090 GPUs, a significant reduction in computational requirements compared to traditional approaches. Our key contributions include: (1) Cross-U-Transformer (XUT), a novel U-shaped transformer that employs cross-attention for skip connections, providing superior feature integration that leads to remarkable compositional consistency; (2) a comprehensive training recipe that incorporates TREAD acceleration, a novel shifted square crop strategy for efficient arbitrary aspect-ratio training, and progressive resolution scaling; and (3) an empirical demonstration that smaller models (343M parameters) with carefully crafted architectures can achieve high-quality results and emergent capabilities, such as intuitive camera control. Our work provides an alternative scaling paradigm, demonstrating a viable path toward democratizing high-quality text-to-image generation for individual researchers and smaller organizations with limited computational resources.
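The core architectural idea, replacing the concatenation-based skip connections of a conventional U-Net with cross-attention between decoder and encoder tokens, can be illustrated with a short PyTorch sketch. Everything below (the module name, dimensions, normalization placement, and residual update) is an illustrative assumption for exposition, not the paper's actual XUT implementation:

# Minimal sketch of a cross-attention skip connection, the idea behind XUT.
# All names, shapes, and wiring here are illustrative assumptions; the
# paper's block counts, normalization, and conditioning are not reproduced.
import torch
import torch.nn as nn

class CrossAttentionSkip(nn.Module):
    """Fuse decoder tokens with encoder skip tokens via cross-attention,
    instead of the channel-wise concatenation of a standard U-Net skip."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # x:    decoder tokens, shape (batch, n_tokens, dim)
        # skip: encoder tokens from the matching resolution level
        q = self.norm_q(x)
        kv = self.norm_kv(skip)
        fused, _ = self.attn(q, kv, kv, need_weights=False)
        return x + fused  # residual update keeps the decoder stream intact

if __name__ == "__main__":
    block = CrossAttentionSkip(dim=256)
    x = torch.randn(2, 64, 256)      # decoder tokens
    skip = torch.randn(2, 64, 256)   # encoder tokens at the same level
    print(block(x, skip).shape)      # torch.Size([2, 64, 256])

Compared with concatenation, this fusion lets every decoder token attend to all encoder tokens at that level, which is one plausible reading of the "superior feature integration" the abstract credits for XUT's compositional consistency.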
September 14, 2025 by hgpu