high performance computing on graphics processing units: hgpu.org


Multi-level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs

Georg Kunz, Daniel Schemmel, James Gross, Klaus Wehrle

Communication and Distributed Systems, RWTH Aachen University

26th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS’12), 2012

@article{kunz2012multilevel,
  title={Multi-level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs},
  author={Kunz, Georg and Schemmel, Daniel and Gross, James and Wehrle, Klaus},
  year={2012}
}


Developing complex technical systems requires a systematic exploration of the design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires a large number of simulation runs. The resulting runtime demands severely hamper thorough design space exploration. In this paper, we present a parallel discrete event simulation scheme that enables cost- and time-efficient execution of large-scale parameter studies on GPUs. To accommodate the stream-processing paradigm of GPUs efficiently, our parallelization scheme exploits two orthogonal levels of parallelism: external parallelism among the inherently independent simulations of a parameter study, and internal parallelism among independent events within each individual simulation. Specifically, we design an event aggregation strategy based on external parallelism that generates workloads suitable for GPUs. In addition, we define a pipelined event execution mechanism based on internal parallelism to hide the transfer latencies between host and GPU memory. We analyze the performance characteristics of our parallelization scheme by means of a prototype implementation and show a 25-fold performance improvement over purely CPU-based execution.
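The abstract does not spell out the event aggregation strategy, so the following is only a minimal sketch of the general idea of external parallelism, written in plain Python rather than CUDA. It assumes each simulation of a parameter study keeps its own time-ordered event queue, and that the earliest pending events of the same type can be collected across simulations into one batch, which a GPU kernel could then process in a data-parallel fashion. The function name `aggregate_next_events`, the event tuple layout, and the event types are hypothetical and not taken from the paper.

```python
import heapq
from collections import defaultdict


def aggregate_next_events(queues):
    """Pop the earliest event of every simulation and group the events by type.

    queues: one heap per independent simulation instance, each holding
            (timestamp, event_type, payload) tuples (hypothetical layout).
    Returns a dict mapping event_type -> list of (sim_id, timestamp, payload).
    """
    batches = defaultdict(list)
    for sim_id, queue in enumerate(queues):
        if queue:
            # Simulations are independent, so only the order within
            # each simulation's own queue has to be respected.
            timestamp, event_type, payload = heapq.heappop(queue)
            batches[event_type].append((sim_id, timestamp, payload))
    return dict(batches)


# Three independent simulations of a (hypothetical) parameter study.
example_queues = [
    [(1.0, "arrival", 0), (2.0, "departure", 0)],
    [(1.5, "arrival", 1)],
    [(0.5, "departure", 2)],
]
for q in example_queues:
    heapq.heapify(q)

batches = aggregate_next_events(example_queues)
for event_type, events in batches.items():
    # Each batch stands in for one GPU kernel launch over many simulations.
    print(event_type, events)
```

In this sketch, every batch contains at most one event per simulation, so the events in a batch are independent of each other by construction. A real implementation would additionally need to overlap host-to-GPU transfers with kernel execution, which the abstract addresses through its pipelined execution mechanism and which this sketch does not model.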

Tags: Computer science, CUDA, Design space exploration, nVidia, nVidia GeForce GTX 470

April 19, 2012 by hgpu
