high performance computing on graphics processing units: hgpu.org


Decreasing NAME III Solution Time Using GP-GPU

Kingsley Gale-Sides

The University of Edinburgh

The University of Edinburgh, 2011

@article{gale2011decreasing,
  title={Decreasing NAME III Solution Time Using GP-GPU},
  author={Gale-Sides, K.},
  year={2011}
}


The potential for decreasing the solution time of the UK Met Office NAME III [1] Lagrangian atmospheric particle dispersion modelling code was examined. The code was ported to the EPCC Ness and Fermi0 machines and compiled with the PGI compiler. Timing benchmarks and profiling were completed for a particle-only run and a cloud gamma run to identify potential areas for speed-up. A prototypical simple dispersion model was conceptually compared to the NAME III particle benchmark. This simple model was accelerated using OpenMP, CUDA C and CUDA Fortran. Timing benchmarks and profiling were completed for problem sizes from 1,000 to 10,000,000 particles, the latter representing a realistic problem size for an emergency particle run [2]. The simple model achieved a total speed-up of up to ~50x for the largest problem size, with the particle loop itself sped up by ~80-100x. These results can be extrapolated to indicate that the NAME III code could be sped up by approximately 12x for this specific benchmark, or by about 59x when scaled to a realistic problem size. Other areas of NAME III with potential for speed-up were also evaluated. Because of the complexity of the NAME III code, although CUDA and GP-GPU acceleration can readily be applied to the targeted sections of code exercised by specific benchmarks, it may not be the best option for speeding up the code as a whole. The Met Office may wish to consider a hybrid OpenMP and MPI approach that builds on the existing OpenMP implementation.
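The abstract does not give the equations of the simple dispersion model, so the sketch below is only an illustration under stated assumptions: a basic random-walk scheme (mean-wind advection plus Gaussian turbulent displacements with a constant eddy diffusivity K, and reflection at the ground). It is not NAME III's turbulence parameterisation, and the names Particle, step_particles and amdahl_speedup are hypothetical. It shows two points the abstract relies on: each particle's update depends only on that particle, which is why a particle loop parallelises well under OpenMP or with one CUDA thread per particle; and, assuming Amdahl's law, the whole-program speed-up stays below the loop speed-up whenever part of the runtime is not accelerated.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Particle:
    x: float  # horizontal position (m)
    z: float  # height above ground (m)


def step_particles(particles, u, w, K, dt, rng):
    """Advance every particle by one time step of a simple random-walk scheme.

    Hypothetical model: advection by a mean wind (u, w) plus a Gaussian
    turbulent displacement with variance 2*K*dt in each direction.
    Each iteration reads and writes only its own particle, so the loop
    iterations are independent (the property exploited by OpenMP/CUDA).
    """
    sigma = math.sqrt(2.0 * K * dt)  # std. dev. of the turbulent displacement
    for p in particles:
        p.x += u * dt + sigma * rng.gauss(0.0, 1.0)
        p.z += w * dt + sigma * rng.gauss(0.0, 1.0)
        if p.z < 0.0:
            p.z = -p.z  # reflect particles that cross the ground
    return particles


def amdahl_speedup(parallel_fraction, loop_speedup):
    """Amdahl's law: overall speed-up when only `parallel_fraction` of the
    original runtime is accelerated by a factor of `loop_speedup`."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / loop_speedup)
```

With illustrative numbers (not taken from the thesis), a run spending 90% of its time in a loop accelerated 100x gives amdahl_speedup(0.9, 100.0) of roughly 9.2x overall. In a GPU port, the single shared rng here would be replaced by an independent random-number stream per thread (for example, cuRAND device states), so that the iterations stay independent.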

Tags: Benchmarking, Cloud, Computer science, CUDA, Fortran, MPI, nVidia, Tesla C2050, Tesla M1060, Thesis

January 1, 2012 by hgpu

