high performance computing on graphics processing units: hgpu.org

A Performance Criteria for parallel Computation on basis of block size using CUDA Architecture

Ashis Kumar Dash

International Journal of Scientific & Engineering Research, Volume 5, Issue 1, 2014

A GPU based on the CUDA architecture developed by NVIDIA is a high-performance computing device. Large matrices can be multiplied in a few seconds on such a GPU. A modern GPU consists of 16 highly threaded streaming multiprocessors (SMs); the GPU named Fermi consists of 32 SMs. These are compute-intensive devices, and GPUs have been found to be the best platform for massive data parallelism. The CUDA architecture is based on a heterogeneous platform comprising both CPU and GPU, which offers enormous potential to solve complex problems at high speed. In most applications the sequential part of a program is executed on the CPU and the numerically intensive part on the GPU. However, merely executing the numerically intensive part on the GPU does not by itself improve performance. Since a GPU consists of highly threaded multiprocessors, threads must be well organised into blocks and blocks into grids, in a way that depends on the architecture of the GPU, to maximize the performance of parallel computation. This paper discusses the organization of threads on a particular GPU and determines the block size that maximizes the performance of parallel computation, using matrix multiplication as the test case.
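Why block size matters can be illustrated with a rough occupancy estimate. The sketch below is not taken from the paper. It assumes the per-SM limits NVIDIA documents for compute capability 1.x devices (at most 768 resident threads and 8 resident blocks per SM, and 512 threads per block), and the helper names `resident_threads` and `grid_dim` are hypothetical. It counts how many threads one SM can keep resident when a matrix multiplication kernel is launched with square blocks of a given width.

```python
# Assumed per-SM limits (compute capability 1.x figures; not stated in the abstract).
MAX_THREADS_PER_SM = 768
MAX_BLOCKS_PER_SM = 8
MAX_THREADS_PER_BLOCK = 512


def resident_threads(tile_width):
    """Threads resident on one SM for square tile_width x tile_width blocks."""
    threads_per_block = tile_width * tile_width
    if threads_per_block > MAX_THREADS_PER_BLOCK:
        return 0  # a block this large cannot be launched at all
    # Resident blocks are capped by both the block limit and the thread limit.
    blocks = min(MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block)
    return blocks * threads_per_block


def grid_dim(n, tile_width):
    """Blocks needed along one dimension to cover an n x n result matrix."""
    return (n + tile_width - 1) // tile_width


if __name__ == "__main__":
    for width in (4, 8, 16, 32):
        used = resident_threads(width)
        print(width, used, f"{used / MAX_THREADS_PER_SM:.0%}")
```

Under these assumed limits, 8 x 8 blocks are capped by the block limit (8 blocks of 64 threads, 512 of 768 thread slots), 16 x 16 blocks fill the SM (3 blocks of 256 threads), and 32 x 32 blocks exceed the per-block limit. The estimate ignores register and shared memory usage, which can lower the achievable occupancy further on real hardware.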

Tags: Computer science, CUDA, Data parallelism, Heterogeneous systems, Matrix multiplication, nVidia, nVidia Quadro FX 3700

January 29, 2014 by hgpu
