NQueens on CUDA: Optimization Issues
Ninth International Symposium on Parallel and Distributed Computing (ISPDC), 2010
@conference{feinbube2010nqueens,
title={NQueens on CUDA: Optimization Issues},
author={Feinbube, F. and Rabe, B. and von L{\"o}wis, M. and Polze, A.},
booktitle={2010 Ninth International Symposium on Parallel and Distributed Computing},
pages={63--70},
year={2010},
organization={IEEE}
}
Today's commercial off-the-shelf computer systems are multicore computing systems that combine a CPU, a graphics processor (GPU), and custom devices. In comparison with CPU cores, graphics cards can execute hundreds to thousands of compute units in parallel. To benefit from these GPU computing resources, applications have to be parallelized and adapted to the target architecture. In this paper we present our experience in implementing the NQueens puzzle solution on GPUs using Nvidia's CUDA (Compute Unified Device Architecture) technology. Using memory usage and memory access as examples, we demonstrate that optimizations of CUDA programs may have contrary effects on different CUDA architectures. Our evaluation results point out that using new programming languages or compilers alone is not sufficient to achieve the best results with emerging graphics card computing.
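To illustrate the kind of memory-placement decision the abstract alludes to, here is a minimal, hypothetical sketch of a CUDA NQueens solution counter; it is not the authors' implementation. It assumes a small board (N = 10), a standard bitmask backtracking scheme, one subproblem (fixed queen in row 0) per thread, and per-thread backtracking stacks placed in shared memory; the kernel name and launch configuration are illustrative. Moving the stacks from shared memory to local (off-chip) memory is exactly the sort of change whose performance impact can differ between CUDA architectures.

// nqueens_count.cu -- hypothetical sketch, not the code evaluated in the paper.
#include <cstdio>
#include <cuda_runtime.h>

#define N 10                 // board size; bitmasks fit comfortably in 32 bits
#define THREADS 32           // one block; board size must not exceed THREADS

// Each thread counts all solutions whose row-0 queen sits in column == thread id.
// Per-thread backtracking stacks are kept in shared memory (an assumed placement).
__global__ void nqueens_kernel(unsigned long long *counts)
{
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= N) return;

    __shared__ unsigned int ls[N][THREADS];   // left-diagonal attack mask per row
    __shared__ unsigned int rs[N][THREADS];   // right-diagonal attack mask per row
    __shared__ unsigned int cs[N][THREADS];   // column attack mask per row
    __shared__ unsigned int av[N][THREADS];   // remaining candidate columns per row
    const int t = threadIdx.x;

    const unsigned int full = (1u << N) - 1u;
    const unsigned int bit0 = 1u << col;
    unsigned long long count = 0;

    // The row-0 queen is fixed; backtracking starts at row 1.
    ls[1][t] = bit0 << 1;
    rs[1][t] = bit0 >> 1;
    cs[1][t] = bit0;
    av[1][t] = full & ~(ls[1][t] | rs[1][t] | cs[1][t]);

    int row = 1;
    while (row >= 1) {
        unsigned int a = av[row][t];
        if (a) {
            unsigned int bit = a & (0u - a);      // lowest free column
            av[row][t] = a ^ bit;                 // consume this candidate
            if (row == N - 1) {
                ++count;                          // full board reached
            } else {
                ls[row + 1][t] = (ls[row][t] | bit) << 1;
                rs[row + 1][t] = (rs[row][t] | bit) >> 1;
                cs[row + 1][t] = cs[row][t] | bit;
                av[row + 1][t] = full & ~(ls[row + 1][t] | rs[row + 1][t] | cs[row + 1][t]);
                ++row;
            }
        } else {
            --row;                                // dead end: backtrack
        }
    }
    counts[col] = count;
}

int main()
{
    unsigned long long *d_counts, h_counts[N];
    cudaMalloc(&d_counts, N * sizeof(unsigned long long));
    nqueens_kernel<<<1, THREADS>>>(d_counts);
    cudaMemcpy(h_counts, d_counts, N * sizeof(unsigned long long), cudaMemcpyDeviceToHost);

    unsigned long long total = 0;
    for (int i = 0; i < N; ++i) total += h_counts[i];
    printf("%d-queens solutions: %llu\n", N, total);  // 724 for N = 10
    cudaFree(d_counts);
    return 0;
}

In a sketch like this, a real evaluation would compare the shared-memory placement above against keeping ls/rs/cs/av in per-thread local arrays, since the relative cost of the two placements depends on the GPU generation.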
April 13, 2011 by hgpu