9311

ACL2 Meets the GPU: Formalizing a CUDA-based Parallelizable All-Pairs Shortest Path Algorithm in ACL2

David S. Hardin, Samuel S. Hardin
Advanced Technology Center, Rockwell Collins, Cedar Rapids, IA, USA
arXiv:1304.7863 [cs.LO], (30 Apr 2013)
@article{2013arXiv1304.7863H,

   author={Hardin, David S. and Hardin, Samuel S.},

   title={"ACL2MeetstheGPU:FormalizingaCUDA-basedParallelizableAll-PairsShortestPathAlgorithminACL2"},

   journal={ArXiv e-prints},

   archivePrefix={"arXiv"},

   eprint={1304.7863},

   primaryClass={"cs.LO"},

   keywords={Computer Science – Logic in Computer Science, Computer Science – Data Structures and Algorithms},

   year={2013},

   month={apr}

}

Download Download (PDF)   View View   Source Source   

283

views

As Graphics Processing Units (GPUs) have gained in capability and GPU development environments have matured, developers are increasingly turning to the GPU to off-load the main host CPU of numerically-intensive, parallelizable computations. Modern GPUs feature hundreds of cores, and offer programming niceties such as double-precision floating point, and even limited recursion. This shift from CPU to GPU, however, raises the question: how do we know that these new GPU-based algorithms are correct? In order to explore this new verification frontier, we formalized a parallelizable all-pairs shortest path (APSP) algorithm for weighted graphs, originally coded in NVIDIA’s CUDA language, in ACL2. The ACL2 specification is written using a single-threaded object (stobj) and tail recursion, as the stobj/tail recursion combination yields the most straightforward translation from imperative programming languages, as well as efficient, scalable executable specifications within ACL2 itself. The ACL2 version of the APSP algorithm can process millions of vertices and edges with little to no garbage generation, and executes at one-sixth the speed of a host-based version of APSP coded in C- a very respectable result for a theorem prover. In addition to formalizing the APSP algorithm (which uses Dijkstra’s shortest path algorithm at its core), we have also provided capability that the original APSP code lacked, namely shortest path recovery. Path recovery is accomplished using a secondary ACL2 stobj implementing a LIFO stack, which is proven correct. To conclude the experiment, we ported the ACL2 version of the APSP kernels back to C, resulting in a less than 5% slowdown, and also performed a partial back-port to CUDA, which, surprisingly, yielded a slight performance increase.
VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)

* * *

* * *

Like us on Facebook

HGPU group

128 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1189 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: