Hartwig Anzt, Stanimire Tomov, Jack Dongarra
Numerical methods in sparse linear algebra typically rely on a fast and efficient matrix vector product, as this usually is the backbone of iterative algorithms for solving eigenvalue problems or linear systems. Against the background of a large diversity in the characteristics of high performance computer architectures, it is a challenge to derive a cross-platform […]
Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, Ichitaro Yamazaki, Jack Dongarra, William Sawyer
Krylov subspace solvers are often the method of choice when solving sparse linear systems iteratively. At the same time, hardware accelerators such as graphics processing units (GPUs) continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a […]
Hanyuan Zheng, Anping Song, Zhixiang Liu, Lei Xu, Wu Zhang
Based on the Gaussian Belief Propagation (GaBP) algorithm, this article presents GaBP-GPU, an algorithm for solving large-scale symmetric diagonally dominant sparse linear systems on GPUs. A storage format (MCSC) is presented for use with the GaBP-GPU algorithm. We extract some diagonally dominant matrices from the University of Florida Sparse Matrix Collection as test examples. The experimental results […]
Ioannis E. Venetis, Georgios Goumas, Markus Geveler, Dirk Ribbrock
In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the […]
Ping Guo, Liqiang Wang
This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool to specifically provide inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance for multiple SpMV kernels. For a sparse matrix, based on its SpMV kernel performance measured on […]
Laurie Elizabeth Miller
There is an increasing need for computational power to drive software tools used in power systems planning and operations, since the emergence of modern energy markets and recent renewable generation technology fundamentally alters how energy flows through the existing power grid. While special-purpose hardware, including supercomputers, has been explored for this purpose, inexpensive commodity hardware […]
Sardar Anisul Haque
The objective of high performance computing (HPC) is to ensure that the computational power of hardware resources is well utilized to solve a problem. Various techniques are usually employed to achieve this goal. Improving an algorithm to reduce the number of arithmetic operations, modifying how data is accessed, or rearranging data in order to reduce […]
Ilya B. Labutin, Irina V. Surodina
We propose a method for preconditioner construction and parallel implementations of the Preconditioned Conjugate Gradient algorithm on GPU platforms. The preconditioning matrix is an approximate inverse derived from an algorithm for the iterative improvement of a solution to linear equations. Using a sparse matrix-vector product, our preconditioner is well suited for massively parallel GPU architecture. […]
Noboru Tanabe, Sonoko Tomimori, Masami Takata, Kazuki Joe
We propose a method for judging the adaptability of sparse matrices to a target cache memory, using two metrics based on spatial locality and temporal locality. For the indirect access sequences of sparse matrix-vector multiplication, one metric is the number of valid data within a cache line, and the other is the average reference interval. We also develop […]
Sivaramakrishna Bharadwaj Indarapu, Manoj Maramreddy, Kishore Kothapalli
Multiplying a sparse matrix by a vector, denoted spmv, is a fundamental operation in linear algebra with several applications. Hence, efficient and scalable implementation of spmv has been a topic of intense research. Recent efforts are aimed at implementations on GPUs, multicore architectures, and other emerging computational platforms. Owing to the highly irregular nature of […]
Kaupo Kuresson
The purpose of this thesis was to benchmark and compare different representations of sparse matrices and algorithms for multiplying them by a vector, and to measure the performance differences between running the algorithms on a CPU and on GPU(s). Four different storage formats were tested: full matrix storage, coordinate storage (COO), ELLPACK (ELL), compressed sparse […]
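To make the storage formats named in this abstract concrete, here is a minimal NumPy sketch of a sparse matrix-vector product in the COO and ELL layouts. It is an illustration only, not code from the thesis: the function names (`spmv_coo`, `coo_to_ell`, `spmv_ell`) are hypothetical, and the ELL padding convention (column index 0 with value 0) is an assumption made for simplicity.

```python
import numpy as np

def spmv_coo(rows, cols, vals, x, n_rows):
    """y = A @ x with A stored as (row, col, value) triplets (COO)."""
    y = np.zeros(n_rows, dtype=np.result_type(vals, x))
    # np.add.at accumulates correctly even when a row index repeats.
    np.add.at(y, rows, vals * x[cols])
    return y

def coo_to_ell(rows, cols, vals, n_rows):
    """Pack COO triplets into ELL: two dense (n_rows x width) arrays.

    width is the largest number of nonzeros in any row; shorter rows are
    padded with column 0 and value 0 (an assumed convention), so padded
    slots contribute nothing to the product.
    """
    counts = np.bincount(rows, minlength=n_rows)
    width = int(counts.max()) if len(rows) else 0
    ell_cols = np.zeros((n_rows, width), dtype=np.int64)
    ell_vals = np.zeros((n_rows, width), dtype=vals.dtype)
    fill = np.zeros(n_rows, dtype=np.int64)  # next free slot per row
    for r, c, v in zip(rows, cols, vals):
        ell_cols[r, fill[r]] = c
        ell_vals[r, fill[r]] = v
        fill[r] += 1
    return ell_cols, ell_vals

def spmv_ell(ell_cols, ell_vals, x):
    """y = A @ x with A in ELL: gather x by column index, then row-sum."""
    return (ell_vals * x[ell_cols]).sum(axis=1)

# Example matrix: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([1.0, 2.0, 3.0])

y_coo = spmv_coo(rows, cols, vals, x, n_rows=3)
ell_cols, ell_vals = coo_to_ell(rows, cols, vals, n_rows=3)
y_ell = spmv_ell(ell_cols, ell_vals, x)
```

The regular, fixed-width layout of ELL is what tends to suit GPU-style data-parallel execution, at the cost of padding when row lengths vary widely; COO stores no padding but needs scattered accumulation into the result.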
Jianfei Zhang, Lei Zhang
The Graphics Processing Unit (GPU) has achieved great success in scientific computing thanks to its tremendous computational power and very high memory bandwidth. This paper discusses an efficient way to implement a polynomial preconditioned conjugate gradient solver for the finite element computation of elasticity on NVIDIA GPUs using Compute Unified Device Architecture (CUDA). Sliced Block ELLPACK (SBELL) format […]
HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
