12794
Alastair F. Donaldson
I present a tutorial overview demonstrating the key technique used by GPUVerify, a static verification tool for graphics processing unit (GPU) kernels. The technique is a method for translating a massively parallel GPU kernel into a sequential program such that correctness of the sequential program implies data race-freedom of the parallel kernel.
Gary K. Chen, Eric Chi, John Ranola, Kenneth Lange
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical […]
Havard Heitlo Holm
As parallel and heterogeneous computing becomes more and more a necessity for implementing high performance simulators, it becomes increasingly harder for scientists and engineers without experience in high performance computing to achieve good performance. Even for those who knows how to write efficient code the process for doing so is time consuming and error prone, […]
F. D. Witherden, B. C. Vermeire, P. E. Vincent
PyFR is an open-source high-order accurate computational fluid dynamics solver for mixed unstructured grids that can target a range of hardware platforms from a single codebase. In this paper we demonstrate the ability of PyFR to perform high-order accurate unsteady simulations of flow on mixed unstructured grids using heterogeneous multi-node hardware. Specifically, after benchmarking single-node […]
Sergiy Gogolenko, Zhaojun Bai, Richard Scalettar
We present a block structured orthogonal factorization (BSOF) algorithm and its parallelization for computing the inversion of block p-cyclic matrices.We aim at the high performance on multicores with GPU accelerators. We provide a quantitative performance model for optimal host-device load balance, and validate the model through numerical tests. Benchmarking results show that the parallel BSOF […]
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and […]
Gordon Inggs, David Thomas, Wayne Luk
We advocate a domain specific software development methodology for heterogeneous computing platforms such as Multicore CPUs, GPUs and FPGAs. We argue that three specific benefits are realised from adopting such an approach: portable, efficient implementations across heterogeneous platforms; domain specific metrics of quality that characterise platforms in a form software developers will understand; automatic, optimal […]
David H. Eberly
An In-Depth, Practical Guide to GPGPU Programming Using Direct3D 11. GPGPU Programming for Games and Science demonstrates how to achieve the following requirements to tackle practical problems in computer science and software engineering: Robustness, Accuracy, Speed, Quality source code that is easily maintained, reusable, and readable. The book primarily addresses programming on a graphics processing […]
Bo Fang
While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, which […]
Peng Li, Guodong Li, Ganesh Gopalakrishnan
Even the careful GPU programmer can inadvertently introduce data races while writing and optimizing code. Currently available GPU race checking methods fall short either in terms of their formal guarantees, ease of use, or practicality. Existing symbolic methods: (1) do not fully support existing CUDA kernels; (2) may require user-specified assertions or invariants; (3) often […]
Wookhyun Han, Hwidong Bae, Hyosu Kim, Jiyoen Lee, Insik Shin
GPU (General-Purpose computation on Graphics Processing Units) offers an effective computing platform to accelerate a wide class of data-parallel computing. Multi-GPU’s appear as an attractive platform to speed up the computation of data-parallel GPU. This paper aims to explore the feasibility of relaxing the task-level restriction of single GPU use in multi-GPU real-time systems.We develop […]
Hyunsu Cho, Peter A. Yoon
Divide-and-conquer algorithm is a numerically stable and efficient algorithm that computes the eigenvalues and eigenvectors of a symmetric tridiagonal matrix. We often face the situation where the input matrix fits into the main memory but not into the on-chip memory of a GPU device. We present an out-of-core implementation where only part of the input […]
Page 1 of 7512345...102030...Last »

* * *

* * *

Like us on Facebook

HGPU group

149 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1231 peoples are following HGPU @twitter

Featured events

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: