Thomas Chun Pong Chau
This thesis addresses the problem of designing real-time reconfigurable systems. Our first contribution of this thesis is to propose novel data structures and memory architectures for accelerating real-time proximity queries, with potential application to robotic surgery. We optimise performance while maintaining accuracy by several techniques including mixed precision, function transformation and streaming data flow. Significant […]
Marc Baboulin, Joel Falcou, Ian Masliah
The increasing complexity of new parallel architectures has widened the gap between adaptability and efficiency of the codes. As high performance numerical libraries tend to focus more on performance, we wish to address this issue using a C++ library called NT2. By analyzing the properties of the linear algebra domain that can be extracted from […]
View View   Download Download (PDF)   
Ichitaro Yamazaki, Stanimire Tomov, Tingxing Dong, Jack Dongarra
We propose a mixed-precision orthogonalization scheme that takes the input matrix in a standard 32 or 64-bit floating-point precision, but uses higher-precision arithmetics to accumulate its intermediate results. For the 64-bit precision, our scheme uses software emulation for the higher-precision arithmetics, and requires about 20x more computation but about the same amount of communication as […]
View View   Download Download (PDF)   
Jacques du Toit, Johannes Lotz, Uwe Naumann
We consider a GPU accelerated program using Monte Carlo simulation to price a basket call option on 10 FX rates driven by a 10 factor local volatility model. We develop an adjoint version of this program using algorithmic differentiation. The code uses mixed precision. For our test problem of 10,000 sample paths with 360 Euler […]
View View   Download Download (PDF)   
Jianfei Zhang, Lei Zhang
Graphics Processing Unit (GPU) has obtained great success in scientific computations for its tremendous computational horsepower and very high memory bandwidth. This paper discusses the efficient way to implement polynomial preconditioned conjugate gradient solver for the finite element computation of elasticity on NVIDIA GPUs using Compute Unified Device Architecture (CUDA). Sliced Block ELLPACK (SBELL) format […]
View View   Download Download (PDF)   
Jacques du Toit, Isabel Ehrlich
We present high performance implementations on a CPU and an NVIDIA GPU of a Monte Carlo pricer for a simple FX basket option driven by a multi-factor local volatility model. Basket options such as these are typically considered too complicated to tackle analytically in a market-consistent manner, and are too high dimensional for PDE methods. […]
View View   Download Download (PDF)   
Mario Schrock, Hannes Vogt
A lattice gauge theory framework for simulations on graphic processing units (GPUs) using NVIDIA’s CUDA is presented. The code comprises template classes that take care of an optimal data pattern to ensure coalesced reading from device memory to achieve maximum performance. In this work we concentrate on applications for lattice gauge fixing in 3+1 dimensional […]
View View   Download Download (PDF)   
Hartwig Anzt, Maribel Castillo, Juan C. Fernandez, Vincent Heuveline, Francisco D. Igual, Rafael Mayo and Enrique S. Quintana-Orti
In this paper, we analyze the power consumption of different GPU-accelerated iterative solver implementations enhanced with energy-saving techniques. Specifically, while conducting kernel calls on the graphics accelerator, we manually set the host system to a power-efficient idle-wait status so as to leverage dynamic voltage and frequency control. While the usage of iterative refinement combined with […]
View View   Download Download (PDF)   
Hartwig Anzt, Piotr Luszczek, Jack Dongarra, Vincent Heuveline
In hardware-aware high performance computing, block- asynchronous iteration and mixed precision iterative refinement are two techniques that are applied to leverage the computing power of SIMD accelerators like GPUs. Although they use a very different approach for this purpose, they share the basic idea of compensating the convergence behaviour of an inferior numerical algorithm by […]
View View   Download Download (PDF)   
Hartwig Anzt, Vincent Heuveline, Bjorn Rocker, Maribel Castillo, Juan C. Fernandez, Rafael Mayo, Enrique S. Quintana-Orti
This paper presents a detailed analysis of a mixed precision iterative refinement solver applied to a linear system obtained from the 2D discretization of a fluid flow problem. The total execution time and energy need of different soft- and hardware implementations are measured and compared with those of a plain GMRES-based solver in double precision. […]
View View   Download Download (PDF)   
Hartwig Anzt, Vincent Heuveline, Bjorn Rocker
This paper proposes an error correction method for solving linear systems of equations and the evaluation of an implementation using mixed precision techniques. While different technologies are available, graphic processing units (GPUs) have been established as particularly powerful coprocessors in recent years. For this reason, our error correction approach is focused on a CUDA implementation […]
View View   Download Download (PDF)   
Jack J. Dongarra
Summary form only given. In this talk we examine how high performance computing has changed over the last 10-years and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. Some of the software and algorithm challenges have already been encountered, such […]
View View   Download Download (PDF)   
Page 1 of 3123

* * *

* * *

Follow us on Twitter

HGPU group

1658 peoples are following HGPU @twitter

Like us on Facebook

HGPU group

335 people like HGPU on Facebook

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: