13770
G. Latu, M. Haefele, J. Bigot, V. Grandgirard, T. Cartier-Michaud, F. Rozar
This work describes the challenges presented by porting parts ofthe Gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate the performance of somegeneric micro-benchmark on Phi versus Intel Sandy Bridge. Several interpolation kernels useful for the Gysela […]
View View   Download Download (PDF)   
Jan Lebert, Lutz Kunneke, Johannes Hagemann, Stephan C. Kramer
We discuss several strategies to implement Dykstra’s projection algorithm on NVIDIA’s compute unified device architecture (CUDA). Dykstra’s algorithm is the central step in and the computationally most expensive part of statistical multi-resolution methods. It projects a given vector onto the intersection of convex sets. Compared with a CPU implementation our CUDA implementation is one order […]
View View   Download Download (PDF)   
Sergey Zabelok, Robert Arslanbekov, Vladimir Kolobov
This paper describes recent progress towards porting a Unified Flow Solver (UFS) to heterogeneous parallel computing. UFS is an adaptive kinetic-fluid simulation tool, which combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. The main challenge of porting UFS to graphics processing units (GPUs) comes […]
View View   Download Download (PDF)   
Paul Arts, Jacques Bloch, Peter Georg, Benjamin Glaessle, Simon Heybrock, Yu Komatsubara, Robert Lohmayer, Simon Mages, Bernhard Mendl, Nils Meyer, Alessio Parcianello, Dirk Pleiter, Florian Rappl, Mauro Rossi, Stefan Solbrig, Giampietro Tecchiolli, Tilo Wettig, Gianpaolo Zanier
We give an overview of QPACE 2, which is a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration of Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number […]
View View   Download Download (PDF)   
Michal Januszewski, Andrzej Ptok, Dawid Crivelli, Bartlomiej Gardas
Obtaining a thermodynamically accurate phase diagram through numerical calculations is a computationally expensive problem that is crucially important to understanding the complex phenomena of solid state physics, such as superconductivity. In this work we show how this type of analysis can be significantly accelerated through the use of modern GPUs. We illustrate this with a […]
View View   Download Download (PDF)   
Hugh Potter, Dossay Oryspayev, Pieter Maris, Masha Sosonkina, James Vary, Sven Binder, Angelo Calci, Joachim Langhammer, Robert Roth, Umit Catalyurek, Erik Saule
This paper describes some applications of GPU acceleration in ab initio nuclear structure calculations. Specifically, we discuss GPU acceleration of the software package MFDn, a parallel nuclear structure eigensolver. We modify the matrix construction stage to run partly on the GPU. On the Titan supercomputer at the Oak Ridge Leadership Computing Facility, this produces a […]
View View   Download Download (PDF)   
Hannes Vogt, Mario Schrock
We adopt CUDA-capable Graphic Processing Units (GPUs) for Landau, Coulomb and maximally Abelian gauge fixing in 3+1 dimensional SU(3) and SU(2) lattice gauge field theories. A combination of simulated annealing and overrelaxation is used to aim for the global maximum of the gauge functional. We use a fine grained degree of parallelism to achieve the […]
Jens Glaser, Trung Dac Nguyen, Joshua A. Anderson, Pak Lui, Filippo Spiga, Jaime A. Millan, David C. Morse, Sharon C. Glotzer
We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson […]
View View   Download Download (PDF)   
Simon Heybrock, Balint Joo, Dhiraj D. Kalamkar, Mikhail Smelyanskiy, Karthikeyan Vaidyanathan, Tilo Wettig, Pradeep Dubey
The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such […]
View View   Download Download (PDF)   
Nuno Cardoso
In this work, we present the GPU implementation of the overrelaxation and steepest descent method with Fourier acceleration methods for Laudau and Coulomb gauge fixing using CUDA for SU(N) with N>2. A multi-GPU implementation of the overrelaxation method is also presented using MPI and CUDA. The GPU performance was measured on BlueWaters and compared against […]
View View   Download Download (PDF)   
David Abdurachmanov, Brian Bockelman, Peter Elmer, Giulio Eulisse, Robert Knight, Shahzad Muzaffar
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. […]
View View   Download Download (PDF)   
Benjamin J. Block, Suam Kim, Peter Virnau, Kurt Binder
As a generic example for crystals where the crystal-fluid interface tension depends on the orientation of the interface relative to the crystal lattice axes, the nearest neighbor Ising model on the simple cubic lattice is studied over a wide temperature range, both above and below the roughening transition temperature. Using a thin film geometry $L_x […]
View View   Download Download (PDF)   
Page 1 of 1712345...10...Last »

* * *

* * *

Like us on Facebook

HGPU group

236 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1439 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: