I.A. Surmin, S.I. Bastrakov, E.S. Efimenko, A.A. Gonoskov, A.V. Korzhimanov, I.B. Meyerov
This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for the Xeon Phi architecture and present our experience of porting and optimizing the existing parallel Particle-in-Cell code PICADOR. Direct porting with no code modification gives performance on Xeon […]
Giuseppe Cerati, Peter Elmer, Steven Lantz, Kevin McDermott, Dan Riley, Matevz Tadel, Peter Wittich, Frank Wurthwein, Avi Yagil
Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore’s Law performance/price gains, it will be necessary to parallelize algorithms to […]
C. L. Jermain, G. E. Rowlands, R. A. Buhrman, D. C. Ralph
Highly-parallel graphics processing units (GPUs) can improve the speed of micromagnetic simulations significantly as compared to conventional computing using central processing units (CPUs). We present a strategy for performing GPU-accelerated micromagnetic simulations by utilizing cost-effective GPU access offered by cloud computing services with an open-source Python-based program for running the MuMax3 micromagnetics code remotely. We […]
Mario Hernandez, Jose M. Garcia, Jose M. Cecilia
Iterative stencil computations are an important pattern of computation in fields such as physics and chemistry simulation. A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. As the demand for more and more compute power is growing rapidly in different fields of research, […]
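The update rule described in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional grid, a three-point averaging stencil, and fixed boundary values; the function names `jacobi_step` and `run_stencil` are hypothetical and are not taken from the paper, which targets accelerator hardware rather than plain Python.

```python
def jacobi_step(grid):
    """Apply one sweep of a 3-point averaging stencil.

    Each interior point becomes the mean of itself and its two
    neighbors. The two boundary points are left unchanged.
    """
    new = list(grid)  # write into a copy so every read sees the old values
    for i in range(1, len(grid) - 1):
        new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return new


def run_stencil(grid, steps):
    """Repeat the stencil sweep `steps` times and return the final grid."""
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid


# A single spike spreads to its neighbors after one sweep.
result = run_stencil([0.0, 0.0, 3.0, 0.0, 0.0], 1)
print(result)
```

Every interior point depends only on values from the previous sweep, so the points within one sweep can be updated independently of each other, which is why the pattern maps well to parallel hardware.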
G. Latu, M. Haefele, J. Bigot, V. Grandgirard, T. Cartier-Michaud, F. Rozar
This work describes the challenges presented by porting parts of the Gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate the performance of some generic micro-benchmarks on Phi versus Intel Sandy Bridge. Several interpolation kernels useful for the Gysela […]
Jan Lebert, Lutz Kunneke, Johannes Hagemann, Stephan C. Kramer
We discuss several strategies to implement Dykstra’s projection algorithm on NVIDIA’s compute unified device architecture (CUDA). Dykstra’s algorithm is the central step in and the computationally most expensive part of statistical multi-resolution methods. It projects a given vector onto the intersection of convex sets. Compared with a CPU implementation our CUDA implementation is one order […]
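As a rough illustration of the projection step this abstract refers to, the following is a minimal CPU-only sketch of Dykstra's algorithm for two convex sets in the plane. The sets (a unit box and a half-plane), the function names, and the iteration count are assumptions chosen for the example; this is not the authors' CUDA implementation.

```python
def project_box(v):
    """Projection onto the box [0, 1] x [0, 1] (clamp each coordinate)."""
    return [min(max(c, 0.0), 1.0) for c in v]


def project_halfspace(v):
    """Projection onto the half-plane {(x, y) : x + y <= 1}."""
    excess = v[0] + v[1] - 1.0
    if excess <= 0.0:
        return list(v)  # already inside the set
    return [v[0] - excess / 2.0, v[1] - excess / 2.0]


def dykstra(point, proj_a, proj_b, iterations=100):
    """Approximate the projection of `point` onto the intersection of A and B.

    p and q are the correction terms that distinguish Dykstra's method
    from plain alternating projections.
    """
    x = list(point)
    p = [0.0] * len(x)
    q = [0.0] * len(x)
    for _ in range(iterations):
        y = proj_a([xi + pi for xi, pi in zip(x, p)])
        p = [xi + pi - yi for xi, pi, yi in zip(x, p, y)]
        x = proj_b([yi + qi for yi, qi in zip(y, q)])
        q = [yi + qi - xi for yi, qi, xi in zip(y, q, x)]
    return x


# The intersection is the triangle with corners (0,0), (1,0), (0,1);
# the point of that triangle closest to (2, 0.5) is the corner (1, 0).
nearest = dykstra([2.0, 0.5], project_box, project_halfspace)
print(nearest)
```

For this input, a single pass of plain alternating projections stops at (0.75, 0.25), which lies in the intersection but is not the nearest point. The correction terms are what pull the iterate toward (1, 0).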
Zhen Tian, Fei Peng, Michael Folkerts, Jun Tan, Xun Jia, Steve B. Jiang
VMAT optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units have been used to speed up the computations. However, their small memory size cannot handle cases with a large dose-deposition coefficient (DDC) matrix. This paper reports an implementation […]
Zhen Tian, Feng Shi, Michael Folkerts, Nan Qin, Steve B. Jiang, Xun Jia
The Monte Carlo (MC) method has been recognized as the most accurate dose calculation method for radiotherapy. However, its extremely long computation time impedes clinical applications. Recently, considerable effort has been made to realize fast MC dose calculation on GPUs. Nonetheless, most of the GPU-based MC dose engines were developed in the NVidia CUDA environment. This […]
Matthias Bach
Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large scale experiments like the Large Hadron Collider (LHC) at CERN and in the future at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, […]
Theo M. Nieuwenhuizen, Matthew T.P. Liska
Stochastic electrodynamics is a classical theory which assumes that the physical vacuum consists of classical stochastic fields with average energy $\frac{1}{2}\hbar\omega$ in each mode, i.e., the zero-point Planck spectrum. While this classical theory explains many quantum phenomena related to harmonic oscillator problems, hard results on nonlinear systems are still lacking. In this work the […]
P. Bialas, J. Kowal, A. Strzelecki, T. Bednarski, E. Czerwinski, A. Gajos, D. Kaminska, L. Kaplon, A. Kochanowski, G. Korcyl, P. Kowalski, T. Kozik, W. Krzemien, E. Kubicz, P. Moskal, Sz. Niedzwiecki, M. Palka, L. Raczynski, Z. Rudy, O. Rundel, P. Salabura, N.G. Sharma, M. Silarski, A. Slomski, J. Smyrski, A. Wieczorek, W. Wislicki, M. Zielinski, N. Zon
We present a fast GPU implementation of the image reconstruction routine for a novel two-strip PET detector that relies solely on time-of-flight measurements.
Soichiro Ikuno, Susumu Nakata, Yuta Hirokawa, Taku Itoh
High-performance computing of the Meshless Time Domain Method (MTDM) on multiple GPUs using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at the University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of electromagnetic wave propagation phenomena. However, the numerical domain must be […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of computer time per run on two nodes: one with an nVidia and an AMD graphics processing unit, the other with two AMD graphics processing units. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors