Simon L. Grimm, Joachim G. Stadel
We describe a GPU implementation of a hybrid symplectic N-body integrator, GENGA (Gravitational ENcounters with Gpu Acceleration), designed to integrate planet and planetesimal dynamics in the late stage of planet formation and stability analysis of planetary systems. GENGA is based on the integration scheme of the Mercury code (Chambers 1999), which handles close encounters with […]
P. Borovska, D. Ivanova
The whitepaper reports our investigation into the porting, optimization and subsequent performance of the astrophysics software package GADGET, on the Intel Xeon Phi. The GADGET code is intended for cosmological N-body/SPH simulations to solve a wide range of astrophysical tasks. The test cases within the project were simulations of galaxy systems. A performance analysis of […]
Benoit Lange, Pierre Fortin
In astrophysical N-body simulations, Dehnen’s algorithm, implemented in the serial falcON code and based on a dual tree traversal, is faster than serial Barnes-Hut tree-codes, but outperformed by parallel CPU and GPU tree-codes. In this paper, we present a parallel dual tree traversal, implemented in the pfalcON code, targeting multi-core CPUs and manycore architectures (Xeon […]
Konstantinos Krommydas, Thomas R.W. Scogland, Wu-chun Feng
General-purpose computing on an ever-broadening array of parallel devices has led to an increasingly complex and multi-dimensional landscape with respect to programmability and performance optimization. The growing diversity of parallel architectures presents many challenges to the domain scientist, including device selection, programming model, and level of investment in optimization. All of these choices influence the […]
P. Berczik, R. Spurzem, L. Wang, S. Zhong, O. Veles, I. Zinchenko, S. Huang, M. Tsai, G. Kennedy, S. Li, L. Naso, C. Li
We present direct astrophysical N-body simulations with up to a few million bodies using our parallel MPI/CUDA code on large GPU clusters in China, Ukraine and Germany, with different kinds of GPU hardware. These clusters are directly linked under the Chinese Academy of Sciences special GPU cluster program in the cooperation of ICCS (International Center […]
Mudassar Majeed, Usman Dastgeer, Christoph Kessler
SkePU is a C++ template library with a simple and unified interface for expressing data parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. […]
Ivan Zecena, Martin Burtscher, Tongdan Jin, Ziliang Zong
N-body simulations are computation-intensive applications that calculate the motion of a large number of bodies under pair-wise forces. Although different versions of n-body codes have been widely used in many scientific fields, the performance and energy efficiency of various n-body codes have not been comprehensively studied, especially when they are running on newly released multi-core […]
Sergio M. Martin, Fernando G. Tinetti, Nicanor B. Casas, Graciela E. De Luca, Daniel A. Giulianelli
N-Body simulation algorithms are amongst the most commonly used within the field of scientific computing. Especially in computational astrophysics, they are used to simulate gravitational scenarios for solar systems or galactic collisions. Parallel versions of such N-Body algorithms have been extensively designed and optimized for multicore and distributed computing schemes. However, N-Body algorithms are still […]
Alexander Moore
This thesis begins with a description of a hybrid symplectic integrator named QYMSYM which is capable of planetary system simulations. This integrator has been programmed with the Compute Unified Device Architecture (CUDA) language which allows for implementation on Graphics Processing Units (GPUs). With the enhanced compute performance made available by this choice, QYMSYM was used […]
Michael S. Warren
We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69 […]
Q. Hu
The N-body problem appears in many computational physics simulations. At each time step the computation involves an all-pairs sum whose complexity is quadratic, followed by an update of particle positions. This cost means that it is not practical to solve such dynamic N-body problems on large scale. To improve this situation, we use both algorithmic […]
Go Ogiya, Masao Mori, Yohei Miki, Taisuke Boku, Naohito Nakasato
The discrepancy in the mass-density profile of dark matter halos between simulations and observations, the core-cusp problem, is a long-standing open question in the standard paradigm of cold dark matter cosmology. Here, we study the dynamical response of dark matter halos to oscillations of the galactic potential which are induced by a cycle of gas […]

* * *

Free GPU computing nodes at

Registered users can now run their OpenCL applications on our computing nodes. We provide 1 minute of computer time per run on two nodes: one with two AMD GPUs, the other with one AMD and one nVidia GPU. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent will be treated according to our Privacy Policy.

HGPU group © 2010-2014

All rights belong to the respective authors
