Q. Lu, J. Amundson
Synergia is a parallel, 3-dimensional space-charge particle-in-cell accelerator modeling code. We present our work porting the purely MPI-based version of the code to a hybrid of CPU and GPU computing kernels. The hybrid code uses the CUDA platform in the same framework as the pure MPI solution. We have implemented a lock-free collaborative charge-deposition algorithm […]
Nachiket Sahasrabudhe, Mahesh Mynam, Ajay Nandgaonkar, Gayathri Jayaraman
We propose a novel method to simulate non-isothermal flows. This method is ideally suited for the GPU architecture. The new algorithm is derived by coupling the lattice Boltzmann formulation for the flow with the finite difference scheme for the temperature field. We apply this algorithm to solve for the flow in the well known buoyancy […]
Abhinandan Majumdar, Srihari Cadambi, Srimat T. Chakradhar
Embedded learning applications in automobiles, surveillance, robotics, and defense are computationally intensive, and process large amounts of real-time data. Systems for such workloads have to balance stringent performance constraints within limited power budgets. High performance central processing units (CPUs) and graphics processing units (GPUs) cannot be used in an embedded platform due to power issues. […]
Wenjing Ma, Sriram Krishnamoorthy, Oreste Villa, Karol Kowalski
Tensor contractions are generalized multidimensional matrix multiplication operations that occur widely in quantum chemistry. Efficient execution of tensor contractions on GPUs requires tackling several challenges, including index permutation and small dimension sizes that reduce thread block utilization. In this paper, we present our approach to automatically generate CUDA code to execute tensor contractions on […]
R. Spurzem, P. Berczik, K. Nitadori, G. Marcus, A. Kugel, R. Manner, I. Berentzen, R. Klessen, R. Banerjee
We present our new parallel GPU clusters in Beijing and Heidelberg and demonstrate the nearly optimal speedup and performance for parallel direct astrophysical N-body simulations with up to six million bodies. We reach about 1/3 of the peak performance for a real application code. The clusters are used to simulate dense star clusters with many […]
Dimitri Komatitsch, Dominik Goddeke, Gordon Erlebacher, David Michea
We implement a high-order finite-element application, which performs the numerical simulation of seismic wave propagation resulting for instance from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large GPU-enhanced cluster. Mesh coloring enables an efficient accumulation of degrees of freedom in the assembly process […]
Chunhua Men, Xuejun Gu, Dongju Choi, Amitava Majumdar, Ziyi Zheng, Klaus Mueller, Steve B. Jiang
The widespread adoption of on-board volumetric imaging in cancer radiotherapy has stimulated research efforts to develop online adaptive radiotherapy techniques to handle the inter-fraction variation of the patient’s geometry. Such efforts face major technical challenges to perform treatment planning in real time. To overcome this challenge, we are developing a supercomputing online re-planning environment (SCORE) […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). Compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
