Dominique Aubert, Nicolas Deparis, Pierre Ocvirk
EMMA is a cosmological simulation code aimed at investigating the reionization epoch. It simultaneously handles collisionless and gas dynamics, as well as radiative transfer physics using a moment-based description with the M1 approximation. Field quantities are stored and computed on an adaptive 3D mesh and the spatial resolution can be dynamically modified based on physically-motivated […]
Azzam Haidar, Chongxiao Cao, Stanimire Tomov, Asim YarKhan, Piotr Luszczek, Jack Dongarra
Modern high performance computing environments are composed of networks of compute nodes that often contain a variety of heterogeneous compute resources, such as multicore CPUs, GPUs, and coprocessors. One challenge faced by domain scientists is how to efficiently use all these distributed, heterogeneous resources. In order to use the GPUs effectively, the workload parallelism needs to […]
Xavier Saez, Alejandro Soba, Edilberto Sanchez, Mervi Mantsinen, Jose M. Cela
PIC methods are among the most widely used methods in plasma simulations. We present a comprehensible evaluation of the PIC code performance on four current parallel platforms: IBM PowerPC, Intel Nehalem (SMP), Intel Sandy Bridge (SMP) and ARM GPU. The behavior of computational algorithms and data structures is analyzed to deduce which code optimizations will […]
Tobias Winchen, Marvin Gottowik, Julian Rautenberg
The Pierre Auger Observatory is currently the largest experiment dedicated to unveiling the nature and origin of the highest-energy cosmic rays. The software framework ‘Offline’ has been developed by the Pierre Auger Collaboration for joint analysis of data from different independent detector systems used in one observatory. While reconstruction modules are specific to the […]
Shuotian Chen
Many eigenvalue and eigenvector algorithms begin by reducing the input matrix to tridiagonal form. A tridiagonal matrix is a matrix that has non-zero elements only on its main diagonal and the two diagonals directly adjacent to it. Reducing a matrix to tridiagonal form is an iterative process which uses Jacobi rotations to reduce […]
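As a small illustration of the definition in this abstract (not code from the cited work), the following Python sketch tests whether a square matrix is tridiagonal, that is, whether every entry more than one position away from the main diagonal is zero. The function name `is_tridiagonal`, the tolerance parameter `tol`, and the sample matrices are assumptions introduced only for this example.

```python
def is_tridiagonal(matrix, tol=0.0):
    """Return True if every entry with |i - j| > 1 is zero (within tol).

    `matrix` is a square list of lists; `tol` is a hypothetical tolerance
    for treating small floating-point values as zero.
    """
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("matrix must be square")
        for j, value in enumerate(row):
            # Entries outside the main diagonal and its two neighbours
            # must vanish for the matrix to be tridiagonal.
            if abs(i - j) > 1 and abs(value) > tol:
                return False
    return True


# Illustrative inputs: T is tridiagonal, A has non-zero corner entries.
T = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]
A = [[4, 1, 2],
     [1, 3, 0],
     [2, 0, 5]]
```

Here `is_tridiagonal(T)` holds, while `is_tridiagonal(A)` does not because `A[0][2]` and `A[2][0]` are non-zero.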
Gloria Ortega Lopez
This thesis, entitled "High Performance Computing for solving large sparse systems. Optical Diffraction Tomography as a case of study" investigates the computational issues related to the resolution of linear systems of equations which come from the discretization of physical models described by means of Partial Differential Equations (PDEs). These physical models are conceived for the […]
Guillaume Chapuis, Hristo Djidjev
We develop an efficient parallel algorithm for answering shortest-path queries in planar graphs and implement it on multi-node CPU/GPU clusters. The algorithm uses a divide-and-conquer approach for decomposing the input graph into small and roughly equal subgraphs and constructs a distributed data structure containing shortest distances within each of those subgraphs and between their […]
Sergey Zabelok, Robert Arslanbekov, Vladimir Kolobov
This paper describes recent progress towards porting a Unified Flow Solver (UFS) to heterogeneous parallel computing. UFS is an adaptive kinetic-fluid simulation tool, which combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. The main challenge of porting UFS to graphics processing units (GPUs) comes […]
Soichiro Ikuno, Susumu Nakata, Yuta Hirokawa, Taku Itoh
High performance computing of the Meshless Time Domain Method (MTDM) on multiple GPUs using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at the University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of electromagnetic wave propagation phenomena. However, the numerical domain must be […]
Lukasz Laniewski-Wollk, Jacek Rokicki
In this paper we present a topology optimization technique applicable to a broad range of flow design problems. We also propose a discrete adjoint formulation effective for a wide class of Lattice Boltzmann Methods (LBM). This adjoint formulation is used to calculate the sensitivity of the LBM solution to several types of parameters, both global and […]
Benjamin Hernandez, Hugo Perez, Isaac Rudomin, Sergio Ruiz, Oriam DeGyves, Leonel Toledo
We present a set of algorithms for simulating and visualizing real-time crowds on GPU (Graphics Processing Unit) clusters. First we present crowd simulation and rendering techniques that take advantage of single-GPU machines; then, using a wandering crowd behavior simulation algorithm as an example, we explain how this kind of algorithm can be extended […]
Wei Wu, Aurelien Bouteiller, George Bosilca, Mathieu Faverge, Jack Dongarra
Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Despite significant advances in the programming interfaces to such hybrid architectures, traditional programming paradigms struggle to map the resulting multi-dimensional heterogeneity and the expression of algorithm parallelism, resulting in sub-optimal effective performance. Task-based programming paradigms have the capability to alleviate […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
