Adrien Remy, Marc Baboulin, Masha Sosonkina, Brigitte Rozoy
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We […]
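The abstract turns on where threads run relative to NUMA memory. The sketch below contrasts two common thread-placement policies: "compact", which fills one NUMA node first, and "scatter", which spreads threads round-robin over nodes. The machine topology is hypothetical. The excerpt does not say which policy the authors found best for the panel factorization.

```python
from itertools import chain, zip_longest

# HYPOTHETICAL topology for illustration: NUMA node id -> logical CPU ids.
# On a real Linux machine this information comes from the OS (e.g. lscpu).
TOPOLOGY = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}


def compact_placement(n_threads, topology):
    """Fill NUMA node 0 first, then node 1, and so on."""
    cpus = [cpu for node in sorted(topology) for cpu in topology[node]]
    if n_threads > len(cpus):
        raise ValueError("more threads than CPUs")
    return cpus[:n_threads]


def scatter_placement(n_threads, topology):
    """Round-robin over NUMA nodes so that threads use every memory controller."""
    columns = zip_longest(*(topology[node] for node in sorted(topology)))
    cpus = [cpu for cpu in chain.from_iterable(columns) if cpu is not None]
    if n_threads > len(cpus):
        raise ValueError("more threads than CPUs")
    return cpus[:n_threads]


def numa_node_of(cpu, topology):
    """NUMA node owning a CPU (under first-touch, where that thread's pages would land)."""
    for node, cpus in topology.items():
        if cpu in cpus:
            return node
    raise KeyError(cpu)


for policy in (compact_placement, scatter_placement):
    cpus = policy(4, TOPOLOGY)
    print(policy.__name__, cpus, [numa_node_of(c, TOPOLOGY) for c in cpus])
# compact_placement [0, 1, 2, 3] [0, 0, 0, 0]
# scatter_placement [0, 4, 1, 5] [0, 1, 0, 1]
```

On Linux, each worker could then be pinned to its CPU with `os.sched_setaffinity`. Under a first-touch policy, having each thread initialize its own block of panel rows keeps that memory on the thread's node.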
Angeles Navarro, Antonio Vilches, Francisco Corbera, Rafael Asenjo
This paper explores the possibility of efficiently executing a single application using multicores simultaneously with multiple GPU accelerators under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel for template to allow its exploitation on heterogeneous architectures. Previous task frameworks that offer support for heterogeneous systems implement a variety […]
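One way to let several devices share a single `parallel_for` range is dynamic chunking. Each device's worker repeatedly claims the next chunk of iterations from a shared counter, and a faster device claims larger chunks. The following is a minimal sketch of that idea, not the authors' template. The device names and chunk sizes are hypothetical, and plain Python threads stand in for a CPU and a GPU, so the example shows the scheduling logic rather than any speed-up.

```python
import threading


def hetero_parallel_for(begin, end, body, chunk_sizes):
    """Execute body(i) for every i in [begin, end) with one worker thread per device.

    chunk_sizes maps a HYPOTHETICAL device name to how many iterations that worker
    claims at a time (a larger chunk plays the role of a GPU).
    Returns {device: iterations executed}.
    """
    if any(c <= 0 for c in chunk_sizes.values()):
        raise ValueError("chunk sizes must be positive")
    next_index = begin
    lock = threading.Lock()
    done = {dev: 0 for dev in chunk_sizes}

    def worker(dev, chunk):
        nonlocal next_index
        while True:
            with lock:  # claim the next chunk of the shared range atomically
                start = next_index
                if start >= end:
                    return
                stop = min(start + chunk, end)
                next_index = stop
            for i in range(start, stop):
                body(i)
            done[dev] += stop - start  # each worker updates only its own entry

    threads = [threading.Thread(target=worker, args=item) for item in chunk_sizes.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


squares = [0] * 1000


def store_square(i):
    squares[i] = i * i


counts = hetero_parallel_for(0, 1000, store_square, {"cpu": 10, "gpu": 100})
print(counts)  # the split between devices depends on thread timing
```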
Michal Karpinski, Maciej Pacut
The goal of this paper is to propose and test a new memetic algorithm for the capacitated vehicle routing problem in a parallel computing environment. In this paper we consider a simple variation of the vehicle routing problem in which the only parameter is the capacity of the vehicle and each client needs only one package. We present […]
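A memetic algorithm is a genetic algorithm whose offspring are refined by local search. The sketch below applies that pattern to the simplified problem in the abstract. Because every client needs one package, a vehicle of capacity Q can serve at most Q clients. The operators are illustrative choices, not the authors' algorithm: order crossover, swap mutation, swap-based local search, and cutting a "giant tour" into consecutive routes of Q clients. The sketch also runs sequentially.

```python
import math
import random


def split_routes(tour, capacity):
    # Unit demand: a route serves at most `capacity` consecutive clients of the giant tour.
    return [tour[i:i + capacity] for i in range(0, len(tour), capacity)]


def routes_cost(routes, depot, pts):
    total = 0.0
    for route in routes:
        path = [depot] + [pts[c] for c in route] + [depot]
        total += sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    return total


def order_crossover(p1, p2, rng):
    # OX: keep a slice of p1 and fill the remaining positions in p2's order.
    a, b = sorted(rng.sample(range(len(p1)), 2))
    kept = set(p1[a:b + 1])
    fill = (c for c in p2 if c not in kept)
    return [p1[i] if a <= i <= b else next(fill) for i in range(len(p1))]


def local_search(tour, cost):
    # Memetic step: first-improvement pairwise swaps until a local optimum is reached.
    best = cost(tour)
    improved = True
    while improved:
        improved = False
        for i in range(len(tour) - 1):
            for j in range(i + 1, len(tour)):
                tour[i], tour[j] = tour[j], tour[i]
                c = cost(tour)
                if c < best - 1e-12:
                    best, improved = c, True
                else:
                    tour[i], tour[j] = tour[j], tour[i]  # undo the swap
    return tour, best


def memetic_cvrp(pts, depot, capacity, pop_size=10, generations=30, seed=0):
    rng = random.Random(seed)

    def cost(tour):
        return routes_cost(split_routes(tour, capacity), depot, pts)

    pop = []
    for _ in range(pop_size):
        tour = list(range(len(pts)))
        rng.shuffle(tour)
        pop.append(local_search(tour, cost))
    for _ in range(generations):
        # Binary tournament selection of two parents.
        p1, p2 = (min(rng.sample(pop, 2), key=lambda s: s[1])[0] for _ in range(2))
        child = order_crossover(p1, p2, rng)
        if rng.random() < 0.2:  # occasional swap mutation
            i, j = rng.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]
        child, c = local_search(child, cost)
        worst = max(range(pop_size), key=lambda k: pop[k][1])
        if c < pop[worst][1]:
            pop[worst] = (child, c)  # steady-state replacement of the worst member
    best_tour, best_cost = min(pop, key=lambda s: s[1])
    return split_routes(best_tour, capacity), best_cost


rng = random.Random(42)
clients = [(rng.random(), rng.random()) for _ in range(12)]
routes, total = memetic_cvrp(clients, depot=(0.5, 0.5), capacity=4)
print(routes, round(total, 3))
```

In a parallel setting, the natural candidates for distribution are the independent local searches of different individuals. The excerpt does not say how the authors parallelize.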
Vedran Novakovic
We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of GPU’s memory hierarchy. The algorithm may outperform MAGMA’s dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies […]
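For orientation, the unblocked kernel that such blocked algorithms build on is the one-sided (Hestenes) Jacobi method. It applies plane rotations to pairs of columns of A until all columns are mutually orthogonal, and the column norms are then the singular values. The sketch uses the simple cyclic-by-rows pivot order. The paper's hierarchical blocking and its parallel pivot strategies are not reproduced here.

```python
import numpy as np


def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Unblocked one-sided (Hestenes) Jacobi SVD of a tall or square matrix (m >= n).

    Returns U, sigma, Vt with A == U @ diag(sigma) @ Vt and sigma sorted descending.
    """
    U = np.array(A, dtype=float)  # working copy; becomes A @ V
    m, n = U.shape
    if m < n:
        raise ValueError("expects m >= n; factor A.T instead")
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):  # cyclic-by-rows pivot order
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal enough
                converged = False
                zeta = (beta - alpha) / (2.0 * gamma)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                for M in (U, V):  # rotate columns p and q of both U and V
                    mp = M[:, p].copy()
                    M[:, p] = c * mp - s * M[:, q]
                    M[:, q] = s * mp + c * M[:, q]
        if converged:
            break
    sigma = np.linalg.norm(U, axis=0)
    order = np.argsort(sigma)[::-1]
    sigma, U, V = sigma[order], U[:, order], V[:, order]
    nonzero = sigma > 0
    U[:, nonzero] /= sigma[nonzero]
    return U, sigma, V.T


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = one_sided_jacobi_svd(A)
print(s)
```

Each rotation touches only two columns, so disjoint column pairs can be rotated concurrently. That independence is what parallel pivot strategies exploit.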
Giuseppe Palma, Francesco Piccialli, Pasquale De Michele, Salvatore Cuomo, Marco Comerci, Pasquale Borrelli, Bruno Alfano
The Non-Local Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led to implementations on Graphics Processing Unit (GPU) architectures, which achieve reasonable running times by filtering 3D datasets slice by slice with a 2D NLM approach. Here we present a fully 3D NLM implementation on a multi-GPU architecture […]
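To show what "fully 3D" means, the sketch below is a naive CPU version of 3D NLM. Each voxel is replaced by a weighted mean of the voxels in a 3D search window. The weights are `exp(-d2 / h**2)`, where `d2` is the mean squared difference between the 3D patches around the two voxels. Normalizing the patch distance by patch size, reflect padding, and the parameter values are assumptions of this sketch, not details from the paper. The quadruple loop over voxels and candidates is the work a multi-GPU implementation would parallelize.

```python
import numpy as np


def nlm_3d(vol, patch_radius=1, search_radius=1, h=0.5):
    """Naive fully 3D Non-Local Means (illustrative parameters, reflect padding)."""
    pr, sr = patch_radius, search_radius
    pad = pr + sr
    padded = np.pad(vol, pad, mode="reflect")
    out = np.empty(vol.shape, dtype=float)
    nx, ny, nz = vol.shape
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                cx, cy, cz = x + pad, y + pad, z + pad
                ref = padded[cx - pr:cx + pr + 1, cy - pr:cy + pr + 1, cz - pr:cz + pr + 1]
                wsum = acc = 0.0
                for dx in range(-sr, sr + 1):
                    for dy in range(-sr, sr + 1):
                        for dz in range(-sr, sr + 1):
                            qx, qy, qz = cx + dx, cy + dy, cz + dz
                            cand = padded[qx - pr:qx + pr + 1,
                                          qy - pr:qy + pr + 1,
                                          qz - pr:qz + pr + 1]
                            d2 = np.mean((ref - cand) ** 2)  # 3D patch distance
                            w = np.exp(-d2 / (h * h))
                            wsum += w
                            acc += w * padded[qx, qy, qz]
                out[x, y, z] = acc / wsum
    return out


rng = np.random.default_rng(0)
clean = np.full((8, 8, 8), 0.5)
noisy = clean + rng.normal(0.0, 0.1, clean.shape)
denoised = nlm_3d(noisy, h=0.5)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```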
Jeroen Vonk
Using the graphics processing unit (GPU) of a device to solve general problems is an emerging field. The parallel structure of the GPU allows massive concurrency when executing a program. Therefore, by executing (a part of) the code on the GPU, a previously unused resource can be put to work to speed up an application. […]
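When only part of the code moves to the GPU, Amdahl's law gives a standard upper bound on the overall gain. If a fraction p of the run time is accelerated s times, the speed-up is 1 / ((1 - p) + p / s). The excerpt does not discuss this law. It is added here only as a worked illustration, and it ignores host-device transfer costs.

```python
def offload_speedup(p, s):
    """Amdahl's law: overall speed-up when a fraction p of the run time runs s times faster.

    Transfer and launch overheads are ignored, so this is an upper bound.
    """
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)


print(offload_speedup(0.9, 10))   # about 5.26
print(offload_speedup(0.9, 1e9))  # approaches 1 / (1 - 0.9) = 10
```

Even an arbitrarily fast GPU cannot push the overall speed-up past 1 / (1 - p). This is why the fraction of code that can be offloaded matters as much as the GPU itself.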
Jakub Kurzak, Piotr Luszczek, Mathieu Faverge, Jack Dongarra
LU factorization with partial pivoting is a canonical numerical procedure and the main component of the high performance LINPACK benchmark. This paper presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. The difficulty of implementing the algorithm for such a system lies in the disproportion […]
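The "disproportion" mentioned above is between the latency-bound panel factorization and the matrix-multiply-rich trailing update. A right-looking blocked LU with partial pivoting separates the two clearly, as the sketch below shows. It is a plain NumPy illustration of the textbook algorithm, not the paper's hybrid CPU/GPU implementation. The block size `nb` is arbitrary.

```python
import numpy as np


def blocked_lu(A, nb=2):
    """Right-looking blocked LU with partial pivoting, so that A[perm] == L @ U.

    L (unit lower triangular, diagonal not stored) and U are packed into one array.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    perm = np.arange(n)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1) Panel factorization of columns k0..k1-1 over all rows below k0.
        for k in range(k0, k1):
            p = k + int(np.argmax(np.abs(A[k:, k])))
            if A[p, k] == 0.0:
                raise np.linalg.LinAlgError("matrix is singular")
            if p != k:  # swap entire rows, including the not-yet-updated trailing part
                A[[k, p], :] = A[[p, k], :]
                perm[[k, p]] = perm[[p, k]]
            A[k + 1:, k] /= A[k, k]
            A[k + 1:, k + 1:k1] -= np.outer(A[k + 1:, k], A[k, k + 1:k1])
        if k1 < n:
            # 2) Block row of U: U12 = inv(L11) @ A12 (triangular solve).
            L11 = np.tril(A[k0:k1, k0:k1], -1) + np.eye(k1 - k0)
            A[k0:k1, k1:] = np.linalg.solve(L11, A[k0:k1, k1:])
            # 3) Trailing update A22 -= L21 @ U12: the GEMM-like, accelerator-friendly part.
            A[k1:, k1:] -= A[k1:, k0:k1] @ A[k0:k1, k1:]
    return perm, A


rng = np.random.default_rng(0)
A = rng.standard_normal((7, 7))
perm, LU = blocked_lu(A, nb=3)
L = np.tril(LU, -1) + np.eye(7)
U = np.triu(LU)
print(np.allclose(A[perm], L @ U))
```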
R. Farina, S. Cuomo, P. De Michele, F. Piccialli
In this paper, the preconditioning of an elliptic Laplace problem in a global ocean circulation model is analyzed. We suggest an inverse preconditioning technique to efficiently compute the numerical solution of the elliptic kernel. Moreover, we show how the convergence rate and the performance of the solver are closely linked to the […]
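In inverse preconditioning, the preconditioner is an explicit approximation M ≈ A⁻¹, so each preconditioning step is just a matrix-vector product. The sketch below illustrates the idea with preconditioned conjugate gradients on a 2D Dirichlet Laplacian. As a stand-in, it uses the first-order Neumann-series approximate inverse M = D⁻¹ + D⁻¹ R D⁻¹, where A = D - R. The excerpt does not specify the authors' approximate inverse, so this choice is an assumption. Dense matrices are used only to keep the example short.

```python
import numpy as np


def laplacian_2d(m):
    """5-point Dirichlet Laplacian on an m x m interior grid (dense, for illustration)."""
    T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    I = np.eye(m)
    return np.kron(I, T) + np.kron(T, I)


def neumann_approx_inverse(A):
    """First-order Neumann-series approximate inverse of A = D - R (stand-in choice)."""
    D_inv = np.diag(1.0 / np.diag(A))
    R = np.diag(np.diag(A)) - A
    return D_inv + D_inv @ R @ D_inv


def pcg(A, b, M, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients; M is applied as an explicit matrix."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M @ r
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = M @ r  # preconditioning = one mat-vec with the approximate inverse
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


A = laplacian_2d(10)
b = np.ones(A.shape[0])
x, iters = pcg(A, b, neumann_approx_inverse(A))
x_plain, iters_plain = pcg(A, b, np.eye(A.shape[0]))
print(iters, iters_plain)
```

Because M is applied as a product rather than through a triangular solve, the preconditioning step parallelizes as easily as the multiplication by A itself.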
Sai Kiran Korwar
In this paper, we discuss the acceleration of a climate model known as the Community Earth System Model (CESM). The use of Graphics Processing Units (GPUs) to accelerate computationally intensive scientific applications is well known. This project attempts to exploit the performance of GPUs to enable fast execution of CESM to obtain better model […]
N. Kulabukhova
High-power accelerator facilities make it necessary to consider space charge forces. It is therefore important to study the space charge dynamics in the corresponding channels. To represent the space charge forces of the beam, we have developed special software based on analytical models for space charge distributions. Because calculations for space charge dynamics […]
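The excerpt does not say which analytical models the software uses. As an example of the kind of closed form involved, one of the simplest is an infinitely long, uniformly charged round beam. By Gauss's law, its radial field is E = λr / (2πε₀a²) inside the beam radius a and λ / (2πε₀r) outside it, where λ is the line charge density. The sketch evaluates that textbook model only.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018)


def uniform_beam_radial_field(r, line_charge, radius):
    """Radial E-field (V/m) of an infinitely long, uniformly charged round beam.

    Illustrative textbook model: line_charge in C/m, radius a and distance r in m.
    Inside the beam the field grows linearly with r; outside it decays as 1/r.
    """
    if r < 0 or radius <= 0:
        raise ValueError("need r >= 0 and radius > 0")
    if r <= radius:
        return line_charge * r / (2 * math.pi * EPS0 * radius ** 2)
    return line_charge / (2 * math.pi * EPS0 * r)


a, lam = 1e-3, 1e-9  # 1 mm beam radius, 1 nC/m line charge (example values)
for r in (0.0, a / 2, a, 2 * a):
    print(r, uniform_beam_radial_field(r, lam, a))
```

The linear field inside the beam produces a linear defocusing force, which is what makes this model analytically tractable.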
Lawrence M. Murray
LibBi is a software package for state-space modelling and Bayesian inference on modern computer hardware, including multi-core central processing units (CPUs), many-core graphics processing units (GPUs) and distributed-memory clusters of such devices. The software parses a domain-specific language for model specification, then optimises, generates, compiles and runs code for the given model, inference method and […]
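Bayesian inference in state-space models often relies on sequential Monte Carlo. The sketch below is a bootstrap particle filter written directly in NumPy. It is not LibBi's modelling language or generated code. The model is a hypothetical linear Gaussian one, x_t = a·x_{t-1} + N(0, q) and y_t = x_t + N(0, r), for which the Kalman filter gives the exact answer. That makes it easy to check the particle filter against the exact solution.

```python
import numpy as np


def simulate(T, a, q, r, rng):
    """Simulate the HYPOTHETICAL linear Gaussian model, starting from x_{-1} = 0."""
    x, y = np.empty(T), np.empty(T)
    prev = 0.0
    for t in range(T):
        prev = a * prev + rng.normal(0.0, np.sqrt(q))
        x[t] = prev
        y[t] = prev + rng.normal(0.0, np.sqrt(r))
    return x, y


def bootstrap_pf(y, a, q, r, n_particles, rng):
    """Bootstrap particle filter: filtered means and log-likelihood estimate."""
    particles = np.zeros(n_particles)  # x_{-1} = 0
    means, loglik = np.empty(len(y)), 0.0
    for t, yt in enumerate(y):
        particles = a * particles + rng.normal(0.0, np.sqrt(q), n_particles)  # propagate
        logw = -0.5 * (yt - particles) ** 2 / r - 0.5 * np.log(2 * np.pi * r)
        m = logw.max()
        w = np.exp(logw - m)  # weights shifted for numerical stability
        loglik += m + np.log(w.mean())
        w /= w.sum()
        means[t] = w @ particles
        particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample
    return means, loglik


def kalman(y, a, q, r):
    """Exact filter for the same model, used as a reference."""
    m, P = 0.0, 0.0
    means, loglik = np.empty(len(y)), 0.0
    for t, yt in enumerate(y):
        m, P = a * m, a * a * P + q  # predict
        S = P + r                    # innovation variance
        loglik += -0.5 * (np.log(2 * np.pi * S) + (yt - m) ** 2 / S)
        K = P / S
        m, P = m + K * (yt - m), (1 - K) * P
        means[t] = m
    return means, loglik


rng = np.random.default_rng(1)
a, q, r = 0.9, 1.0, 1.0
_, y = simulate(50, a, q, r, rng)
pf_means, pf_loglik = bootstrap_pf(y, a, q, r, 5000, rng)
kf_means, kf_loglik = kalman(y, a, q, r)
print(np.max(np.abs(pf_means - kf_means)), pf_loglik - kf_loglik)
```

Propagating and weighting particles are independent per-particle operations. This is why such filters map well onto many-core hardware.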
Jonathan Allen Berkhahn
Cloud computing setups require a large investment of resources and personnel to maintain. Because the workload on a system is a major factor in the system's performance and reflects the needs of its users, a clear understanding of the workload is critical to organizations that support supercomputing systems. In […]

* * *

Free GPU computing nodes at […]

Registered users can now run their OpenCL applications at […]. We provide 1 minute of compute time per run on two nodes: one with two AMD GPUs and one with an AMD GPU and an nVidia GPU. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there); the terminal output logs from compilation and execution will be provided to the user.

Information sent to […] will be treated according to our Privacy Policy.

HGPU group © 2010-2014

All rights belong to the respective authors
