Luis-Pedro Garcia, Javier Cuenca, Domingo Gimenez
The use of auto-tuning techniques in a matrix multiplication routine for hybrid CPU+GPU platforms is analyzed. Basic models of the execution time of the hybrid routine and information obtained during its installation are used to optimize the execution time with a balanced assignment of the computation to the computing components in the heterogeneous system. Satisfactory […]
M.G.B. Johnson, D. P. Playne, K.A. Hawick
Floating point precision and performance and the ratio of floating point units to integer processing elements on a graphics processing unit accelerator all continue to present complex tradeoffs for optimising core utilisation on modern devices. We investigate various hybrid CPU and GPU combinations using a range of different GPU models occupying different points in this […]
Alexander D. Kaiser
In this thesis, I investigate computational questions in Markov chain Monte Carlo (MCMC). I am investigating one new MCMC method called the stretch move ensemble sampler [3]. I have looked at the performance of this algorithm, in terms of acceptance rates, autocorrelation time and compute performance. The thesis describes a parallel implementation of the algorithm […]
Xun Jia, Peter Ziegenhein, Steve B Jiang
Recent developments in radiotherapy demand high computational power to solve challenging problems in a timely fashion in a clinical environment. The graphics processing unit (GPU), as an emerging high-performance computing platform, has been introduced to radiotherapy. It is particularly attractive due to its high computational power, small size, and low cost for facility deployment […]
Mohammadhossein Afrasiabi
This thesis explores the possibility of utilizing Graphics Processing Units (GPUs) to address the computational demand of algorithms used to mitigate the inherent physical limitations in devices such as microscopes and 3D-scanners. We investigate the outcome and test our methodology for the following case studies: – the narrow field of view found in microscopes. – […]
Frederick R.M. Barnes, Thomas Pressnell, Brendan Le Foll
This paper reports on our experiences of using commodity GPUs to speed-up the execution of fine-grained concurrent simulations. Starting with an existing process-oriented ‘boids’ simulation, we explore a variety of techniques aimed at improving performance, gradually refactoring the original code. Successive improvements lead to a 10-fold improvement in performance, which we believe can still be […]
Jonathan Passerat-Palmbach
The race for computing power intensifies every day in the simulation community. A few years ago, scientists started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only does the simulation model implementation have to be ported to the new parallel platform, but all […]
Romain Maffina
The goal of the project is to develop a triangle-triangle collision algorithm. A reference triangle is given as well as a variably-sized array of many other triangles. The algorithm must check whether a triangle intersects the reference triangle. That operation has to be carried out for each "non-reference" triangle against the reference triangle. If one […]
Jose M. P. Nascimento, Jose M. Bioucas-Dias, Jose M. Rodriguez Alves, Vitor Silva, Antonio Plaza
This letter presents a new parallel method for hyperspectral unmixing composed of the efficient combination of two popular methods: vertex component analysis (VCA) and sparse unmixing by variable splitting and augmented Lagrangian (SUNSAL). First, VCA extracts the end-member signatures, and then, SUNSAL is used to estimate the abundance fractions. Both techniques are highly parallelizable, which […]
Davide Montanari, Enrica Scolari, Chiara Silvestri, Yan J. Graves, Hao Yan, Laura Cervino, Roger Rice, Steve B. Jiang, Xun Jia
Cone beam CT (CBCT) has been widely used for patient setup in image guided radiation therapy (IGRT). Radiation dose from CBCT scans has become a clinical concern. The purposes of this study are 1) to commission a GPU-based Monte Carlo (MC) dose calculation package gCTD for the Varian On-Board Imaging (OBI) system and test the calculation […]
Gabor Jakab, Laszlo Szirmay-Kalos
Developing image reconstruction algorithms for diagnostic medical devices requires physically accurate and effective simulation tools. In this paper we present a hybrid Monte Carlo (MC) particle simulation method for Computed Tomography (CT) scanners. To meet the performance requirements, we combine several variance reduction techniques and tailor the algorithms for effective GPU execution. Variance reduction methods […]
John T. O'Donnell, Cordelia V. Hall
Digital circuit simulation often requires a large amount of computation, resulting in long run times. We consider several techniques for optimising a brute force synchronous circuit simulator: an algorithm using an event queue that avoids recalculating quiescent parts of the circuit, a marking algorithm that is similar to the event queue but that avoids a […]

* * *

Free GPU computing nodes at

Registered users can now run their OpenCL applications on these nodes. We provide 1 minute of compute time per run on two nodes with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.

HGPU group © 2010-2014

All rights belong to the respective authors
