Feb, 17

A Similarity-Based Analysis Tool for Scientific Application Porting

Porting applications to a new system is a nontrivial job in the HPC field. It is a very time-consuming, labor-intensive process, and the quality of the results will depend critically on the experience of the experts involved. In order to ease the porting process, we propose a methodology to address an important aspect of software […]
Feb, 17

The battle of the giants: a case study of GPU vs FPGA optimisation for real-time image processing

This paper focuses on a thorough comparison of the two main hardware targets for real-time optimization of a computer vision algorithm: GPU and FPGA. Based on a complex case study algorithm for threaded isle detection, implementation on both hardware targets is compared in terms of resulting time performance, code translation effort, hardware cost, power efficiency […]
Feb, 17

Petascale elliptic solvers for anisotropic PDEs on GPU clusters

Memory bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with up to one trillion (10^12) unknowns the code has to make efficient use of several million […]
Feb, 17

GPU Programming with CUDA: A brief overview

In this paper we describe the architecture of an NVIDIA GPU, as well as the CUDA programming model. The basic statements are explained. We also provide an example of CUDA code, explaining its execution workflow on a GPU device.
Feb, 17

Optimizing Performance of Stencil Code with SPL Conqueror

A standard technique to numerically solve elliptic partial differential equations on structured grids is to discretize them via finite differences and then to apply an efficient geometric multi-grid solver. Unfortunately, finding the optimal choice of multi-grid components and parameters is challenging and platform dependent, especially in cases where domain knowledge is incomplete. Auto-tuning is a […]
Feb, 17

Interactive Design Exploration for Constrained Meshes

In architectural design, surface shapes are commonly subject to geometric constraints imposed by material, fabrication or assembly. Rationalization algorithms can convert a freeform design into a form feasible for production, but often require design modifications that might not comply with the design intent. In addition, they only offer limited support for exploring alternative feasible shapes, […]
Feb, 17

Efficient pseudo-random number generation for monte-carlo simulations using graphic processors

A hybrid approach combining three Tausworthe generators and one linear congruential generator for pseudo-random number generation in GPU programming, as suggested in the NVIDIA CUDA library, has been used for Monte Carlo sampling. On each GPU thread, a random seed is generated on the fly in a simple way using the quick and dirty […]
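A minimal CPU-side sketch of a combined Tausworthe/LCG generator of the kind this abstract describes is given below, in Python rather than CUDA so that it runs anywhere. The shift and mask constants are the ones commonly published for the "hybrid Taus" generator in NVIDIA's GPU Gems 3 and are assumed here, not taken from the paper. The seed expansion in `HybridTaus.__init__` and the `estimate_pi` driver are hypothetical helpers for illustration only; the paper's own per-thread seeding scheme is truncated in the abstract and is not reproduced.

```python
MASK32 = 0xFFFFFFFF  # keep every state word in 32 bits, like a GPU unsigned int


def taus_step(z, s1, s2, s3, m):
    """Advance one Tausworthe (LFSR) component by one step."""
    b = (((z << s1) & MASK32) ^ z) >> s2
    return (((z & m) << s3) & MASK32) ^ b


def lcg_step(z, a=1664525, c=1013904223):
    """Advance a 32-bit linear congruential generator by one step."""
    return (a * z + c) & MASK32


class HybridTaus:
    """Three Tausworthe components XOR-ed with one LCG (assumed constants)."""

    def __init__(self, seed):
        # Hypothetical seed expansion: derive four state words from one seed
        # with the LCG. Tausworthe states should exceed 128, so bit 8 is
        # forced on (applied to all four words for simplicity).
        z = seed & MASK32
        words = []
        for _ in range(4):
            z = lcg_step(z)
            words.append(z | 0x100)
        self.z1, self.z2, self.z3, self.z4 = words

    def next_uniform(self):
        """Return a float in [0, 1)."""
        self.z1 = taus_step(self.z1, 13, 19, 12, 0xFFFFFFFE)
        self.z2 = taus_step(self.z2, 2, 25, 4, 0xFFFFFFF8)
        self.z3 = taus_step(self.z3, 3, 11, 17, 0xFFFFFFF0)
        self.z4 = lcg_step(self.z4)
        return (self.z1 ^ self.z2 ^ self.z3 ^ self.z4) / 2**32


def estimate_pi(seed, samples):
    """Toy Monte Carlo run: fraction of random points inside the unit circle."""
    rng = HybridTaus(seed)
    hits = 0
    for _ in range(samples):
        x = rng.next_uniform()
        y = rng.next_uniform()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


if __name__ == "__main__":
    print(estimate_pi(seed=1, samples=20000))
```

On a GPU, each thread would hold its own four state words; giving every thread a distinct seed is what keeps the per-thread streams from repeating one another.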
Feb, 17

Resolution of Linear Algebra for the Discrete Logarithm Problem using GPU and Multi-core Architectures

In cryptanalysis, solving the discrete logarithm problem (DLP) is key to assessing the security of many public-key cryptosystems. The index-calculus methods, that attack the DLP in multiplicative subgroups of finite fields, require solving large sparse systems of linear equations modulo large primes. This article deals with how we can run this computation on GPU- and […]
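Iterative solvers for such systems typically spend most of their time in a sparse matrix-vector product modulo the prime. The sketch below shows that kernel on a matrix in compressed sparse row (CSR) form. It is an illustrative assumption, not the authors' implementation: the function name `spmv_mod`, the CSR layout, and the example prime are all hypothetical choices, and the abstract does not say which solver or storage format the paper uses.

```python
def spmv_mod(row_ptr, col_idx, values, x, p):
    """Compute y = A * x (mod p) for a sparse matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and (small) coefficients.
    """
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        # A single reduction per row is enough here because Python integers
        # do not overflow; a GPU kernel would need multi-word arithmetic.
        y.append(acc % p)
    return y


if __name__ == "__main__":
    p = 2**61 - 1  # example prime, chosen only for illustration
    # A = [[1, 0, 2],
    #      [0, -1, 0],
    #      [3, 0, 1]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [1, 2, -1, 3, 1]
    x = [5, 7, p - 1]
    print(spmv_mod(row_ptr, col_idx, values, x, p))
```

Each output row is independent of the others, which is why this product maps naturally onto one GPU thread (or group of threads) per row.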
Feb, 17

Fast American Basket Option Pricing on a multi-GPU Cluster

This article presents a multi-GPU adaptation of a specific Monte Carlo and classification based method for pricing American basket options, due to Picazo. The first part relates how to combine fine and coarse-grained parallelization to price American basket options. A dynamic strategy of kernel calibration is proposed. Doing so, our implementation on a reasonable size […]
Feb, 16

Towards Porting a Real-World Seismological Application to the Intel MIC Architecture

This whitepaper aims to discuss first experiences with porting an MPI-based real-world geophysical application to the new Intel Many Integrated Core (MIC) architecture. The selected code SeisSol is an application written in Fortran that can be used to simulate earthquake rupture and radiating seismic wave propagation in complex 3-D heterogeneous materials. The PRACE prototype cluster […]
Feb, 16

Direct Numerical Simulation and Large Eddy Simulation on a Turbulent Wall-Bounded Flow Using Lattice Boltzmann Method and Multiple GPUs

Direct numerical simulation (DNS) and large eddy simulation (LES) were performed on the wall-bounded flow at Re_tau = 180 using lattice Boltzmann method (LBM) and multiple Graphic Processing Units (GPUs). In the DNS, 8 K20M GPUs were adopted. The maximum number of meshes is 6.7×10^7, which results in the non-dimensional mesh size of Delta+=1.41 for […]
Feb, 16

CUDA k-NN: application to the segmentation of the retinal vasculature within SD-OCT volumes of mice

In this work, the speed of a GPU-based CUDA k-NN implementation is compared with that of the ANN implementation on three sets of medical imaging data. The results show that, with higher-dimensional data, the CUDA-based k-NN approach can achieve a speedup of up to two orders of magnitude. Otherwise, ANN would be a better implementation to […]
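For readers unfamiliar with k-NN search, the sketch below shows the exhaustive (brute-force) variant: compute the distance from the query to every reference point and keep the k smallest. It is a plain-Python illustration, and it assumes that the CUDA implementation in the paper follows the brute-force approach, which the abstract does not state. The function name `knn_brute_force` is hypothetical.

```python
import heapq


def knn_brute_force(refs, query, k):
    """Return the indices of the k reference points closest to query.

    Uses squared Euclidean distance; ties are broken by the lower index.
    """
    dists = (
        (sum((r - q) ** 2 for r, q in zip(ref, query)), i)
        for i, ref in enumerate(refs)
    )
    return [i for _, i in heapq.nsmallest(k, dists)]


if __name__ == "__main__":
    refs = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
    print(knn_brute_force(refs, (0.9, 0.1), k=2))  # prints [1, 0]
```

The cost grows linearly with both the number of reference points and the dimension, but every distance is independent of the others, so the work parallelizes well. Tree-based libraries prune the search instead, which tends to pay off mainly in low dimensions.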

* * *

Free GPU computing nodes at

Registered users can now run their OpenCL application at […]. We provide 1 minute of computer time per run on two nodes with two AMD and one nVidia graphics processing units, respectively. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); the terminal output logs from compilation and execution will be provided to the user.

The information sent to […] will be treated according to our Privacy Policy.

HGPU group © 2010-2014

All rights belong to the respective authors