Jan, 23

clpeak – peak performance of your OpenCL device

clpeak is a benchmarking tool intended for developers who want to fine-tune OpenCL kernels for a particular device or class of devices. It measures bandwidth and compute performance for different vector widths of a data type, for example float and float4. Traditionally, it is recommended to write scalar code and rely on the OpenCL compiler to auto-vectorize it. But most of the time the compiler […]
Jan, 23

A comparison between parallelization approaches in molecular dynamics simulations on GPUs

We test the performance of two different approaches to computing forces for molecular dynamics simulations on Graphics Processing Units. A "vertex-based" approach, where a computing thread is started per particle, is compared to a newly proposed "edge-based" approach, where a thread is started for each potentially non-zero interaction. We find that the former […]
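
The difference between the two decompositions can be sketched in a few lines of plain Python. This is only an illustration of the idea described in the abstract, not the authors' implementation: the toy 1D pair force, the cutoff value, and all function names are assumptions made up for this example, and the loops stand in for what would be GPU threads.

```python
# Hypothetical illustration: the toy force law and all names are assumptions,
# not taken from the paper.

def pair_force(xi, xj, cutoff=1.5):
    """Toy 1D force on particle i due to j: linear repulsion inside the cutoff."""
    d = xi - xj
    r = abs(d)
    if r == 0.0 or r >= cutoff:
        return 0.0
    return (cutoff - r) * (d / r)


def forces_vertex_based(x, cutoff=1.5):
    """One unit of work per particle: each 'thread' i loops over all others."""
    n = len(x)
    f = [0.0] * n
    for i in range(n):  # on a GPU, this would be one thread per particle i
        for j in range(n):
            if i != j:
                f[i] += pair_force(x[i], x[j], cutoff)
    return f


def forces_edge_based(x, cutoff=1.5):
    """One unit of work per potentially non-zero interaction (edge)."""
    n = len(x)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if abs(x[i] - x[j]) < cutoff]
    f = [0.0] * n
    for i, j in edges:  # on a GPU, one thread per edge, accumulating atomically
        fij = pair_force(x[i], x[j], cutoff)
        f[i] += fij
        f[j] -= fij  # equal and opposite force on the other particle
    return f


positions = [0.0, 1.0, 2.5, 2.75]
print(forces_vertex_based(positions))
print(forces_edge_based(positions))
```

Both functions return the same forces. The vertex-based version does work proportional to the number of particles per thread, while the edge-based version spreads the work over the interacting pairs and needs some form of synchronized accumulation when run in parallel.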
Jan, 23

Multi-GPU parallel memetic algorithm for capacitated vehicle routing problem

The goal of this paper is to propose and test a new memetic algorithm for the capacitated vehicle routing problem in a parallel computing environment. In this paper we consider a simple variation of the vehicle routing problem in which the only parameter is the capacity of the vehicle and each client needs only one package. We present […]
Jan, 19

A Lattice Boltzmann Method Simulator for Microfluidics on GPU Cluster

A simulator for microfluidic systems, based on the lattice Boltzmann method (LBM), was developed to run on a Graphics Processing Unit (GPU) cluster. It was written in CUDA C, implements single-component, single-phase fluids, and includes periodic, velocity, bounce-back and pressure boundary conditions. The program was run on a cluster with four nodes, where […]
Jan, 19

GPU based Implementation of Film Flicker Reduction Algorithms

In this work we propose an algorithm for film restoration aimed at reducing the flicker effect while preserving the original overall illumination of the film. We also present a comparative study of the performance of this algorithm implemented following a sequential approach on a CPU and following a parallel approach on a GPU using OpenCL.
Jan, 19

FlowTour: An Automatic Guide for Exploring Internal Flow Features

We present FlowTour, a novel framework that provides an automatic guide for exploring internal flow features. Our algorithm first identifies critical regions and extracts their skeletons for feature characterization and streamline placement. We then create candidate viewpoints based on the construction of a simplified mesh enclosing each critical region and select best viewpoints based on […]
Jan, 19

Finite-difference time-domain solver for room acoustics using graphics processing units

Several acoustic simulation methods have been introduced during the past decades. Wave-based simulation methods have been one of the alternatives, but their applicability for wideband acoustic simulation has been limited by the computing power of available hardware. During recent years, the processing power and programmability of graphics processing units have improved, and therefore several wave-based […]
Jan, 19

GPU Computing for Meshfree Particle Method

Graphics Processing Units (GPUs), originally developed for computer games, now provide computational power for scientific applications. We present a study comparing the computational speed-up and efficiency of a GPU with those of a CPU for the Finite Pointset Method (FPM), a numerical tool in Computational Fluid Dynamics (CFD). As FPM is based on […]
Jan, 18

High-performance and Embedded Systems for Cryptography

This thesis addresses the design of cryptographic accelerators, ranging from the embedded system to the high-performance computing device. New techniques are proposed to allow several cryptographic algorithms to be computed by the same target. Therefore, flexibility (to support several algorithms) and scalability (to extend the features of a designed accelerator) are two keywords in all […]
Jan, 18

Supporting x86-64 Address Translation for 100s of GPU Lanes

Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible address translation support for GPU memory references. However, even a modest GPU might need 100s of translations per cycle (6 CUs * 64 lanes/CU) with memory access […]
Jan, 18

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs

The Generalized Minimum Residual (GMRES) method is one of the most widely-used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming […]
Jan, 18

A GPU-based Multi-level Subspace Decomposition Scheme for Hierarchical Tensor Product Bases

The aim of this thesis is to implement a multi-level splitting of full grids on the GPU, which could be used in the incremental visualization of scientific data sets. The splitting is motivated by the approximation properties of the sparse grid technique. Looking towards large amounts of data, ideas of parallelization and data slicing are […]

* * *

Free GPU computing nodes at

Registered users can now run their OpenCL applications at We provide 1 minute of compute time per run on two nodes: one with two AMD graphics processing units, and one with one AMD and one nVidia graphics processing unit. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); terminal output logs from compilation and execution will be provided to the user.

The information sent to will be treated according to our Privacy Policy

HGPU group © 2010-2014

All rights belong to the respective authors