13907
Jonathan Jung
In this paper, we propose a new very simple numerical method for solving liquid-gas compressible flows on two dimensional cartesian meshes. For achieving high performance, the scheme is tested on recent multi-core processors and Graphics Processing Units (GPU), using the OpenCL environment. We describe how to install and to run the code CLBUBBLE for computing […]
View View   Download Download (PDF)   
Weifeng Liu, Brian Vinter
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse […]
Piotr Szwed, Wojciech Chmiel
This paper presents a multi-swarm PSO algorithm for the Quadratic Assignment Problem (QAP) implemented on OpenCL platform. Our work was motivated by results of time efficiency tests performed for single-swarm algorithm implementation that showed clearly that the benefits of a parallel execution platform can be fully exploited, if the processed population is large. The described […]
View View   Download Download (PDF)   
Jack Edward Arnstein
Having learned a great deal about the problem and also the solutions over the course of this project, it is the opinion of the author that the method undertaken within this report is unsatisfactory for delivering performance enhancement over alternative approaches. Firstly the domain transfers result in reduced performance. For larger simulations these prove to […]
View View   Download Download (PDF)   
Yutong Qin, Jianbiao Lin, Xiang Huang
Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of high-quality global illumination at a heavy computational cost. Because of the high computation complexity, it can’t reach the requirement of real-time rendering. The emergence of many-core architectures, makes it […]
View View   Download Download (PDF)   
Robert Wang
Developers have been using utility tools such as CPU-Z, GPU-Z, CUDA-Z, OpenCL-Z for a long time. These tools provide platform and hardware information in details and help developers quickly understand the hardware capabilities. Recently, OpenCL has been supported by most of the latest mobile phones/tablets, as the mobile GPUs are gaining more compute power. OpenCL-A […]
View View   Download Download (PDF)   
Narmada Naik, G.N Rathna
This paper presents a real-time face recognition system using kinect sensor. The algorithm is implemented on GPU using opencl and significant speed improvements are observed. We use kinect depth image to increase the robustness and reduce computational cost of conventional LBP based face recognition. The main objective of this paper was to perform robust, high […]
View View   Download Download (PDF)   
Wim Vanderbauwhede
In this report we present a novel approach to model coupling for shared-memory multicore systems hosting OpenCL-compliant accelerators, which we call The Glasgow Model Coupling Framework (GMCF). We discuss the implementation of a prototype of GMCF and its application to coupling the Weather Research and Forecasting Model and an OpenCL-accelerated version of the Large Eddy […]
Krzysztof Banas, Filip Kruzel, Jan Bielanski
The paper presents investigations on the implementation and performance of the finite element numerical integration algorithm for first order approximations and three processor architectures, popular in scientific computing, classical CPU, Intel Xeon Phi and NVIDIA Kepler GPU. A unifying programming model and portable OpenCL implementation is considered for all architectures. Variations of the algorithm due […]
View View   Download Download (PDF)   
Pierre L'Ecuyer, David Munger, Nabil Kemerchou
We present clRNG, a library for uniform random number generation in OpenCL. Streams of random numbers act as virtual random number generators. They can be created on the host computer in unlimited numbers, and then used either on the host or on other computing devices by work items to generate random numbers. Each stream also […]
View View   Download Download (PDF)   
Florentino Sainz
Exascale performance requires a level of energy efficiency only achievable with specialized hardware. Hence, to build a general purpose HPC system with exascale performance different types of processors, memory technologies and interconnection networks will be necessary. Heterogeneous hardware is already present on some top supercomputer systems that are composed of different compute nodes, which at […]
View View   Download Download (PDF)   
Adam Betts, Nathan Chong, Alastair F. Donaldson, Jeroen Ketema, Shaz Qadeer, Paul Thomson, John Wickerson
We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed synchronous, delayed visibility (SDV) semantics, which captures the execution of a GPU kernel by multiple groups of threads. The […]
Page 1 of 11212345...102030...Last »

* * *

* * *

Like us on Facebook

HGPU group

238 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1444 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: