Apr, 9

Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications

This paper proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The CUDA-to-CUDA transformation collectively replaces the user-written kernels by auto-generated kernels optimized for data reuse. The transformation is based on two basic operations, kernel fusion and fission, and relies on a series of automated steps: gathering metadata, generating […]
Apr, 8

Finite element numerical integration for first order approximations on multi-core architectures

The paper presents investigations on the implementation and performance of the finite element numerical integration algorithm for first order approximations on three processor architectures popular in scientific computing: classical CPU, Intel Xeon Phi and NVIDIA Kepler GPU. A unifying programming model and a portable OpenCL implementation are considered for all architectures. Variations of the algorithm due […]
Apr, 8

GPU Accelerated Strong and Branching Bisimilarity Checking

Bisimilarity checking is an important operation to perform explicit-state model checking when the state space of a model under verification has already been generated. It can be applied in various ways: reduction of a state space w.r.t. a particular flavour of bisimilarity, or checking that two given state spaces are bisimilar. Bisimilarity checking is a […]
Apr, 8

Enhancing Fluid Modeling with Turbulence and Acceleration

In this dissertation, we have proposed our solutions to four important and challenging topics in enhancing fluid modeling with turbulence and acceleration: distance field representation of obstacles in fluid, adaptive and controllable turbulence enhancement, Langevin Particles and GPU acceleration in fluid modeling. All these fields aim at creating realistic and fast fluid field which are […]
Apr, 8

Benchmarking the cost of thread divergence in CUDA

All modern processors include a set of vector instructions. While this gives a tremendous boost to performance, it requires vectorized code that can take advantage of such instructions. As an ideal vectorization is hard to achieve in practice, one has to decide when different instructions may be applied to different elements of the […]
Apr, 8

Early Experiences Running the 3D Stencil Jacobi Method in Intel Xeon Phi

Iterative stencil computations are an important pattern of computation in different computational fields such as physics or chemistry simulations. A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. As the demand for more and more compute power is growing rapidly in different fields of research, […]
Apr, 8

3rd International Conference on Control, Robotics and Cybernetics (ICCRC), 2015

ICCRC 2015: 3rd International Conference on Control, Robotics and Cybernetics. Berlin, Germany, August 13-14, 2015. http://www.iccrc.org/ Submission deadline: 2015-06-05. Publication: all accepted papers of ICCRC 2015 will be published by the International Journal of Mechanical Engineering and Robotics Research (IJMERR) (ISSN: 2278-0149), which will be indexed by Copernicus, ProQuest (USA), Open J-Gate, Indian Science and Google Scholar. […]
Apr, 8

8th International Conference on Advanced Computer Theory and Engineering (ICACTE), 2015

ICACTE 2015: 8th International Conference on Advanced Computer Theory and Engineering. Berlin, Germany, August 13-14, 2015. http://www.icacte.org/ Submission deadline: 2015-06-05. Publication: submitted papers can be selected and published in one of the following journals: WIT Transactions on Engineering Sciences (ISSN: 1743-3533), indexed by EI Compendex and ISI; International Journal of Computer Theory and Engineering […]
Apr, 8

7th International Conference on Education Technology and Computer (ICETC), 2015

ICETC 2015: 7th International Conference on Education Technology and Computer. Berlin, Germany, August 13-14, 2015. http://www.icetc.org/ Submission deadline: 2015-06-05. Publication: International Journal of Information and Education Technology (IJIET), ISSN: 2010-3689; abstracting/indexing: EI (INSPEC, IET), Cabell’s Directories, DOAJ, Electronic Journals Library, Engineering & Technology Digital Library, EBSCO, Google Scholar, Crossref and ProQuest; Lecture Notes on Information […]
Apr, 7

clRNG: A Random Number API with Multiple Streams for OpenCL

We present clRNG, a library for uniform random number generation in OpenCL. Streams of random numbers act as virtual random number generators. They can be created on the host computer in unlimited numbers, and then used either on the host or on other computing devices by work items to generate random numbers. Each stream also […]
Apr, 7

State Lattice-based Motion Planning for Autonomous On-Road Driving

Since the DARPA Urban Challenge 2007 (DUC), the development of autonomous vehicles has attracted increasing attention from both academic institutes and the automotive industry. It is believed that sufficiently sophisticated and reliable autonomous vehicles would redefine mobility. The motion planner and sensor simulation presented in this thesis are intended to contribute to this prospect. The task […]
Apr, 7

Multi-Lingual Speech Recognition with Low-Rank Multi-Task Deep Neural Networks

Multi-task learning (MTL) for deep neural network (DNN) multilingual acoustic models has been shown to be effective for learning parameters that are common or shared between multiple languages [1, 2]. In the MTL paradigm, the number of parameters in the output layer is large and scales with the number of languages used in training. This output […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: one with an nVidia and an AMD graphics processing unit, and one with two AMD graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). Terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
