W. P. Gaudin, A. C. Mallinson, O. Perks, J. A. Herdman, D. A. Beckingsale, J. M. Levesque, M. Boulton, S. McIntosh-Smith, S. A. Jarvis
Power constraints are forcing HPC systems to continue to increase hardware concurrency. Efficiently scaling applications on future machines will be essential for improved science and it is recognised that the "flat" MPI model will start to reach its scalability limits. The optimal approach is unknown, necessitating the use of mini-applications to rapidly evaluate new approaches. […]
Jihye Kwon, Kang-Wook Kim, Sangyoun Paik, Jihwa Lee, Chang-Gun Lee
Past research on multicore scheduling assumes that a computational unit has already been parallelized into a pre-fixed number of threads. However, with recent technologies such as OpenCL, a computational unit can be parallelized in many different ways with runtime-selectable numbers of threads. This paper proposes an optimal algorithm for parallelizing and scheduling a set […]
Paul Irofti
Dictionary training for sparse representations involves dealing with large chunks of data and complex algorithms that lead to time-consuming implementations. SBO is an iterative dictionary learning algorithm based on constructing unions of orthonormal bases via singular value decomposition that represents each data item through a single best-fit orthobase. In this paper we present a […]
Numair Khan
With the advent of multi- and many-core processors, communication has replaced computation as the performance bottleneck. Most current approaches to the problem try to tolerate memory access latency through a high amount of Thread-Level Parallelism. However, not all applications benefit from such techniques and there is a need to address the weakness of the underlying […]
Wenhao Jia
In response to the ever growing demand for computing power, heterogeneous parallelism has emerged as a widespread computing paradigm in the past decade or so. In particular, massively parallel processors such as graphics processing units (GPUs) have become the prevalent throughput computing elements in heterogeneous systems, offering high performance and power efficiency for general-purpose workloads. […]
Sagar Venkatesh Gubbi, Chandra Sekhar Seelamantula
Image denoising is a classical problem in image processing and has applications in areas ranging from photography to medical imaging. In this paper, we examine the denoising performance of an optimized spatially-varying Gaussian filter. The parameters of the Gaussian filter are tuned by optimizing a mean squared error estimate which is similar to Stein’s Unbiased Risk […]
Ville Korhonen
Heterogeneous computing has become a viable option in seeking computing performance, alongside conventional homogeneous multi-/single-processor approaches. The advantage of heterogeneity is the possibility to choose the best device on the platform for each distinct workload in the application to gain performance and/or to lower power consumption. The drawback of heterogeneity is the […]
Benjamin Long, Sue Ann Seah, Tom Carter, Sriram Subramanian
We present a method for creating three-dimensional haptic shapes in mid-air using focused ultrasound. This approach applies the principles of acoustic radiation force, whereby the non-linear effects of sound produce forces on the skin which are strong enough to generate tactile sensations. This mid-air haptic feedback eliminates the need for any attachment of actuators or […]
Igor Ozimek, Andrej Hrovat, Andrej Vilhar, Tomaz Javornik
Radio propagation simulation tools are important for prediction and verification of the radio signal coverage by individual transmitters or transmitter networks such as mobile phone cellular networks. In the case of a large geographic area with a relatively high resolution, the simulation can become computationally demanding, taking a considerable amount of time to accomplish. Parallel […]
Ahmad Lashgar, Alireza Majidi, Amirali Baniasadi
In this paper we introduce IPMACC, a framework for translating OpenACC applications to CUDA or OpenCL. IPMACC is composed of a set of translators that translate OpenACC for C applications to CUDA or OpenCL. The framework uses the system compiler (e.g. nvcc) to generate the final accelerator binary. The framework can be used for extending the OpenACC API, […]
Fan Wang, Dajiang Zhou, Satoshi Goto
This paper presents a high quality H.265/HEVC motion estimation implementation with the cooperation of CPU and GPU. The data dependency from MVP (Motion Vector Predictor) restricts the degree of parallelism on GPU. To overcome the constraint from MVP, we propose to use an estimated MVP on GPU and the accurate MVP to refine the motion […]
Samuel Bacha Heye
Data mining is used to extract useful information from large data. But the organizations which mine the data might not be the owner of the data. So, before the owners can make their data accessible for data mining they want to make sure that no sensitive information can be mined from the released data whose […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: the first has two AMD graphics processing units, and the second has one AMD and one nVidia graphics processing unit. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
