Bruce Merry
Sorting and scanning are two fundamental primitives for constructing highly parallel algorithms. A number of libraries now provide implementations of these primitives for GPUs, but there is relatively little information about the performance of these implementations. We benchmark seven libraries for 32-bit integer scan and sort, and sorting 32-bit values by 32-bit integer keys. We show […]
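For readers unfamiliar with the two primitives, the following plain-C++ sketch (illustrative only, not taken from the paper) shows what an inclusive scan (prefix sum) and a key-value sort compute; the benchmarked GPU libraries expose the same operations as device-side calls.

```cpp
// Illustration of the two primitives benchmarked: inclusive scan (prefix sum)
// and sorting 32-bit values by 32-bit keys. GPU libraries provide equivalent
// device-side implementations; this is plain C++17.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<std::uint32_t> data{3, 1, 4, 1, 5};

    // Inclusive scan: scanned[i] = data[0] + ... + data[i].
    std::vector<std::uint32_t> scanned(data.size());
    std::inclusive_scan(data.begin(), data.end(), scanned.begin());
    for (std::uint32_t s : scanned) std::cout << s << ' ';   // 3 4 8 9 14
    std::cout << '\n';

    // Key-value sort: reorder the values by their 32-bit keys.
    std::vector<std::uint32_t> keys{42, 7, 19, 3, 27};
    std::vector<std::uint32_t> values{0, 1, 2, 3, 4};
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    for (std::size_t i : idx)
        std::cout << keys[i] << " -> " << values[i] << '\n';
}
```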
Fabio Miguel Cardoso Soldado
The Graphics Processing Unit (GPU) is present in almost every modern-day personal computer. Despite their special-purpose design, GPUs have been increasingly used for general computations with very good results. Hence, there is a growing effort from the community to seamlessly integrate this kind of device into everyday computing. However, to fully exploit the […]
Kato Mivule, Benjamin Harvey, Crystal Cobb, Hoda El Sayed
The advent of high-performance computing (HPC) and graphics processing units (GPUs) presents an enormous computational resource for large data transactions (big data) that require parallel processing for robust and prompt data analysis. While a number of HPC frameworks have been proposed, parallel programming models present a number of challenges, for instance, how to fully […]
Amrit Panda
Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and […]
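A minimal illustration (not from the paper) of the streaming model the abstract builds on: stateless kernels applied to every element of a continuous stream, chained into a pipeline.

```cpp
// Minimal expression of the streaming model: kernels are stateless functions
// applied to every element of a stream, and a pipeline chains them.
#include <cstdio>
#include <functional>
#include <vector>

using Kernel = std::function<float(float)>;

std::vector<float> run_pipeline(std::vector<float> stream,
                                const std::vector<Kernel>& kernels) {
    for (const Kernel& k : kernels)       // each kernel is one pipeline stage
        for (float& x : stream)           // applied to every stream element
            x = k(x);
    return stream;
}

int main() {
    std::vector<Kernel> pipeline{
        [](float x) { return x * 2.0f; }, // e.g. a gain stage
        [](float x) { return x + 1.0f; }  // e.g. an offset stage
    };
    for (float y : run_pipeline({1.f, 2.f, 3.f}, pipeline))
        std::printf("%g ", y);            // prints: 3 5 7
    std::printf("\n");
}
```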
Xiangang Li, Xihong Wu
Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, this research investigates deep extensions of LSTM, considering that deep hierarchical models have turned out to be more efficient than shallow ones. Motivated by previous […]
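For reference, a standard LSTM cell (written here in the common notation, not necessarily the exact variant used in the paper) computes, per time step:

```latex
\begin{align*}
i_t &= \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) && \text{(cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden output)}
\end{align*}
```

Deep extensions stack several such layers, feeding each layer's $h_t$ as the $x_t$ of the layer above.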
Steven Gurfinkel
Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framework, can program individual CPUs or GPUs; however, distributing a problem across multiple devices is more difficult. This thesis contributes three OpenCL runtimes that automatically distribute a problem across multiple devices: DualCL and m2sOpenCL, which distribute tasks across a single system’s […]
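For context, the host-side OpenCL sketch below (error handling omitted, assuming at least one GPU is present) shows the per-device plumbing that such distributing runtimes take care of automatically: enumerating the devices, sharing a context, and keeping one queue per device onto which pieces of the problem can be enqueued.

```cpp
// Minimal OpenCL host sketch: one context spanning all GPUs of a platform,
// with one command queue per device. A distributing runtime would partition
// the NDRange of a kernel and enqueue a piece on each queue. No error checks.
#include <CL/cl.h>
#include <vector>

int main() {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, nullptr);

    cl_uint num_devices = 0;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, nullptr, &num_devices);
    std::vector<cl_device_id> devices(num_devices);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, num_devices, devices.data(), nullptr);

    cl_context context = clCreateContext(nullptr, num_devices, devices.data(),
                                         nullptr, nullptr, nullptr);

    // One queue per device; work is split across these queues.
    std::vector<cl_command_queue> queues;
    for (cl_device_id dev : devices)
        queues.push_back(clCreateCommandQueue(context, dev, 0, nullptr));

    for (cl_command_queue q : queues) clReleaseCommandQueue(q);
    clReleaseContext(context);
    return 0;
}
```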
Mehmet Ufuk Buyuksahin
Galois field arithmetic is used frequently in popular security and error-correction applications. Montgomery multiplication is among the methods suited to accelerating modular multiplication, the most time-consuming basic arithmetic operation, and it also lends itself to parallel implementation. OpenCL, which is a portable, heterogeneous and parallel programming framework, […]
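A scalar sketch of the Montgomery multiplication idea for single 32-bit words with R = 2^32 (the thesis targets much larger operands and a parallel OpenCL implementation, so this only illustrates the arithmetic):

```cpp
// Montgomery multiplication for 32-bit words, R = 2^32. Requires an odd
// modulus n and inputs a, b < n. Returns a*b*R^{-1} mod n.
#include <cstdint>
#include <cstdio>

// n_prime = -n^{-1} mod 2^32, via Newton iteration (n must be odd).
std::uint32_t neg_inv_mod_r(std::uint32_t n) {
    std::uint32_t x = n;                 // correct to ~3 bits for odd n
    for (int i = 0; i < 4; ++i)
        x *= 2 - n * x;                  // each step doubles the correct bits
    return ~x + 1;                       // -x mod 2^32
}

std::uint32_t mont_mul(std::uint32_t a, std::uint32_t b,
                       std::uint32_t n, std::uint32_t n_prime) {
    std::uint64_t t  = static_cast<std::uint64_t>(a) * b;
    std::uint32_t m  = static_cast<std::uint32_t>(t) * n_prime;   // mod 2^32
    std::uint64_t mn = static_cast<std::uint64_t>(m) * n;
    // t + mn is divisible by 2^32 by construction; add with explicit carry.
    std::uint64_t carry = ((t & 0xFFFFFFFFu) + (mn & 0xFFFFFFFFu)) >> 32;
    std::uint64_t u = (t >> 32) + (mn >> 32) + carry;             // u < 2n
    return static_cast<std::uint32_t>(u >= n ? u - n : u);
}

int main() {
    const std::uint32_t n  = 1000003u;                 // any odd modulus
    const std::uint32_t np = neg_inv_mod_r(n);
    std::uint32_t a = 123456u, b = 654321u;
    // Convert to Montgomery form (x*R mod n), multiply, convert back.
    std::uint32_t aR = static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) << 32) % n);
    std::uint32_t bR = static_cast<std::uint32_t>((static_cast<std::uint64_t>(b) << 32) % n);
    std::uint32_t abR = mont_mul(aR, bR, n, np);       // = a*b*R mod n
    std::uint32_t ab  = mont_mul(abR, 1u, n, np);      // = a*b mod n
    std::printf("%u vs %llu\n", ab,
                static_cast<unsigned long long>(a) * b % n);
}
```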
Karl Rupp, Josef Weinbub, Ansgar Jungel, Tibor Grasser
We revisit the implementation of iterative solvers on discrete graphics processing units and demonstrate the benefit of implementations using extensive kernel fusion for pipelined formulations over conventional implementations of classical formulations. The proposed implementations with both CUDA and OpenCL are freely available in ViennaCL and achieve up to three-fold performance gains when compared to other […]
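A CPU-side analogy (not ViennaCL code) of the kernel-fusion idea: several BLAS-1 style updates of a pipelined solver iteration are combined into a single sweep over the vectors instead of one pass, and hence one kernel launch and one round of memory traffic, per operation.

```cpp
// CPU analogy of kernel fusion for pipelined CG-style updates.
#include <cstddef>
#include <cstdio>
#include <vector>

// Unfused: x += alpha*p; r -= alpha*q; dot = <r,r>  (three passes).
double update_unfused(std::vector<double>& x, std::vector<double>& r,
                      const std::vector<double>& p, const std::vector<double>& q,
                      double alpha) {
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += alpha * p[i];
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= alpha * q[i];
    double dot = 0.0;
    for (std::size_t i = 0; i < r.size(); ++i) dot += r[i] * r[i];
    return dot;
}

// Fused: the same result in one pass, the way a fused GPU kernel would
// combine the vector updates with the reduction.
double update_fused(std::vector<double>& x, std::vector<double>& r,
                    const std::vector<double>& p, const std::vector<double>& q,
                    double alpha) {
    double dot = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] += alpha * p[i];
        r[i] -= alpha * q[i];
        dot  += r[i] * r[i];
    }
    return dot;
}

int main() {
    std::vector<double> x1(4, 0.0), r1(4, 1.0), x2(4, 0.0), r2(4, 1.0);
    std::vector<double> p(4, 1.0), q(4, 0.5);
    std::printf("unfused %g  fused %g\n",
                update_unfused(x1, r1, p, q, 0.2),
                update_fused(x2, r2, p, q, 0.2));
}
```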
James A. Ross, David A. Richie, Song J. Park, Dale R. Shires, Lori L. Pollock
An observation in supercomputing over the past decade illustrates the transition toward pervasive commodity products being integrated into the world's fastest systems. Given today's exploding popularity of mobile devices, we investigate the possibilities for high-performance mobile computing. Because parallel processing on mobile devices will be the key element in developing a mobile and computationally […]
Antonio J. Pena, Carlos Reano, Federico Silla, Rafael Mayo, Enrique S. Quintana-Orti, Jose Duato
In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than […]
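To illustrate the transparency the abstract describes, ordinary CUDA runtime host code such as the following (illustrative snippet, not from the paper) needs no source changes: rCUDA supplies a replacement runtime library that forwards the same API calls to GPUs in remote nodes.

```cpp
// Ordinary CUDA runtime host code: under rCUDA the same calls are
// intercepted and forwarded to a GPU in a remote node.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);            // rCUDA reports the remote GPU pool
    std::printf("visible GPUs: %d\n", count);

    float* d_buf = nullptr;
    cudaMalloc(&d_buf, 1 << 20);           // allocation lands on the remote GPU
    float h_buf[4] = {1.f, 2.f, 3.f, 4.f};
    cudaMemcpy(d_buf, h_buf, sizeof(h_buf), cudaMemcpyHostToDevice);
    cudaFree(d_buf);
    return 0;
}
```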
Koji Nakano, Susumu Matsumae, Yasuaki Ito
The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of memory access to the shared memory of a streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks that constitute a shared memory, and w threads in a warp try to access them at the same time. However, […]
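A toy model (not the formal DMM definition) of the bank mapping involved: address a resides in bank a mod w, and the serialization factor of a warp's memory access is the occupancy of its most heavily loaded bank.

```cpp
// Toy model of the DMM / shared-memory bank mapping: address a lives in
// bank (a mod w); counting the accesses that fall into the most loaded bank
// gives the serialization factor of one warp-wide memory access.
#include <algorithm>
#include <array>
#include <cstdio>

int main() {
    constexpr int w = 32;                  // banks = threads per warp
    std::array<int, w> bank_load{};        // accesses per bank

    const int stride = 2;                  // hypothetical access stride
    for (int tid = 0; tid < w; ++tid) {
        int address = tid * stride;        // thread tid touches this address
        ++bank_load[address % w];
    }

    int factor = *std::max_element(bank_load.begin(), bank_load.end());
    std::printf("stride %d -> access serialized %d-fold\n", stride, factor);
}
```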
Hisham Mohamed
This thesis studies the scalability of the similarity search problem in large-scale multidimensional data. Similarity search, which translates into the nearest-neighbour search problem, finds many applications in information retrieval, visualization, machine learning and data mining. The current exponential growth of data motivates the need for approximate and scalable algorithms. In most existing algorithms and data structures, […]
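For contrast with the approximate, scalable structures studied in the thesis, a brute-force exact nearest-neighbour search (illustrative C++, not from the thesis), whose O(n·d) cost per query is what motivates them:

```cpp
// Exact brute-force nearest neighbour in d dimensions: O(n*d) per query.
#include <cstddef>
#include <cstdio>
#include <limits>
#include <vector>

std::size_t nearest(const std::vector<std::vector<double>>& points,
                    const std::vector<double>& query) {
    std::size_t best = 0;
    double best_d2 = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < points.size(); ++i) {
        double d2 = 0.0;
        for (std::size_t k = 0; k < query.size(); ++k) {
            double diff = points[i][k] - query[k];
            d2 += diff * diff;             // squared Euclidean distance
        }
        if (d2 < best_d2) { best_d2 = d2; best = i; }
    }
    return best;
}

int main() {
    std::vector<std::vector<double>> pts{{0, 0}, {1, 1}, {5, 5}};
    std::printf("nearest index: %zu\n", nearest(pts, {0.9, 1.2}));  // expects 1
}
```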

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units (specifications below). There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); the compilation and execution terminal output logs will be provided to the user.
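A hypothetical example of such a project is a minimal OpenCL host program that merely enumerates the platforms and GPU devices visible through the SDKs installed on the node:

```cpp
// Hypothetical minimal OpenCL test program: lists platforms and GPU devices.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, nullptr, &num_platforms);
    if (num_platforms == 0) return 0;
    std::vector<cl_platform_id> platforms(num_platforms);
    clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char pname[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(pname), pname, nullptr);
        std::printf("Platform %u: %s\n", p, pname);

        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, nullptr,
                           &num_devices) != CL_SUCCESS || num_devices == 0)
            continue;
        std::vector<cl_device_id> devices(num_devices);
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, num_devices,
                       devices.data(), nullptr);
        for (cl_uint d = 0; d < num_devices; ++d) {
            char dname[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dname), dname, nullptr);
            std::printf("  GPU device %u: %s\n", d, dname);
        }
    }
    return 0;
}
```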

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
