Min Feng, Rajiv Gupta, Laxmi N. Bhuyan
This paper overviews the first speculative parallelization technique for GPUs that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. We perform misspeculation check on […]
Nitin Singhal, Jin Woo Yoo, Ho Yeol Choi, In Kyu Park
The advent of GPUs with programmable shaders on mobile phones has motivated developers to utilize the GPU to offload computationally intensive tasks and relieve the burden on the embedded CPU. In this paper, we present a set of metrics to measure characteristics of a mobile phone GPU with the focus on image processing algorithms. These measures assist […]
Kenneth Moreland, Utkarsh Ayachit, Berk Geveci, Kwan-Liu Ma
Experts agree that the exascale machine will comprise processors that contain many cores, which in turn will necessitate a much higher degree of concurrency. Software will require a minimum of 1,000 times more concurrency. Most parallel analysis and visualization algorithms today work by partitioning data and running mostly serial algorithms concurrently on each data […]
Thomas Nowotny
Simulating large-scale computer models of brain structures with spiking neuronal networks has become increasingly popular and feasible with the advent of general purpose computing on graphical processing units (GPGPU). Modern graphics cards, such as the NVidia range supporting the common unified device architecture (CUDA), provide massively parallel computing architectures for this purpose. Earlier GPU […]
Alecio P.D. Binotto, Carlos E. Pereira, Dieter W. Fellner
High-performance platforms are required by applications that use massive calculations. Currently, desktop accelerators (such as GPUs) form a powerful heterogeneous platform in conjunction with multi-core CPUs. To improve application performance on these hybrid platforms, load balancing plays an important role in distributing the workload. However, this scheduling problem faces challenges since the cost of a task at […]
Michael Knaup, Sven Steckmann, Olivier Bockenbach, and Marc Kachelriess
In the transversal plane, CT exhibits a nearly rotationally symmetric point spread function. Pixel sampling is typically done on Cartesian grids, which are not ideal from a signal processing point of view. It is advantageous to use a hexagonal grid, which can capture the same signal components with 13% fewer sampling points. In 3D one […]
Kentaro Sano, Yoshiaki Hatsuda, Satoru Yamamoto
Stencil computation is one of the important kernels in scientific computations; however, the sustained performance is limited by memory bandwidth, especially on multi-core microprocessors and GPGPUs, due to its small operational intensity. In this paper, we propose a scalable streaming-array (SSA) of simple soft-processors for high-performance stencil computation on multiple FPGAs. The SSA architecture allows a […]
Holger Scherl, Benjamin Keck, Markus Kowarschik, Joachim Hornegger
The Common Unified Device Architecture (CUDA) is a fundamentally new programming approach making use of the unified shader design of the most current Graphics Processing Units (GPUs) from NVIDIA. The programming interface allows implementing an algorithm using standard C language and a few extensions without any knowledge about graphics programming using OpenGL, DirectX, and […]
Suren Chilingaryan, Alessandro Mirone, Andrew Hammersley, Claudio Ferrero, Lukas Helfen, Andreas Kopmann, Tomy dos Santos Rolo
Current imaging experiments at synchrotron beam lines often lack a real-time data assessment. X-ray imaging cameras installed at synchrotron facilities like ANKA provide millions of pixels, each with a resolution of 12 bits or more, and take up to several thousand frames per second. A given experiment can produce data sets of multiple gigabytes in […]
Edward W. Lowe Jr., Nils Woetzel, Jens Meiler
Three initial fits of 1ubi in a 6.6 Å resolution synthesized density map had backbone RMSDs to the correct placement of 2.7, 2.9 and 6.6 Å. They have been refined with a Powell optimizer [5] in 10 iterations using 6 directions, 3 rotations α, β with 0.15 radians and γ with 0.075 radians starting direction to cover […]
Edward W. Lowe Jr., Nils Woetzel, Jens Meiler
Here, we present a GPU-accelerated OpenCL implementation of a back-propagation artificial neural network for the creation of QSAR models for drug discovery and virtual high-throughput screening. A QSAR model for HSD achieved an enrichment of 5.9 and area under the curve of 0.83 on an independent data set which signifies sufficient predictive ability for virtual […]
Dzmitry Razmyslovich, Guillermo Marcus, Markus Gipp, Marc Zapatka, Andreas Szillus
In this paper we present an implementation of the Smith-Waterman algorithm. The implementation is done in OpenCL and targets high-end GPUs. This implementation is capable of computing similarity indexes between reference and query sequences. The implementation is designed for the sequence alignment paths calculation. In addition, it is capable of handling very long reference sequences […]

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); the compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
