12579
Peng Di
This thesis attempts to design and implement a compiler framework based on the polyhedral model. The compiler automatically parallelizes loop nests; especially stencil kernels, into efficient GPU code by loop tiling transformations which the polyhedral model describes. To enhance parallel performance, we introduce three practically efficient techniques to process different types of loop nests. The […]
View View   Download Download (PDF)   
Ke Ding, Ying Tan
Benchmarking is key for developing and comparing optimization algorithms. In this paper, a CUDA-based real parameter optimization benchmark (cuROB) is introduced. Test functions of diverse properties are included within cuROB and implemented efficiently with CUDA. Speedup of one order of magnitude can be achieved in comparison with CPU-based benchmark of CEC’14.
Bryan Ching
Lossless data compression is used to reduce storage requirements, allowing for the relief of I/O channels and better utilization of bandwidth. The Lempel-Ziv lossless compression algorithms form the basis for many of the most commonly used compression schemes. General purpose computing on graphic processing units (GPGPUs) allows us to take advantage of the massively parallel […]
View View   Download Download (PDF)   
Ichitaro Yamazaki, Stanimire Tomov, Tingxing Dong, Jack Dongarra
We propose a mixed-precision orthogonalization scheme that takes the input matrix in a standard 32 or 64-bit floating-point precision, but uses higher-precision arithmetics to accumulate its intermediate results. For the 64-bit precision, our scheme uses software emulation for the higher-precision arithmetics, and requires about 20x more computation but about the same amount of communication as […]
View View   Download Download (PDF)   
Chaitanya P. Chandgude, P. M. Chawan
Synthetic aperture radar (SAR) Ship Detection System SDS is an important application from the point of view of Maritime Security monitoring. It allows monitoring traffic, fisheries, naval warfare. Since full-resolution SAR images are heavily affected by the presence of speckle, ship detection algorithms generally employ speckle reduced SAR images at the expense of a degradation […]
View View   Download Download (PDF)   
Yan Zhou
The sequential Monte Carlo (smc) methods have been widely used for modern scientific computation. Bayesian model comparison has been successfully applied in many fields. Yet there have been few researches on the use of smc for the purpose of Bayesian model comparison. This thesis studies different smc strategies for Bayesian model computation. In addition, various […]
Win-Tsung Lo, Yue-Shan Chang, Ruey-Kai Sheu, Chun-Chieh Chiu, Shyan-Ming Yuan
Decision tree is one of the famous classification methods in data mining. Many researches have been proposed, which were focusing on improving the performance of decision tree. However, those algorithms are developed and run on traditional distributed systems. Obviously the latency could not be improved while processing huge data generated by ubiquitous sensing node in […]
View View   Download Download (PDF)   
Valentina Popescu
Many numerical problems require higher precision than the conventional floating-point (single, double) formats. One solution is to use multiple precision libraries such as GNU MPFR, which allow the manipulation of very high precision numbers. But their generality (they are able to handle numbers with millions of digits), is a quite heavy alternative when medium precision […]
View View   Download Download (PDF)   
S.A. Arul Shalom, Manoranjan Dash
Graphics Processing Units (GPU) in today’s desktops can well be thought of as a high performance parallel processor. Traditionally, parallel computing is the usage of multiple computing resources to execute computational problems simultaneously. Such computations are possible using multi-core CPUs or computers with multiple CPUs or by using a network of computers in parallel. Today’s […]
View View   Download Download (PDF)   
Tor Gillberg, Are Magnus Bruaset, Oyvind Hjelle, Mohammed Sourouri
Two new algorithms for numerical solution of static Hamilton-Jacobi equations are presented. These algorithms are designed to work efficiently on different parallel computing architectures, and numerical results for multicore CPU and GPU implementations are reported and discussed. The numerical experiments show that the proposed solution strategies scale well with the computational power of the hardware. […]
View View   Download Download (PDF)   
Kannappan Palaniappan
We have implemented, tested, validated and benchmarked a scalable parallel implementations of the integral histogram algorithm critical for computer vision tasks for fast multiscale subwindow-based object searching, motion analysis and content-based image retrieval applications. Several integral histogram kernels using CUDA optimizations for many core GPUs were investigated. The integral histogram algorithm was also parallelized using […]
View View   Download Download (PDF)   
Istvan Zoltan Reguly
The last decade saw the long tradition of frequency scaling of processing units grind to a halt, and efforts were re-focused on maintaining computational growth by other means; such as increased parallelism, deep memory hierarchies and complex execution logic. After a long period of "boring productivity", a host of new architectures, accelerators, programming languages and […]
View View   Download Download (PDF)   
Page 1 of 23412345...102030...Last »

* * *

* * *

Like us on Facebook

HGPU group

128 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1193 peoples are following HGPU @twitter

Featured events

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: