Stephen Tyree, Jacob R. Gardner, Kilian Q. Weinberger, Kunal Agrawal, John Tran
In this paper, we evaluate the performance of various parallel optimization methods for Kernel Support Vector Machines on multicore CPUs and GPUs. In particular, we provide the first comparison of algorithms with explicit and implicit parallelization. Most existing parallel implementations for multi-core or GPU architectures are based on explicit parallelization of Sequential Minimal Optimization (SMO) […]
A. Jain
We study fluid flow in a 2D lid-driven cavity for large Reynolds numbers using the multi-relaxation-time Lattice Boltzmann Method (LBM). LBM is an alternative to conventional CFD methods that solve the Navier-Stokes equations to simulate incompressible fluid dynamics. In LBM, one solves the linearized Boltzmann equation on a discrete lattice to study spatio-temporal evolution of […]
A. Joshi, M. Frank
To manufacture a part using a conventional 3-axis CNC machining process, one must determine a set of machining orientations. Generally this process planning task is carried out manually by the machinist, considering decision parameters such as part visibility, machinability, machining depths, tool geometry, etc. In this work, we modelled this as a linear optimization problem; the […]
Liliana Ibeth Barbosa-Santillan, Inmaculada Alvarez-de-Mon y-Rego
This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). This approach aims at aligning, unifying, and expanding the set of sentiment lexicons which are available on the web in order to increase their robustness of coverage. One problem related to the task of the automatic unification of different […]
Julio Camarero Mateo
This Master's thesis establishes the specifications for developing a face recognition system on a variety of platforms at the same time: MATLAB running on a personal computer, C code on an embedded microprocessor (MicroBlaze), a simpler reconfigurable hardware for an FPGA-based platform, a flexible hardware for higher performance, and finally a […]
Ryan R. Newton, Eric Holk, Trevor L. McDonell
High-level domain-specific languages for array processing on the GPU are increasingly common, but to date they run only on a single GPU. We argue that languages will need to target multiple devices, even simultaneous combinations of GPU/GPU and CPU/GPU. Increased flexibility may be key to making these languages more easily deployable and thus widespread. To this […]
Marc Baboulin, Jack Dongarra, Remi Lacroix
This paper presents an efficient computation for least squares conditioning or estimates of it. We propose performance results using new routines on top of the multicore-GPU library MAGMA. This set of routines is based on an efficient computation of the variance-covariance matrix for which, to our knowledge, there is no implementation in current public domain […]
Carlo del Mundo, Wu-chun Feng
The fast Fourier transform (FFT), a spectral method that computes the discrete Fourier transform and its inverse, pervades many applications in digital signal processing, such as imaging, tomography, and software-defined radio. Its importance has caused the research community to expend significant resources to accelerate the FFT, of which FFTW is the most prominent example. With […]
Juan Ignacio Perez, Eliseo Garcia, Jose A. de Frutos, Felipe Catedra
The Characteristic Basis Function Method (CBFM) is a popular technique for efficiently solving the Method of Moments (MoM) matrix equations. In this work, we address the adaptation of this method to a relatively new computing infrastructure provided by NVIDIA, the Compute Unified Device Architecture (CUDA), and take into account some of the limitations which appear […]
Abu Asaduzzaman, Anindya Maiti, Chok M. Yip
There is great interest in understanding how the prime numbers are distributed throughout the integers. Prime numbers have been used in secret codes for more than 60 years. Computer security authorities use extremely large prime numbers when they devise cryptographic schemes, such as the RSA (short for Rivest, Shamir, and Adleman) algorithm, for protecting […]
E. Garcia, T. Hausotte
Bayes' theorem is the most widely used tool for stochastic inference in nonlinear dynamic systems and also the foundation of measurement uncertainty evaluation in the GUM. Many powerful algorithms have been derived and applied to numerous problems. The most widely used algorithms are the broad family of Kalman filters (KFs), the grid-based filters and the […]
Adnan Ozsoy, Arun Chauhan, Martin Swany
In this paper, we describe a novel technique to optimize the longest common subsequence (LCS) algorithm for the one-to-many matching problem on GPUs by transforming the computation into bit-wise operations and a post-processing step. The former can be highly optimized and achieves more than a trillion operations (cell updates) per second (CUPS), a first for LCS algorithms. The […]

* * *

Free GPU computing nodes at

Registered users can now run their OpenCL applications on these nodes. We provide 1 minute of computer time per run on two nodes, with two AMD and one nVidia graphics processing units, respectively. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). Terminal output logs from compilation and execution will be provided to the user.

Submitted information will be treated according to our Privacy Policy.

HGPU group © 2010-2014

All rights belong to the respective authors

Contact us: