This document describes an implementation in C of a set of randomized algorithms for computing partial Singular Value Decompositions (SVDs). The techniques largely follow the prescriptions in the article "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," N. Halko, P.G. Martinsson, J. Tropp, SIAM Review, 53(2), 2011, pp. 217-288, but with some […]

February 22, 2015 by hgpu

In this thesis we present the first, to our knowledge, implementation and performance analysis of Hermite methods on GPU accelerated systems. We give analytic background for Hermite methods; give implementations of the Hermite methods on traditional CPU systems as well as on GPUs; give the reader background on basic CUDA programming for GPUs; discuss performance […]

February 2, 2015 by hgpu

We discuss modeling and GPU-based computation in a new class of multivariate dynamic models customized to learning and prediction with increasingly high-dimensional time series. This defines an approach to decoupling analysis into a parallel set of univariate time series dynamic models, while flexibly modeling cross-series relationships in a novel, induced class of time-varying graphical models […]

September 30, 2014 by hgpu

Kurepa’s conjecture states that there is no odd prime p which divides !p = 0! + 1! + … + (p-1)!. We search for a counterexample to this conjecture for all p < 10^10. We introduce new optimization techniques and perform the computation using graphics processing units (GPUs). Additionally, we consider the generalized Kurepa’s left factorial given as !^k n = (0!)^k + (1!)^k + … + ((n-1)!)^k and show that for all integers […]

September 3, 2014 by hgpu
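
As a rough illustration of the quantity in the Kurepa entry above, the left factorial !p can be reduced modulo p with a running sum, so no large factorials are ever stored. This is a minimal CPU sketch only, not the paper's GPU method or its optimizations; the function names `left_factorial_mod`, `is_prime`, and `kurepa_counterexamples` are hypothetical and chosen for this example.

```python
def left_factorial_mod(p: int) -> int:
    """Return !p mod p, where !p = 0! + 1! + ... + (p-1)!."""
    total = 0
    fact = 1  # holds i! mod p, starting at 0! = 1
    for i in range(p):
        total = (total + fact) % p
        fact = (fact * (i + 1)) % p  # advance to (i+1)! mod p
    return total


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def kurepa_counterexamples(limit: int) -> list:
    """Odd primes p < limit that divide !p (none are expected)."""
    return [p for p in range(3, limit, 2)
            if is_prime(p) and left_factorial_mod(p) == 0]


if __name__ == "__main__":
    print(left_factorial_mod(5))         # !5 = 34, and 34 mod 5 = 4
    print(kurepa_counterexamples(1000))  # empty list: no counterexample
```

Each prime costs O(p) multiplications here, so this direct loop is only practical for small bounds; reaching p < 10^10 is what calls for the optimizations and GPU computation the abstract mentions.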

We present a block structured orthogonal factorization (BSOF) algorithm and its parallelization for computing the inversion of block p-cyclic matrices. We aim at high performance on multicores with GPU accelerators. We provide a quantitative performance model for optimal host-device load balance, and validate the model through numerical tests. Benchmarking results show that the parallel BSOF […]

August 23, 2014 by hgpu

A parallel implementation of a method of the semi-Lagrangian type for the advection equation on a hybrid architecture computation system is discussed. The difference scheme with variable stencil is constructed on the basis of an integral equality between the neighboring time levels. The proposed approach allows one to avoid the Courant-Friedrichs-Lewy restriction on the relation […]

August 21, 2014 by hgpu

This book brings together research on numerical methods adapted for Graphics Processing Units (GPUs). It explains recent efforts to adapt classic numerical methods, including solution of linear equations and FFT, for massively parallel GPU architectures. This volume consolidates recent research and adaptations, covering widely used methods that are at the core of many scientific and […]

August 13, 2014 by hgpu

CUMODP is a CUDA library for exact computations with dense polynomials over finite fields. A variety of operations like multiplication, division, computation of subresultants, multi-point evaluation, interpolation and many others are provided. These routines are primarily designed to offer GPU support to polynomial system solvers and a bivariate system solver is part of the library. […]

August 7, 2014 by hgpu

The solution of large-scale Lyapunov equations is an important tool for the solution of several engineering problems arising in optimal control and model order reduction. In this work we investigate the case when the coefficient matrix of the equations presents a band structure. Exploiting the structure of this matrix we can achieve relevant reductions in […]

August 3, 2014 by hgpu

Sequential Monte Carlo (SMC) methods have been widely used for modern scientific computation. Bayesian model comparison has been successfully applied in many fields. Yet there has been little research on the use of SMC for the purpose of Bayesian model comparison. This thesis studies different SMC strategies for Bayesian model computation. In addition, various […]

July 28, 2014 by hgpu

Two new algorithms for numerical solution of static Hamilton-Jacobi equations are presented. These algorithms are designed to work efficiently on different parallel computing architectures, and numerical results for multicore CPU and GPU implementations are reported and discussed. The numerical experiments show that the proposed solution strategies scale well with the computational power of the hardware. […]

July 26, 2014 by hgpu

CONTEXT: Reinforcement Learning (RL) is a time-consuming effort that also requires a lot of computational power. There are two main approaches to improving RL efficiency: the theoretical mathematics and algorithmic approach, and the practical implementation approach. In this study, the approaches are combined in an attempt to reduce time consumption. OBJECTIVES: We investigate […]

July 9, 2014 by hgpu