Kyungjoo Kim, Sivasankaran Rajamanickam, George Stelle, H. Carter Edwards, Stephen L. Olivier
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization; these tasks are interrelated through their data dependences in […]
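To make the algorithm-by-blocks idea concrete, the following is a minimal dense sketch (NumPy, single-threaded) of a right-looking Cholesky factorization organized by blocks; the block size and the dense layout are illustrative and do not reproduce the paper's sparse 2D partitioned-block data structure or its incomplete factorization. Each diagonal-block factorization, panel solve, and trailing update below corresponds to one task in the induced task graph, related to the others through the blocks it reads and writes.

    import numpy as np

    def blocked_cholesky(A, nb):
        # Right-looking Cholesky by blocks on the lower triangle of A.
        n = A.shape[0]
        L = np.tril(A).astype(float)
        for k in range(0, n, nb):
            ke = min(k + nb, n)
            # "POTRF" task: factor the diagonal block.
            L[k:ke, k:ke] = np.linalg.cholesky(L[k:ke, k:ke])
            # "TRSM" tasks: compute the panel below the diagonal block.
            L[ke:, k:ke] = np.linalg.solve(L[k:ke, k:ke], L[ke:, k:ke].T).T
            # "SYRK/GEMM" tasks: update the trailing submatrix (lower part only).
            L[ke:, ke:] -= np.tril(L[ke:, k:ke] @ L[ke:, k:ke].T)
        return L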
Joshua Dennis Booth, Sivasankaran Rajamanickam, Heidi K. Thornquist
Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to parallelize the Gilbert-Peierls algorithm for sparse LU factorization. As architectures evolve, there exists a need for algorithms that are hierarchical […]
Tingxing Dong
The emergence of multicore and heterogeneous architectures requires many linear algebra algorithms to be redesigned to take advantage of the accelerators, such as GPUs. A particularly challenging class of problems, arising in numerous applications, involves the use of linear algebra operations on many small-sized matrices. The size of these matrices is usually the same, up […]
Wei Tan, Liangliang Cao
Matrix factorization (MF) is widely used in recommendation systems. We present cuMF, a highly-optimized matrix factorization tool with supreme performance on graphics processing units (GPUs) by fully utilizing the GPU compute power and minimizing the overhead of data movement. Firstly, we introduce a memory-optimized alternating least square (ALS) method by reducing discontiguous memory access and […]
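As a point of reference for readers unfamiliar with ALS, the following is a minimal CPU sketch of the alternating least squares iteration that cuMF accelerates; it uses a small dense rating matrix and plain NumPy solves, and does not attempt to reproduce cuMF's GPU kernels or memory optimizations. The rank and regularization values are illustrative.

    import numpy as np

    def als(R, rank=8, reg=0.1, iters=10):
        # Factor R (m x n) as X @ Y.T using ridge-regularized least squares.
        m, n = R.shape
        rng = np.random.default_rng(0)
        X = rng.standard_normal((m, rank))   # user factors
        Y = rng.standard_normal((n, rank))   # item factors
        I = np.eye(rank)
        for _ in range(iters):
            # Fix Y, solve one regularized normal-equation system for all users.
            X = np.linalg.solve(Y.T @ Y + reg * I, Y.T @ R.T).T
            # Fix X, solve for all items.
            Y = np.linalg.solve(X.T @ X + reg * I, X.T @ R).T
        return X, Y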
Emanuele De Falco
Stochastic Gradient Descent (SGD), a stochastic variant of Gradient Descent, is an algorithm used in many settings, for example linear regression and logistic regression. After the Netflix Prize, SGD also began to be used in recommender systems to compute matrix factorizations. Considering the large amounts of data that this kind of system […]
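For context, a minimal sketch of SGD applied to matrix factorization over (user, item, rating) triples is shown below; the learning rate, rank, and regularization values are illustrative, and the large-scale and parallel aspects that motivate the work are not modeled here.

    import numpy as np

    def sgd_mf(ratings, n_users, n_items, rank=8, lr=0.01, reg=0.05, epochs=10):
        # ratings: list of (user_index, item_index, rating) triples.
        rng = np.random.default_rng(0)
        P = 0.1 * rng.standard_normal((n_users, rank))  # user factors
        Q = 0.1 * rng.standard_normal((n_items, rank))  # item factors
        for _ in range(epochs):
            for u, i, r in ratings:
                err = r - P[u] @ Q[i]
                # Gradient step on both factor vectors for this rating.
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * P[u] - reg * Q[i])
        return P, Q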
Ioannis E. Venetis, Alexandros Sobczyk, Alexandros Kouris, Alexandros Nakos, Nikolaos Nikoloutsakos, Efstratios Gallopoulos
Manycores like the Intel Xeon Phi and graphics processing units like the NVIDIA Tesla series are prime examples of systems for accelerating applications that run on current CPU multicores. It is therefore of interest to build fast, reliable linear system solvers targeting these architectures. Moreover, it is of interest to conduct cross comparisons between algorithmic […]
Azzam Haidar, Stanimire Tomov, Piotr Luszczek, Jack Dongarra
Embedded computing, not only in large systems like drones and hybrid vehicles, but also in small portable devices like smart phones and watches, gets more extreme to meet ever increasing demands for extended and improved functionalities. This, combined with the typical constrains for low power consumption and small sizes, makes the design of numerical libraries […]
Terry Cojean, Abdou Guermouche, Andra Hugo, Raymond Namyst, Pierre-Andre Wacrenier
Computing platforms are now extremely complex providing an increasing number of CPUs and accelerators. This trend makes balancing computations between these heterogeneous resources performance critical. In this paper we tackle the task granularity problem and we propose aggregating several CPUs in order to execute larger parallel tasks and thus find a better equilibrium between the […]
Azzam Haidar, Chongxiao Cao, Stanimire Tomov, Asim YarKhan, Piotr Luszczek, Jack Dongarra
Modern high performance computing environments are composed of networks of compute nodes that often contain a variety of heterogeneous compute resources, such as multicore-CPUs, GPUs, and coprocessors. One challenge faced by domain scientists is how to efficiently use all these distributed, heterogeneous resources. In order to use the GPUs effectively, the workload parallelism needs to […]
Marc Baboulin, Jack Dongarra, Adrien Remy, Stanimire Tomov, Ichitaro Yamazaki
We study the performance of dense symmetric indefinite factorizations (Bunch-Kaufman and Aasen’s algorithms) on multicore CPUs with a Graphics Processing Unit (GPU). Though such algorithms are needed in many scientific and engineering simulations, obtaining high performance of the factorization on the GPU is difficult because the pivoting, required to ensure the numerical stability of the […]
Toru Fujita, Koji Nakano, Yasuaki Ito
RSA is one of the most well-known public-key cryptosystems, widely used for secure data transfer. An RSA encryption key includes a modulus n which is the product of two large prime numbers p and q. If an RSA modulus n can be decomposed into p and q, the corresponding decryption key can be computed easily from […]
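The "computed easily" step can be illustrated with a toy example (tiny textbook primes, unrelated to the factoring approach studied in the paper): once n = p*q is known, the private exponent d is just the modular inverse of the public exponent e modulo lcm(p-1, q-1).

    from math import gcd

    p, q, e = 61, 53, 17            # tiny illustrative key; n = 3233
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    d = pow(e, -1, lam)             # private exponent (Python 3.8+ modular inverse)

    m = 65                          # sample message
    c = pow(m, e, p * q)            # encrypt with the public key
    assert pow(c, d, p * q) == m    # decryption with d recovers the message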
Sencer Nuri Yeralan, Timothy A. Davis, Sanjay Ranka
Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high-performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge, and is up to eleven times faster than a […]