Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware can help the computation, fetching the weights from DRAM can be as much as two orders of magnitude […]
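The energy argument above is what motivates compressed weight storage. As a rough illustration (ours, not the paper's actual scheme; the `CscMatrix` and `toCsc` names are hypothetical), a pruned layer can be kept in compressed sparse column (CSC) form so that only nonzero weights are ever fetched from memory:

```cpp
// Hypothetical sketch: store a pruned weight matrix in CSC form so that
// only the surviving (nonzero) weights need to be fetched from DRAM.
#include <vector>

struct CscMatrix {
    std::vector<float> vals;    // nonzero weights
    std::vector<int>   rows;    // row index of each nonzero
    std::vector<int>   colPtr;  // start of each column in vals/rows
};

// Encode a dense row-major m x n matrix, skipping zeros left by pruning.
CscMatrix toCsc(const float* dense, int m, int n) {
    CscMatrix A;
    A.colPtr.push_back(0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float w = dense[i * n + j];
            if (w != 0.0f) {
                A.vals.push_back(w);
                A.rows.push_back(i);
            }
        }
        A.colPtr.push_back(static_cast<int>(A.vals.size()));
    }
    return A;
}
```

At 90% sparsity this stores roughly one value plus one index per ten original weights, which is where the memory-traffic savings come from.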
Adam Harries, Michel Steuwer, Murray Cole, Alan Gray, Christophe Dubach
While contemporary GPU architectures are heavily biased towards the execution of predictably regular data parallelism, many real application domains are based around data structures which are naturally sparse and irregular. In this paper we demonstrate that high level programming and high performance GPU execution for sparse, irregular problems are not mutually exclusive. Our insight is […]
Kyungjoo Kim, Sivasankaran Rajamanickam, George Stelle, H. Carter Edwards, Stephen L. Olivier
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are interrelated through their data dependences in […]
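For readers unfamiliar with algorithms-by-blocks, the following is a minimal sketch (ours, not the paper's code) of how a block layout induces a task graph for Cholesky: each operation on a block becomes a task, and the blocks it reads and writes define the dependences. The runtime and the sparse-block bookkeeping are omitted.

```cpp
// Sketch: enumerate the block tasks of a right-looking blocked Cholesky.
// In the sparse case, only blocks present in the 2D partitioned layout
// would generate tasks; a tasking runtime would schedule them by their
// data dependences instead of printing them in order.
#include <cstdio>

void choleskyByBlocks(int nblocks) {
    for (int k = 0; k < nblocks; ++k) {
        // Factor the diagonal block; depends on all prior updates to (k,k).
        std::printf("POTRF(%d,%d)\n", k, k);
        for (int i = k + 1; i < nblocks; ++i)
            // Triangular solve on the panel block; depends on POTRF(k,k).
            std::printf("TRSM(%d,%d)\n", i, k);
        for (int i = k + 1; i < nblocks; ++i)
            for (int j = k + 1; j <= i; ++j)
                // Trailing update; depends on TRSM(i,k) and TRSM(j,k).
                std::printf("UPDATE(%d,%d)\n", i, j);
    }
}
```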
Joshua Dennis Booth, Sivasankaran Rajamanickam, Heidi K. Thornquist
Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to parallelize the Gilbert-Peierls algorithm for sparse LU factorization. As architectures evolve, there exists a need for algorithms that are hierarchical […]
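As background, the Gilbert-Peierls algorithm factors the matrix one column at a time by solving a sparse triangular system per column. Below is a heavily simplified sketch of one column step (our illustration, with the symbolic phase and pivoting omitted):

```cpp
// Hypothetical, simplified sketch of one Gilbert-Peierls column step:
// solve L * x = A(:,j) by a left-looking sweep, then x[0..j] gives
// U(:,j) and x[j..n)/x[j] gives L(:,j). The real algorithm first runs a
// symbolic depth-first search so that only nonzero entries of x are
// touched, which is what makes the work proportional to the flops.
#include <vector>

void gpColumnStep(const std::vector<std::vector<double>>& L, // columns of L so far
                  std::vector<double>& x,                    // dense scatter of A(:,j)
                  int j) {
    for (int k = 0; k < j; ++k) {
        if (x[k] == 0.0) continue;          // the symbolic phase would skip these
        for (std::size_t i = k + 1; i < x.size(); ++i)
            x[i] -= x[k] * L[k][i];         // x -= x[k] * L(:,k)
    }
    // Pivot selection and the split of x into L and U columns omitted.
}
```

The sequential dependence between columns is what makes this algorithm hard to parallelize, which is the problem Basker addresses.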
Joseph L. Greathouse, Mayank Daga
The performance of sparse matrix-vector multiplication (SpMV) is important to computational scientists. Compressed sparse row (CSR) is the most frequently used format for storing sparse matrices. However, CSR-based SpMV on graphics processing units (GPUs) has poor performance due to irregular memory access patterns, load imbalance, and reduced parallelism. This has led researchers to propose […]
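For reference, here is a minimal "CSR-scalar" CUDA kernel of the kind whose shortcomings motivate this line of work (names are ours). One thread processes one row, so threads in a warp walk rows of different lengths (load imbalance) and the gathers into x through the column indices are irregular:

```cuda
// Baseline CSR-scalar SpMV: y = A*x with one thread per row.
__global__ void spmv_csr_scalar(int nrows,
                                const int* __restrict__ rowPtr,
                                const int* __restrict__ colIdx,
                                const float* __restrict__ vals,
                                const float* __restrict__ x,
                                float* __restrict__ y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < nrows) {
        float sum = 0.0f;
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
            sum += vals[k] * x[colIdx[k]];   // irregular gather from x
        y[row] = sum;
    }
}
```

Launched with one thread per row, e.g. `spmv_csr_scalar<<<(nrows + 255) / 256, 256>>>(...)`.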
Mayank Daga, Joseph L. Greathouse
Sparse matrix-vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. Efficient CSR-based SpMV obviates the need for other GPU-specific storage formats, thereby saving runtime and storage overheads. […]
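A common way to get efficient CSR-based SpMV on GPUs, sketched below with our own names, is the "CSR-vector" scheme: a full warp cooperates on each row, so the walk over a row's nonzeros is coalesced, and a warp shuffle reduces the partial sums:

```cuda
// CSR-vector SpMV: one warp (32 threads) per row.
__global__ void spmv_csr_vector(int nrows,
                                const int* __restrict__ rowPtr,
                                const int* __restrict__ colIdx,
                                const float* __restrict__ vals,
                                const float* __restrict__ x,
                                float* __restrict__ y) {
    int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int lane   = threadIdx.x % 32;
    if (warpId < nrows) {
        float sum = 0.0f;
        // Lanes stride through the row's nonzeros with coalesced reads.
        for (int k = rowPtr[warpId] + lane; k < rowPtr[warpId + 1]; k += 32)
            sum += vals[k] * x[colIdx[k]];
        // Warp-level tree reduction of the partial sums.
        for (int offset = 16; offset > 0; offset /= 2)
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        if (lane == 0) y[warpId] = sum;
    }
}
```

This helps long rows but wastes lanes on very short ones, which is why adaptive schemes that pick a strategy per row group perform better across matrices.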
Linchuan Chen
With clock frequencies no longer scaling, multi-core designs emerged as the way to improve overall CPU performance. Over the past decade, many-core processors have come to play an increasingly important role in scientific computing. Their highly cost-effective nature makes them extremely suitable for data-intensive computations. Specifically, many-cores are […]
Davide Barbieri, Valeria Cardellini, Alessandro Fanfarillo, Salvatore Filippone
The multiplication of a sparse matrix by a dense vector is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense […]
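To see why SpMV is the essential kernel of iterative methods, consider an unpreconditioned conjugate gradient loop (a generic sketch, not this paper's code): the one SpMV per iteration dominates the cost, with only dot products and vector updates around it.

```cpp
// Generic CG sketch: solve A x = b for symmetric positive definite A.
// spmv(p, Ap) is any y = A*p routine, e.g. a CSR kernel as sketched above.
#include <cmath>
#include <vector>

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

template <typename SpMV>
void cg(SpMV spmv, const std::vector<double>& b, std::vector<double>& x,
        int maxIter, double tol) {
    std::vector<double> r = b, p = b, Ap(b.size());  // assumes x starts at 0
    double rsOld = dot(r, r);
    for (int it = 0; it < maxIter && std::sqrt(rsOld) > tol; ++it) {
        spmv(p, Ap);                                 // the dominant kernel
        double alpha = rsOld / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rsNew = dot(r, r);
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + (rsNew / rsOld) * p[i];
        rsOld = rsNew;
    }
}
```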
Moritz Kreutzer, Jonas Thies, Melven Rohrig-Zollner, Andreas Pieper, Faisal Shahzad, Martin Galgon, Achim Basermann, Holger Fehske, Georg Hager, Gerhard Wellein
While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators […]
Naveen Anand Subramaniam, Omkar Deshmukh, Vennila Megavannan, Dan Negrut
With the advent of parallel processing architectures and a steep increase in the parallelism found in recent applications, GPGPUs have gained attention for their role in executing such applications. In this document, we specifically analyze Sparse Matrix-Vector Multiplication (SpMV) across different architectures, libraries, and matrix formats. The experimental platforms include but are […]
N. Sedaghati, A. Ashari, L. N. Pouchet, S. Parthasarathy, P. Sadayappan
Sparse matrix-vector multiplication (SpMV) is a widely used kernel in scientific applications as well as data analytics. Many GPU implementations of SpMV have been proposed, each employing a different sparse matrix representation. However, no single representation is consistently superior, and the best choice varies for sparse matrices with different sparsity patterns. In this paper we study […]
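To illustrate why the best representation varies with the sparsity pattern, the sketch below (ours) converts CSR to ELLPACK, which pads every row to the longest row's length. For near-uniform row lengths, ELL gives regular, coalesced accesses; for a matrix with even one very long row, the padding explodes and CSR or a hybrid format wins.

```cpp
// Sketch: CSR -> ELLPACK conversion. The padded arrays are laid out
// column-major (slot-major) so that consecutive rows are adjacent in
// memory, which is what makes GPU accesses coalesced.
#include <algorithm>
#include <vector>

void csrToEll(int nrows, const std::vector<int>& rowPtr,
              const std::vector<int>& colIdx, const std::vector<float>& vals,
              std::vector<int>& ellCol, std::vector<float>& ellVal, int& width) {
    width = 0;                                   // longest row determines padding
    for (int i = 0; i < nrows; ++i)
        width = std::max(width, rowPtr[i + 1] - rowPtr[i]);
    ellCol.assign(static_cast<std::size_t>(nrows) * width, 0);
    ellVal.assign(static_cast<std::size_t>(nrows) * width, 0.0f);
    for (int i = 0; i < nrows; ++i)
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k) {
            std::size_t slot = static_cast<std::size_t>(k - rowPtr[i]);
            ellCol[slot * nrows + i] = colIdx[k];
            ellVal[slot * nrows + i] = vals[k];
        }
}
```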
Trevor L. McDonell, Manuel M. T. Chakravarty, Vinod Grover, Ryan R. Newton
Embedded languages are often compiled at application runtime; thus, embedded compile-time errors become application runtime errors. We argue that advanced type system features, such as GADTs and type families, play a crucial role in minimising such runtime errors. Specifically, a rigorous type discipline reduces runtime errors due to bugs in both embedded language applications and […]


HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors