Recently, graphics processors (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix vector multiplication (SPMV) operations are commonly used in finite element analysis, a new SPMV algorithm and several variations are […]

We present an interpretation of subdivision surface evaluation in the language of linear algebra. Specifically, the vector of surface points can be computed by left-multiplying the vector of control points by a sparse subdivision matrix. This "matrix-driven" interpretation applies to any level of subdivision, holds for many common subdivision schemes (including Catmull-Clark and Loop), supports […]
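
The matrix view described in this abstract can be illustrated with a much simpler, one-dimensional analogue. The sketch below is not the authors' method: it assumes Chaikin's corner-cutting scheme on a closed polygon as a stand-in for surface schemes such as Catmull-Clark or Loop, and the helper names `chaikin_matrix` and `apply_sparse` are hypothetical. It only shows that one subdivision step is a product of a sparse matrix with the vector of control points.

```python
# Sketch under stated assumptions: Chaikin corner cutting on a closed
# polygon, used here as a 1D stand-in for surface subdivision schemes.
# Helper names are hypothetical and not taken from the paper.

def chaikin_matrix(n):
    """Return the 2n x n subdivision matrix as a list of sparse rows.

    Each row is a dict {column: weight} with exactly two non-zero
    entries (requires n >= 2).
    """
    rows = []
    for i in range(n):
        j = (i + 1) % n  # next control point, wrapping around
        rows.append({i: 0.75, j: 0.25})
        rows.append({i: 0.25, j: 0.75})
    return rows


def apply_sparse(rows, points):
    """Left-multiply a list of 2D points by the sparse matrix."""
    out = []
    for row in rows:
        x = sum(w * points[c][0] for c, w in row.items())
        y = sum(w * points[c][1] for c, w in row.items())
        out.append((x, y))
    return out


control = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
S = chaikin_matrix(len(control))
refined = apply_sparse(S, control)  # 8 points after one subdivision step
```

Applying `apply_sparse` again with `chaikin_matrix(len(refined))` gives the next level, which mirrors the statement that the interpretation holds at any level of subdivision.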

This paper presents a sparse matrix partitioning strategy to improve the performance of SpMV on GPUs and multicore CPUs. The method adapts to many types of sparse matrices, unlike existing methods, which suit only particular kinds of sparse matrices. In addition, our partitioning method can obtain dense blocks by analyzing […]

A multi-dimensional data model provides a good conceptual view of the data in data warehousing and On-Line Analytical Processing (OLAP). A typical representation of such a data model is a multi-dimensional array, which is well suited when the array is dense. If the array is sparse, i.e., has only a few non-zero elements […]

The runtime of a Lattice QCD simulation is dominated by a small kernel, which calculates the product of a vector by a sparse matrix known as the "Dslash" operator. Therefore, this kernel is frequently optimized for various HPC architectures. In this contribution we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA […]

Finite element method simulations often involve large sparse matrices. Sparse matrix-vector multiplication (SpMV) is of high importance for iterative solvers. During the solver stage, most of the time is in fact spent in the SpMV routine. The SpMV routine is highly memory-bound; the processor spends much time waiting for the needed […]
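
To make the memory-bound behavior concrete, here is a minimal SpMV sketch. The abstract does not name a storage format, so compressed sparse row (CSR) is an assumption, as are the function name `spmv_csr` and the small test matrix. Each multiply-add needs three loads (a matrix value, a column index, and an indirectly addressed entry of `x`), which is why memory traffic, not arithmetic, tends to dominate.

```python
# Sketch under stated assumptions: CSR storage and the name spmv_csr are
# illustrative choices, not taken from the abstract.

def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # One multiply-add per three loads; x is read indirectly.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# The 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
```

The inner loop performs two floating-point operations per non-zero while touching a value, an index, and an irregularly located entry of `x`, so the arithmetic intensity stays low regardless of matrix size.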

Krylov subspace solvers are often the method of choice when solving sparse linear systems iteratively. At the same time, hardware accelerators such as graphics processing units (GPUs) continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a […]

In earlier times, computer systems had only a single core or processor. In these computers, the number of transistors on-chip (i.e., on the processor) doubled every two years, and all applications enjoyed a free speedup. Subsequently, with more and more transistors being packed on-chip, power consumption became an issue, frequency scaling reached its limits and industry […]

In this paper, we develop, study and implement a restricted additive Schwarz (RAS) preconditioner to speed up the solution of sparse linear systems on NVIDIA Tesla GPUs. A novel algorithm for constructing this preconditioner is proposed. This algorithm involves two phases. In the first phase, the construction of the RAS preconditioner is transformed to an […]

Many scientific applications require the solution of linear systems, and solving these systems often dominates the total running time. In this paper, we introduce our work on developing parallel linear solvers and preconditioners for solving large sparse linear systems using NVIDIA GPUs. We develop a new sparse matrix-vector multiplication kernel and a […]

The examination timetabling problem belongs to the class of combinatorial optimization problems and is of great importance for every university. In this paper, a hybrid evolutionary algorithm running on a GPU is employed to solve the examination timetabling problem. The proposed hybrid evolutionary algorithm has a genetic algorithm component and a greedy steepest descent component. […]

In this paper we present new hybrid CPU-GPU routines to accelerate the solution of linear systems with banded coefficient matrices by off-loading the major part of the computations to the GPU and leveraging highly tuned implementations of the BLAS for the graphics processor. Our experiments with an nVidia S2070 GPU report speed-ups of up to 6x […]

