Posts
Jun, 7
Bit-Vectorized GPU Implementation of a Stochastic Cellular Automaton Model for Surface Growth
Stochastic surface growth models aid in studying properties of universality classes like the Kardar–Parisi–Zhang class. High-precision results obtained from large-scale computational studies can be transferred to many physical systems. Many properties, such as roughening and some two-time functions, can be studied using stochastic cellular automaton (SCA) variants of stochastic models. Here we present […]
Jun, 7
Co-tuning of Software Specializers and Hardware Accelerators within a CNN Application
Software specializers and hardware accelerators share the common goal of decreasing the runtime of an operation while being parameterizable and abstracting away underlying optimizations from users. The competition for reconfigurable hardware resources among candidate hardware accelerators means that tuning must take place at an application level and not at an operation level as is the […]
Jun, 7
Development of Krylov and AMG linear solvers for large-scale sparse matrices on GPUs
This research introduces our work on developing Krylov subspace and AMG solvers on NVIDIA GPUs. As SpMV is a crucial part of these iterative methods, SpMV algorithms for a single GPU and for multiple GPUs are implemented. A HEC matrix format and a communication mechanism are established. In addition, a set of specific algorithms for solving preconditioned […]
Jun, 3
MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures
Code transformations, such as loop tiling and loop fusion, are of key importance for the efficient implementation of stencil computations. However, their direct application to a large code base is costly and severely impacts program maintainability. While recently introduced domain-specific languages facilitate the application of such transformations, they typically still require manual tuning or auto-tuning […]
Jun, 2
Challenges for a GPU-Accelerated Dynamic Programming Approach for Join-Order Optimization
Relational database management systems apply query optimization to determine efficient execution plans for declarative queries. Since the execution time of equivalent query execution plans can differ by several orders of magnitude depending on the join order used, join-order optimization is one of the most important problems within query processing. Since the time-budget of […]
Jun, 2
Solver for Systems of Linear Equations with Infinite Precision on a GPU Cluster
In this paper, we introduce an accelerated solver for systems of linear equations with infinite precision, designed for GPU clusters. Infinite precision means that the system can provide a precise solution without any rounding error. These errors usually come from the limited precision of floating-point values within their natural computer […]
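To make "no rounding error" concrete, here is a minimal CPU-only sketch that solves a small system with exact rational arithmetic. It is not the method from the paper, which the teaser does not describe; the function name `solve_exact` and the choice of Gauss-Jordan elimination over Python's `fractions.Fraction` are assumptions made purely for illustration.

```python
from fractions import Fraction


def solve_exact(a, b):
    """Solve a @ x = b exactly using Gauss-Jordan elimination over rationals.

    Hypothetical helper for illustration only; not the paper's GPU solver.
    """
    n = len(a)
    # Augmented matrix [a | b] with Fraction entries, so every operation is exact.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot entry becomes exactly 1.
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # The last column now holds the exact solution.
    return [row[n] for row in m]


solution = solve_exact([[3, 1], [1, 2]], [1, 1])
print(solution)
```

Passing integers (or strings such as `"0.1"`) keeps the inputs exact; a Python float like `0.1` would be converted to the binary value it actually stores, not to one tenth.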
Jun, 2
Disc: Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs
Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The K-Nearest Neighbors problem is widely used in applications ranging from classification to gathering of photons in the Photon Mapping algorithm. Using the Euclidean distance measure when gathering photons can cause false bleeding of colors between surfaces. Ellipsoidal search boundaries […]
Jun, 2
Processing Posting Lists Using OpenCL
One of the main requirements of internet search engines is the ability to retrieve relevant results with faster response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive […]
Jun, 2
Weighted Residuals for Very Deep Networks
Deep residual networks have recently shown appealing performance on many challenging computer vision tasks. However, the original residual structure still has some defects making it difficult to converge on very deep networks. In this paper, we introduce a weighted residual network to address the incompatibility between ReLU and element-wise addition and the deep network initialization […]
May, 31
Computer Vision on the GPU — Tools, Algorithms and Frameworks
In recent years, graphics processing units (GPUs) have emerged as an attractive alternative to CPUs for implementing algorithms in a wide range of applications. The focus of this work is to give an overview of the current state of GPU use in computer vision. We briefly describe tools like CUDA, OpenCL and OpenACC used for […]
May, 30
clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library
Sparse linear algebra is a cornerstone of modern computational science. These algorithms ignore the zero-valued entries found in many domains in order to work on much larger problems at much faster rates than dense algorithms. Nonetheless, optimizing these algorithms is not straightforward. Highly optimized algorithms for multiplying a sparse matrix by a dense vector, for […]
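As a rough illustration of how a sparse kernel skips zero-valued entries, the sketch below multiplies a matrix stored in compressed sparse row (CSR) form by a dense vector. It is plain sequential Python, not clSPARSE code and not tuned in any way; the function name `csr_spmv` and the three-array layout (`values`, `col_idx`, `row_ptr`) are assumptions chosen for the example.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x where A is stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    Hypothetical reference routine for illustration only.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Only the stored nonzeros of this row are visited; zeros cost nothing.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# A = [[4, 0, 0],
#      [0, 0, 2],
#      [1, 0, 3]]
values = [4.0, 2.0, 1.0, 3.0]
col_idx = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]
result = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(result)
```

The work done is proportional to the number of stored nonzeros (four here) instead of the nine entries a dense product would touch.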
May, 30
TensorFlow: A system for large-scale machine learning
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore […]