Posts
Jun, 9
Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU
Sparse matrix-vector multiplication (SpMV) is the dominant kernel in scientific simulations. Many-core processors such as GPUs accelerate SpMV computations with higher parallelism and memory bandwidth than CPUs; however, even on many-core processors the performance of SpMV is still strongly limited by memory bandwidth, and the lower locality of memory accesses to the input vector causes […]
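The teaser attributes SpMV's bandwidth-bound behaviour partly to poor locality of accesses to the input vector. A minimal sequential sketch, assuming the common CSR (compressed sparse row) format rather than the paper's own multi-level blocking layout, shows where that access happens: every nonzero triggers an indirect read `x[col_idx[k]]` whose address depends on the sparsity pattern.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Data-dependent gather from x: this is the irregular access
            # whose poor locality limits SpMV throughput.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

On a GPU a simple mapping assigns one thread or one warp per row; this sketch does not implement the adaptive multi-level blocking named in the title.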
Jun, 9
Runtime Specialization for Heterogeneous CPU-GPU Platforms
Heterogeneous parallel architectures like those comprised of CPUs and GPUs are a tantalizing compute fabric for performance-hungry developers. While these platforms enable order-of-magnitude performance increases for many data-parallel application domains, several open challenges remain: (i) the distinct execution models inherent in the heterogeneous devices present on such platforms drive the need to dynamically match […]
Jun, 7
Massively-Parallel Lossless Data Decompression
Today’s exponentially increasing data volumes and the high cost of storage make compression essential for the Big Data industry. Although research has concentrated on efficient compression, fast decompression is critical for analytics queries that repeatedly read compressed data. While decompression can be parallelized somewhat by assigning each data block to a different process, breakthrough speed-ups […]
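The baseline the teaser describes, one independent unit of work per compressed block, can be sketched as follows. This is an illustration only: it assumes zlib as the codec and a hypothetical 64 KiB block size, and uses threads as a stand-in for the processes mentioned in the teaser. The point is that independently compressed blocks share no state, not any particular speed-up.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical block size; real systems tune this.
BLOCK_SIZE = 64 * 1024


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into fixed-size blocks and compress each one independently."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def decompress_blocks(blocks: List[bytes], workers: int = 4) -> bytes:
    """Decompress independent blocks concurrently and reassemble them in order."""
    # Blocks share no state, so each can go to a different worker;
    # Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))


data = bytes(range(256)) * 1000  # 256,000 bytes
blocks = compress_blocks(data)
assert decompress_blocks(blocks) == data
print(len(blocks))  # 4
```

Parallelism here is capped by the number of blocks, and the work inside each block stays sequential, which is consistent with the teaser's suggestion that block-level parallelism alone does not deliver breakthrough speed-ups.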
Jun, 7
Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms
The popularity of neural networks (NNs) spans academia, industry, and popular culture. In particular, convolutional neural networks (CNNs) have been applied to many image-based machine learning tasks and have yielded strong results. The availability of hardware/software systems for efficient training and deployment of large and/or deep CNN models has been, and continues to be, […]
Jun, 7
Bit-Vectorized GPU Implementation of a Stochastic Cellular Automaton Model for Surface Growth
Stochastic surface growth models aid in studying properties of universality classes like the Kardar–Paris–Zhang class. High-precision results obtained from large-scale computational studies can be transferred to many physical systems. Many properties, such as roughening and some two-time functions, can be studied using stochastic cellular automaton (SCA) variants of stochastic models. Here we present […]
Jun, 7
Co-tuning of Software Specializers and Hardware Accelerators within a CNN Application
Software specializers and hardware accelerators share the common goal of decreasing the runtime of an operation while being parameterizable and abstracting away underlying optimizations from users. The competition for reconfigurable hardware resources among candidate hardware accelerators means that tuning must take place at an application level and not at an operation level as is the […]
Jun, 7
Development of Krylov and AMG linear solvers for large-scale sparse matrices on GPUs
This research introduces our work on developing Krylov subspace and algebraic multigrid (AMG) solvers on NVIDIA GPUs. Because SpMV is a crucial part of these iterative methods, SpMV algorithms for single and multiple GPUs are implemented. A HEC matrix format and a communication mechanism are established. In addition, a set of specific algorithms for solving preconditioned […]
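To illustrate why SpMV dominates Krylov solvers, here is a minimal unpreconditioned conjugate gradient method (one Krylov method; the teaser does not say which ones the authors implement). The matrix appears only through a `matvec` callback, so any SpMV kernel can be plugged in. No AMG preconditioner, HEC format, or multi-GPU communication is modelled.

```python
import math
from typing import Callable, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec: Callable[[Vec], Vec], b: Vec,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vec:
    """Unpreconditioned CG for a symmetric positive-definite operator.

    The matrix is only accessed through `matvec`, so an SpMV kernel
    (sequential, GPU or multi-GPU) can be swapped in unchanged.
    """
    x = [0.0] * len(b)
    r = list(b)         # residual b - A x for the initial guess x = 0
    p = list(r)         # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = matvec(p)  # the single SpMV per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Dense stand-in for an SpMV kernel: A = [[4, 1], [1, 3]] is SPD.
A = [[4.0, 1.0], [1.0, 3.0]]
x = conjugate_gradient(lambda v: [dot(row, v) for row in A], [1.0, 2.0])
print(x)  # approximately [1/11, 7/11]
```

Every other operation per iteration is a cheap vector update or dot product, so in practice the cost of the `matvec` call largely determines solver performance.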
Jun, 3
MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures
Code transformations, such as loop tiling and loop fusion, are of key importance for the efficient implementation of stencil computations. However, their direct application to a large code base is costly and severely impacts program maintainability. While recently introduced domain-specific languages facilitate the application of such transformations, they typically still require manual tuning or auto-tuning […]
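As a concrete picture of loop tiling, the sketch below applies it by hand to an assumed 2D 5-point Jacobi stencil (not a stencil taken from the paper). The tiled version visits the same points in a blocked order and produces identical output. Python itself gains nothing from this; the sketch only shows the loop restructuring that the teaser says is costly to apply manually across a large code base.

```python
from typing import List

Grid = List[List[float]]


def _update(u: Grid, i: int, j: int) -> float:
    # 5-point Jacobi stencil: average of the four direct neighbours.
    return 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def jacobi_naive(u: Grid) -> Grid:
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = _update(u, i, j)
    return out


def jacobi_tiled(u: Grid, tile: int = 32) -> Grid:
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]
    # Outer loops walk over tile origins; inner loops cover one tile.
    # In a compiled language this blocking is meant to improve cache reuse.
    for ii in range(1, n - 1, tile):
        for jj in range(1, m - 1, tile):
            for i in range(ii, min(ii + tile, n - 1)):
                for j in range(jj, min(jj + tile, m - 1)):
                    out[i][j] = _update(u, i, j)
    return out


grid = [[float((i * 7 + j * 3) % 11) for j in range(13)] for i in range(10)]
assert jacobi_tiled(grid, tile=4) == jacobi_naive(grid)  # same result, different traversal
```

Because the update is out-of-place (it reads `u` and writes `out`), any traversal order is legal, which is what makes tiling safe here; in-place or multi-step stencils need more care.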
Jun, 2
Challenges for a GPU-Accelerated Dynamic Programming Approach for Join-Order Optimization
Relational database management systems apply query optimization to determine efficient execution plans for declarative queries. Since the execution time of equivalent query execution plans can differ by several orders of magnitude depending on the join order used, join-order optimization is one of the most important problems within query processing. Since the time-budget of […]
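The title refers to a dynamic programming approach to join ordering. A minimal sequential sketch of the classic idea, DP over subsets of relations, is shown below under a toy cost model (sum of intermediate result sizes) with hypothetical cardinalities and selectivities. It enumerates on the order of 3^n subset splits for n relations, which illustrates why the optimization time budget becomes a concern; the paper's GPU formulation is not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Plan = Tuple[float, float, str]  # (cost, output cardinality, plan text)


def optimize_join_order(card: Dict[str, float],
                        sel: Dict[FrozenSet[str], float]) -> Tuple[float, str]:
    """Exhaustive DP over relation subsets (bushy plans, cross products allowed).

    Toy cost model: a plan costs the sum of the cardinalities of all its
    join results. A pair with no entry in `sel` is joined as a cross
    product (selectivity 1).
    """
    rels = sorted(card)
    best: Dict[FrozenSet[str], Plan] = {
        frozenset([r]): (0.0, float(card[r]), r) for r in rels}
    for size in range(2, len(rels) + 1):
        for combo in combinations(rels, size):
            s = frozenset(combo)
            # The result size of joining s does not depend on the join order.
            out = 1.0
            for r in combo:
                out *= card[r]
            for pair, f in sel.items():
                if pair <= s:
                    out *= f
            # Try every split of s into two non-empty, already-optimized halves.
            for k in range(1, size):
                for left in combinations(combo, k):
                    lhs = frozenset(left)
                    rhs = s - lhs
                    cost = best[lhs][0] + best[rhs][0] + out
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, out, f"({best[lhs][2]} ⋈ {best[rhs][2]})")
    cost, _, plan = best[frozenset(rels)]
    return cost, plan


card = {"A": 1000, "B": 10, "C": 100}
sel = {frozenset({"A", "B"}): 0.0625, frozenset({"B", "C"}): 0.125}
print(optimize_join_order(card, sel))  # (7937.5, '(A ⋈ (B ⋈ C))')
```

In this toy example, joining A with C first would produce a 100,000-row cross product, while the chosen order keeps the intermediate result at 125 rows, a small-scale version of the orders-of-magnitude gap the teaser describes.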
Jun, 2
Solver for Systems of Linear Equations with Infinite Precision on a GPU Cluster
In this paper, we introduce an accelerated solver for systems of linear equations with infinite precision, designed for GPU clusters. Infinite precision means that the system can provide an exact solution without any rounding error. These errors usually come from the limited precision of floating-point values within their natural computer […]
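The teaser defines infinite precision as a solution free of rounding error but does not say how the paper achieves it. To show what a rounding-free result means, the sketch below swaps in exact rational arithmetic (Python's standard `fractions.Fraction`) with Gauss-Jordan elimination; it is not the paper's method. The trade-off is that numerators and denominators can grow quickly, making exact solvers far more expensive than floating-point ones.

```python
from fractions import Fraction
from typing import List, Sequence, Union

Number = Union[int, Fraction]


def solve_exact(a: Sequence[Sequence[Number]], b: Sequence[Number]) -> List[Fraction]:
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(a)
    # Augmented matrix [A | b] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        # Exact arithmetic needs no numerical pivoting: any nonzero pivot works.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    # A is now diagonal, so each unknown is a single division.
    return [m[i][n] / m[i][i] for i in range(n)]


# 4x4 Hilbert matrix: notoriously ill-conditioned in floating point.
h = [[Fraction(1, i + j + 1) for j in range(4)] for i in range(4)]
x = solve_exact(h, [1, 1, 1, 1])
print([str(v) for v in x])  # ['-4', '60', '-180', '140']
```

Substituting the result back into every equation reproduces `b` exactly, with no residual at all, which is the property the teaser calls infinite precision.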
Jun, 2
Disc: Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs
Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The K-Nearest Neighbors problem is widely used in applications ranging from classification to gathering of photons in the Photon Mapping algorithm. Using the Euclidean distance measure when gathering photons can cause false bleeding of colors between surfaces. Ellipsoidal search boundaries […]
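The teaser contrasts Euclidean (spherical) photon gathering with ellipsoidal search boundaries. A minimal sketch of the two membership tests follows, assuming an ellipsoid flattened along the surface normal by a hypothetical `flatten` factor. It is not the Disc algorithm and omits the nearest-neighbour search structure entirely.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def in_sphere(query: Vec3, photon: Vec3, radius: float) -> bool:
    """Plain Euclidean gather test."""
    d = _sub(photon, query)
    return _dot(d, d) <= radius * radius


def in_ellipsoid(query: Vec3, normal: Vec3, photon: Vec3,
                 radius: float, flatten: float) -> bool:
    """Gather test for an ellipsoid centred at `query`.

    Semi-axis `radius` in the tangent plane and `radius * flatten` along
    the surface normal (hypothetical parameterization), so photons lying
    off the surface are rejected sooner than with a sphere.
    """
    length = math.sqrt(_dot(normal, normal))
    n = (normal[0] / length, normal[1] / length, normal[2] / length)
    d = _sub(photon, query)
    dn = _dot(d, n)                                # offset along the normal
    tangential_sq = max(0.0, _dot(d, d) - dn * dn)  # squared in-plane offset
    normal_axis = radius * flatten
    return tangential_sq / radius ** 2 + dn * dn / normal_axis ** 2 <= 1.0


q, nrm = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
above = (0.0, 0.0, 0.5)   # e.g. a photon on a nearby, different surface
beside = (0.5, 0.0, 0.0)  # a photon in the query's tangent plane
print(in_sphere(q, above, 1.0), in_ellipsoid(q, nrm, above, 1.0, 0.25))    # True False
print(in_sphere(q, beside, 1.0), in_ellipsoid(q, nrm, beside, 1.0, 0.25))  # True True
```

The sphere accepts both photons, while the flattened ellipsoid keeps only the one lying along the surface, which is the kind of cross-surface contribution the teaser associates with color bleeding.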
Jun, 2
Processing Posting Lists Using OpenCL
One of the main requirements of internet search engines is the ability to retrieve relevant results with faster response times. Yioop is an open-source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive […]