Posts

Jun, 3

MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures

Code transformations, such as loop tiling and loop fusion, are of key importance for the efficient implementation of stencil computations. However, their direct application to a large code base is costly and severely impacts program maintainability. While recently introduced domain-specific languages facilitate the application of such transformations, they typically still require manual tuning or auto-tuning […]
Jun, 2

Challenges for a GPU-Accelerated Dynamic Programming Approach for Join-Order Optimization

Relational database management systems apply query optimization to determine efficient execution plans for declarative queries. Because the execution times of equivalent query execution plans can differ by several orders of magnitude depending on the join order used, join-order optimization is one of the most important problems within query processing. Since the time-budget of […]
Jun, 2

Solver for Systems of Linear Equations with Infinite Precision on a GPU Cluster

In this paper, we introduce an accelerated solver for systems of linear equations with infinite precision, designed for GPU clusters. Infinite precision means that the system can provide an exact solution without any rounding error. Such errors usually come from the limited precision of floating-point values within their natural computer […]
Jun, 2

Disc: Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs

Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The K-Nearest Neighbors problem arises in applications ranging from classification to the gathering of photons in the Photon Mapping algorithm. Using the Euclidean distance measure when gathering photons can cause false bleeding of colors between surfaces. Ellipsoidal search boundaries […]
Jun, 2

Processing Posting Lists Using OpenCL

One of the main requirements of internet search engines is the ability to retrieve relevant results with fast response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive […]
Jun, 2

Weighted Residuals for Very Deep Networks

Deep residual networks have recently shown appealing performance on many challenging computer vision tasks. However, the original residual structure still has some defects making it difficult to converge on very deep networks. In this paper, we introduce a weighted residual network to address the incompatibility between ReLU and element-wise addition and the deep network initialization […]
May, 31

Computer Vision on the GPU — Tools, Algorithms and Frameworks

In recent years, graphics processing units (GPUs) have emerged as an attractive alternative to CPUs for implementing algorithms in a wide range of applications. The focus of this work is to give an overview of the current state of GPU use in computer vision. We briefly describe tools like CUDA, OpenCL and OpenACC used for […]
May, 30

clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library

Sparse linear algebra is a cornerstone of modern computational science. These algorithms ignore the zero-valued entries found in many domains in order to work on much larger problems at much faster rates than dense algorithms. Nonetheless, optimizing these algorithms is not straightforward. Highly optimized algorithms for multiplying a sparse matrix by a dense vector, for […]
May, 30

TensorFlow: A system for large-scale machine learning

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore […]
May, 30

Bridging the Performance-Programmability Gap for FPGAs via OpenCL: A Case Study with OpenDwarfs

For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance has come at the significant expense of programmability, i.e., the performance-programmability gap. In particular, FPGA […]
May, 30

A GPU Accelerated Continuous and Discontinuous Galerkin Non-hydrostatic Atmospheric Model

We present a GPU-accelerated nodal discontinuous Galerkin method for the solution of the three-dimensional Euler equations, which are nonlinear hyperbolic equations that govern the motion and thermodynamic state of the atmosphere. The part of the solution process that solves the governing equations of motion with no moist processes is called the dynamical core. […]
May, 30

Deep API Learning

Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
