Posts

Aug, 18

An OpenCL implementation of a forward sampling algorithm for CP-logic

We present an approximate query answering algorithm for the Probabilistic Logic Programming language CP-logic. It complements existing sampling algorithms by applying the rules from body to head instead of from head to body. We present an implementation in OpenCL, which is able to exploit the multicore architecture of modern GPUs to compute a large number […]
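To make the body-to-head direction concrete, here is a minimal forward-sampling sketch in Python. It is not the authors' OpenCL implementation: it assumes a ground CP-logic program without negation, encoded as a hypothetical list of `(heads, body)` pairs, where each rule fires at most once when its body holds and makes at most one head atom true according to the listed probabilities. The query probability is estimated as the fraction of sampled worlds in which the query atom holds.

```python
import random

# Hypothetical encoding of a ground CP-logic program without negation:
# each rule is (heads, body). `heads` lists (atom, probability) pairs whose
# probabilities sum to at most 1; `body` is the set of atoms that must all be true.
Rule = tuple[list[tuple[str, float]], frozenset[str]]


def sample_world(rules: list[Rule], rng: random.Random) -> set[str]:
    """Draw one possible world by applying rules forward, from body to head."""
    true_atoms: set[str] = set()
    fired: set[int] = set()
    changed = True
    while changed:
        changed = False
        for i, (heads, body) in enumerate(rules):
            if i in fired or not body <= true_atoms:
                continue
            fired.add(i)  # each ground rule fires at most once
            changed = True
            r, acc = rng.random(), 0.0
            for atom, p in heads:
                acc += p
                if r < acc:  # this head atom is chosen with probability p
                    true_atoms.add(atom)
                    break
            # falling through means the leftover mass 1 - sum(p) was drawn: no head becomes true
    return true_atoms


def estimate(rules: list[Rule], query: str, n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(query): fraction of sampled worlds where it holds."""
    rng = random.Random(seed)
    hits = sum(query in sample_world(rules, rng) for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    coin = [
        ([("heads", 0.5), ("tails", 0.5)], frozenset()),  # unconditional coin toss
        ([("win", 0.8)], frozenset({"heads"})),           # win with prob. 0.8 after heads
    ]
    print(estimate(coin, "win"))  # close to 0.5 * 0.8 = 0.4
```

Because every sample is drawn independently, samples can in principle be generated in parallel, one per worker, which is the kind of workload a GPU handles well.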
Aug, 18

Ocelot/HyPE: Optimized Data Processing on Heterogeneous Hardware

The past years saw the emergence of highly heterogeneous server architectures that feature multiple accelerators in addition to the main processor. Efficiently exploiting these systems for data processing is a challenging research problem that comprises many facets, including how to find an optimal operator placement strategy, how to estimate runtime costs across different hardware architectures, […]
Aug, 18

Quantum Boolean Image Denoising

A quantum Boolean image processing methodology is presented in this work, with special emphasis on image denoising. A new approach for internal image representation is outlined together with two new interfaces: classical-to-quantum and quantum-to-classical. The new quantum Boolean image denoising method, called the quantum Boolean mean filter (QBMF), works exclusively with computational basis states (CBS). To achieve this, […]
Aug, 18

High Level High Performance Computing for Multitask Learning of Time-varying Models

We propose an approach suitable for learning multiple time-varying models jointly and discuss an application in data-driven weather forecasting. The methodology relies on spectral regularization and encodes the typical multi-task learning assumption that models lie near a common low-dimensional subspace. The arising optimization problem amounts to estimating a matrix from noisy linear measurements within […]
Aug, 15

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters

Intel Xeon Phi coprocessor-based clusters offer high compute and memory performance for parallel workloads and also support direct network access. Many real-world applications are significantly impacted by network characteristics, and to maximize the performance of such applications on these clusters, it is particularly important to effectively saturate network bandwidth and/or hide communication latency. We […]
Aug, 15

Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures

As core counts increase and as heterogeneity becomes more common in parallel computing, we face the prospect of programming hundreds or even thousands of concurrent threads in a single shared-memory system. At these scales, even highly efficient concurrent algorithms and data structures can become bottlenecks, unless they are designed from the ground up with throughput as […]
Aug, 15

Energy Evaluation for Applications with Different Thread Affinities on the Intel Xeon Phi

The Intel Xeon Phi coprocessor offers high parallelism on energy-efficient hardware to minimize energy consumption while maintaining performance. Dynamic frequency and voltage scaling is not accessible on the Intel Xeon Phi. Hence, saving energy relies mainly on tuning application performance. One general optimization technique is thread affinity, which is an important factor in multi-core architectures. […]
Aug, 15

GPU Accelerated Computation and Real-time Rendering of Cellular Automata Model for Spatial Simulation

Because Cellular Automata (CA) are dynamic systems with inherent parallelism, many studies focus on mapping CA onto parallel systems, such as clusters, supercomputers and networks of computers, in order to obtain high-performance computing capability. However, these systems are too expensive and too difficult to use on the occasions […]
Aug, 15

Effect of GPU Communication-Hiding for SpMV Using OpenACC

In finite element method simulations, we often deal with large sparse matrices. Sparse matrix-vector multiplication (SpMV) is of high importance for iterative solvers: during the solver stage, most of the time is in fact spent in the SpMV routine. The SpMV routine is highly memory-bound; the processor spends much time waiting for the needed […]
Aug, 13

Numerical Computations with GPUs

This book brings together research on numerical methods adapted for Graphics Processing Units (GPUs). It explains recent efforts to adapt classic numerical methods, including solution of linear equations and FFT, for massively parallel GPU architectures. This volume consolidates recent research and adaptations, covering widely used methods that are at the core of many scientific and […]
Aug, 13

Graphics Processing Unit Bloom Filters: Classical and Probabilistic

Graphics Processing Units (GPUs) have been used to enhance the speed and efficiency of both data structures and algorithms. A common data structure used in Computer Science is the Bloom Filter, which is used in many types of applications, including databases and security logging. The Bloom Filter is a lossy data structure that uses […]
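The lossy behaviour mentioned above can be illustrated with a small CPU sketch of a classical Bloom filter in Python. The bit-array size `m_bits`, the hash count `k_hashes`, and the use of SHA-256 with double hashing to derive the bit positions are assumptions of this sketch, not the paper's GPU design, and the paper's probabilistic variant is not modelled here. Membership tests never give false negatives, but they can return false positives once enough bits are set.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter: each item sets k positions in an m-bit array."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k indices h1 + i*h2 (mod m) from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every position is set; a match may be a false positive.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m_bits=1024, k_hashes=3)
    for word in ["gpu", "opencl", "cuda"]:
        bf.add(word)
    print(all(w in bf for w in ["gpu", "opencl", "cuda"]))  # True: no false negatives
```

For n inserted items, the false-positive rate of a classical Bloom filter is approximately (1 - e^(-k*n/m))^k, so m and k are usually chosen from the expected number of items.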
Aug, 13

Non-Local Total Generalized Variation for Optical Flow Estimation

In this paper we introduce a novel higher-order regularization term. The proposed regularizer is a non-local extension of the popular second-order Total Generalized Variation, which favors piecewise affine solutions and allows soft-segmentation cues to be incorporated into the regularization term. These properties make this regularizer especially appealing for optical flow estimation, where it offers accurately localized […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
