Posts
Aug, 19
Parallel Graph Mining with GPUs
Frequent graph mining is an important but computationally hard problem: it requires enumerating a possibly exponential number of candidate subgraph patterns and checking their presence in a database of graphs. In this paper, we propose a novel approach for parallel graph mining on GPUs, which have emerged as a relatively cheap but powerful architecture […]
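The support-counting step described above (checking whether a candidate pattern is present in each database graph) can be sketched as follows. This is a minimal serial sketch using brute-force subgraph matching, not the GPU approach of the paper. The graph representation and the helper names `make_graph`, `contains` and `support` are hypothetical.

```python
from itertools import permutations

def make_graph(labels, edges):
    """Hypothetical representation: vertex label list plus a set of undirected edges."""
    return (list(labels), {frozenset(e) for e in edges})

def contains(graph, pattern):
    """Brute-force test of whether `pattern` occurs as a (not necessarily induced)
    subgraph of `graph`: try every injective, label-preserving vertex mapping."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    k = len(p_labels)
    for mapping in permutations(range(len(g_labels)), k):
        if any(p_labels[i] != g_labels[mapping[i]] for i in range(k)):
            continue
        if all(frozenset(mapping[v] for v in edge) in g_edges for edge in p_edges):
            return True
    return False

def support(database, pattern):
    """Number of database graphs that contain the candidate pattern."""
    return sum(contains(g, pattern) for g in database)

db = [
    make_graph("CCO", [(0, 1), (1, 2)]),  # C-C-O
    make_graph("CO", [(0, 1)]),           # C-O
    make_graph("CC", [(0, 1)]),           # C-C
]
min_support = 2
for name, pattern in [("C-O", make_graph("CO", [(0, 1)])),
                      ("C-C-O", make_graph("CCO", [(0, 1), (1, 2)]))]:
    s = support(db, pattern)
    print(name, s, s >= min_support)
# C-O 2 True
# C-C-O 1 False
```

Because every (pattern, graph) check is independent, this step is a natural candidate for data-parallel execution, although the enumeration of candidates remains the exponential part.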
Aug, 19
Parallel Outlier Detection on Uncertain Data for GPUs
Outlier detection, also known as anomaly detection, is a common data mining task that identifies data points lying outside expected patterns in a given dataset. It has useful applications such as detecting network intrusions, system faults, and fraudulent activity. In addition, real-world data are uncertain in nature and may be represented as uncertain […]
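As a rough illustration of the idea, the sketch below uses a classical distance-based outlier test (a point is an outlier if fewer than `k` other points lie within `radius`) and extends it to uncertain data by sampling "possible worlds". This is not the method of the paper, whose uncertain-data model is truncated above; the representation of each uncertain object as a list of equally likely positions, and the names `distance_outliers` and `outlier_probability`, are assumptions.

```python
import math
import random

def distance_outliers(points, radius, k):
    """Indices of points that have fewer than k other points within `radius`."""
    outliers = []
    for i, p in enumerate(points):
        close = sum(1 for j, q in enumerate(points)
                    if j != i and math.dist(p, q) <= radius)
        if close < k:
            outliers.append(i)
    return outliers

def outlier_probability(objects, radius, k, trials=2000, seed=0):
    """Assumed uncertainty model: each object is a list of equally likely positions.
    Sample one position per object (a "possible world") per trial and report how
    often each object is flagged as an outlier."""
    rng = random.Random(seed)
    hits = [0] * len(objects)
    for _ in range(trials):
        world = [rng.choice(positions) for positions in objects]
        for i in distance_outliers(world, radius, k):
            hits[i] += 1
    return [h / trials for h in hits]

objects = [[(0.0, 0.0)], [(0.1, 0.0)], [(0.0, 0.1)],
           [(0.2, 0.2), (5.0, 5.0)]]  # the last object is ambiguous
print(outlier_probability(objects, radius=1.0, k=2))
```

Each trial and each point's neighbour count are independent, which is the kind of work that maps naturally onto many GPU threads.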
Aug, 19
Practical Symbolic Race Checking of GPU Programs
Even the careful GPU programmer can inadvertently introduce data races while writing and optimizing code. Currently available GPU race checking methods fall short in terms of their formal guarantees, ease of use, or practicality. Existing symbolic methods: (1) do not fully support existing CUDA kernels; (2) may require user-specified assertions or invariants; (3) often […]
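To make the notion of a GPU data race concrete, the sketch below checks a recorded access trace: two accesses conflict when they come from different threads, touch the same address between the same pair of barriers, and at least one is a write. This is a simple dynamic check on one concrete trace, not the symbolic method the paper proposes; the trace format and `find_races` are hypothetical, and a single thread block is assumed.

```python
from collections import defaultdict
from itertools import combinations

def find_races(trace):
    """trace: list of (thread_id, epoch, op, address) events with op 'R' or 'W'.
    `epoch` counts the __syncthreads() barriers a thread has passed; a single
    thread block is assumed, so accesses in different epochs are ordered."""
    by_location = defaultdict(list)
    for tid, epoch, op, addr in trace:
        by_location[(epoch, addr)].append((tid, op))
    races = set()
    for (epoch, addr), accesses in by_location.items():
        for (t1, op1), (t2, op2) in combinations(accesses, 2):
            # conflict: different threads, same address, same epoch, at least one write
            if t1 != t2 and "W" in (op1, op2):
                races.add((epoch, addr, min(t1, t2), max(t1, t2)))
    return sorted(races)

# Two threads write s[tid], then read s[tid + 1], without and with a barrier between.
racy = [(0, 0, "W", 0), (1, 0, "W", 1), (0, 0, "R", 1), (1, 0, "R", 2)]
synced = [(0, 0, "W", 0), (1, 0, "W", 1), (0, 1, "R", 1), (1, 1, "R", 2)]
print(find_races(racy))    # [(0, 1, 0, 1)]
print(find_races(synced))  # []
```

A trace-based check like this only covers the inputs that were actually executed, which is one reason symbolic methods aim for stronger guarantees.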
Aug, 19
An Efficient Cell List Implementation for Monte Carlo Simulation on GPUs
Maximizing the performance potential of the modern-day GPU architecture requires judicious utilization of available parallel resources. Although dramatic reductions in run time can often be obtained through straightforward mappings, further performance improvements typically require algorithmic redesigns that more closely exploit the target architecture. In this paper, we focus on efficient molecular simulations for the GPU and propose […]
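For readers unfamiliar with the data structure in the title, the sketch below shows a standard serial cell list in a periodic cubic box: particles are binned into cells at least one cutoff wide, so a neighbour search only has to scan the 27 surrounding cells instead of every particle. It does not reproduce the GPU-specific design of the paper; the function names and parameters are illustrative.

```python
import itertools
import math
import random

def cell_of(p, n, size):
    """Integer cell coordinates of position p (periodic wrap)."""
    return tuple(int(c // size) % n for c in p)

def build_cell_list(positions, box, cutoff):
    """Bin particles into n^3 cells whose edge length is >= cutoff."""
    n = int(box // cutoff)
    if n < 3:
        raise ValueError("need at least 3 cells per side for a 27-cell stencil")
    size = box / n
    cells = {}
    for idx, p in enumerate(positions):
        cells.setdefault(cell_of(p, n, size), []).append(idx)
    return cells, n, size

def min_image_dist(a, b, box):
    """Distance under periodic boundary conditions (minimum-image convention)."""
    return math.sqrt(sum((d - box * round(d / box)) ** 2
                         for d in (ai - bi for ai, bi in zip(a, b))))

def neighbours(i, positions, cells, n, size, box, cutoff):
    """Particles within cutoff of particle i, searching only the 27 adjacent cells."""
    cx, cy, cz = cell_of(positions[i], n, size)
    found = []
    for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
        key = ((cx + dx) % n, (cy + dy) % n, (cz + dz) % n)
        for j in cells.get(key, ()):
            if j != i and min_image_dist(positions[i], positions[j], box) < cutoff:
                found.append(j)
    return found

rng = random.Random(1)
box, cutoff = 10.0, 2.5
positions = [tuple(rng.uniform(0, box) for _ in range(3)) for _ in range(200)]
cells, n, size = build_cell_list(positions, box, cutoff)
print(len(neighbours(0, positions, cells, n, size, box, cutoff)))
```

In a Monte Carlo move only the displaced particle's energy changes, so a trial move needs just this neighbour list rather than an all-pairs sum.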
Aug, 19
Accelerated composite distribution function methods for computational fluid dynamics using GPU
The Kinetic Theory of Gases has long been established as a useful tool for the solution of modern Computational Fluid Dynamics (CFD) problems. Together with the Finite Volume Method, such approaches have been popular in CFD for over 30 years, with techniques such as the Equilibrium Flux Method (EFM) or Kinetic Flux Vector Splitting (KFVS), […]
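Kinetic flux-splitting schemes such as EFM and KFVS obtain interface fluxes by integrating an equilibrium (Maxwellian) velocity distribution separately over positive and negative molecular velocities. For the one-dimensional mass flux this gives $F^{\pm} = \rho\left[\tfrac{u}{2}\left(1 \pm \operatorname{erf}(s)\right) \pm \frac{e^{-s^2}}{2\sqrt{\pi\beta}}\right]$ with $\beta = 1/(2RT)$ and $s = u\sqrt{\beta}$. The sketch below evaluates only this mass component; the momentum and energy terms, and the composite distribution function methods of the paper, are not shown. The name `efm_mass_flux` and the argument `rt` (standing for $RT$) are illustrative.

```python
import math

def efm_mass_flux(rho, u, rt):
    """Split 1D mass flux rho*u into forward (F+) and backward (F-) parts by
    integrating a Maxwellian over positive and negative molecular velocities.
    rt = R*T (specific gas constant times temperature); beta = 1 / (2*R*T)."""
    beta = 1.0 / (2.0 * rt)
    s = u * math.sqrt(beta)  # speed ratio
    thermal = math.exp(-s * s) / (2.0 * math.sqrt(math.pi * beta))
    f_plus = rho * (0.5 * u * (1.0 + math.erf(s)) + thermal)
    f_minus = rho * (0.5 * u * (1.0 - math.erf(s)) - thermal)
    return f_plus, f_minus

# At a cell interface, a flux-vector-split scheme would combine F+ from the
# left cell with F- from the right cell.
fp, fm = efm_mass_flux(1.2, 50.0, 287.0 * 300.0)
print(fp, fm, fp + fm)
```

A useful sanity check is that the two halves always sum to the full mass flux $\rho u$, and that for a gas at rest each half reduces to the one-way thermal flux $\rho\sqrt{RT/(2\pi)}$.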
Aug, 18
An OpenCL implementation of a forward sampling algorithm for CP-logic
We present an approximate query answering algorithm for the Probabilistic Logic Programming language CP-logic. It complements existing sampling algorithms by applying the rules from body to head rather than from head to body. We present an implementation in OpenCL, which is able to exploit the multicore architecture of modern GPUs to compute a large number […]
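The body-to-head direction can be pictured with the small sketch below: starting from an empty interpretation, any rule whose body holds fires once and causes at most one of its head atoms according to the stated probabilities, and the query probability is estimated by averaging over many independent samples. This is a simplified serial sketch, not the authors' OpenCL implementation; it assumes positive rule bodies only (no negation), and the rule encoding and the names `sample_interpretation` and `estimate` are hypothetical.

```python
import random

# Hypothetical encoding of a CP-logic rule: (head, body), where head is a list of
# (atom, probability) pairs summing to at most 1 and body is a set of atoms.
def sample_interpretation(program, rng):
    true_atoms, fired = set(), set()
    changed = True
    while changed:
        changed = False
        for idx, (head, body) in enumerate(program):
            if idx in fired or not body <= true_atoms:
                continue
            fired.add(idx)  # each rule (causal event) happens at most once
            changed = True
            r, acc = rng.random(), 0.0
            for atom, p in head:
                acc += p
                if r < acc:
                    true_atoms.add(atom)
                    break   # leftover probability mass: no head atom is caused
    return true_atoms

def estimate(program, query, samples=10000, seed=0):
    """Fraction of forward samples in which the query atom is true."""
    rng = random.Random(seed)
    return sum(query in sample_interpretation(program, rng)
               for _ in range(samples)) / samples

program = [
    ([("toss", 1.0)], set()),                        # toss.
    ([("heads", 0.5), ("tails", 0.5)], {"toss"}),    # heads:0.5; tails:0.5 <- toss.
    ([("win", 0.8)], {"heads"}),                     # win:0.8 <- heads.
]
print(estimate(program, "win"))  # close to 0.5 * 0.8 = 0.4
```

Since the samples are independent, one plausible way to use a GPU is to generate many of them in parallel, one per work-item.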
Aug, 18
Ocelot/HyPE: Optimized Data Processing on Heterogeneous Hardware
The past years saw the emergence of highly heterogeneous server architectures that feature multiple accelerators in addition to the main processor. Efficiently exploiting these systems for data processing is a challenging research problem that comprises many facets, including how to find an optimal operator placement strategy, how to estimate runtime costs across different hardware architectures, […]
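One way to combine runtime-cost estimation with operator placement is to learn a cost model per device from observed executions and place each new operator invocation on the device with the lowest predicted runtime. The toy sketch below illustrates that idea with a per-device linear fit on input size; it is not the cost model of Ocelot/HyPE, and the class `OperatorPlacer`, its methods and the timings used are hypothetical and synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

class OperatorPlacer:
    """Hypothetical learned placement for a single operator type."""

    def __init__(self, devices):
        self.history = {d: ([], []) for d in devices}

    def record(self, device, input_size, runtime):
        xs, ys = self.history[device]
        xs.append(input_size)
        ys.append(runtime)

    def estimate(self, device, input_size):
        xs, ys = self.history[device]
        if len(xs) < 2:
            return None  # no usable model yet
        a, b = fit_line(xs, ys)
        return a * input_size + b

    def choose(self, input_size):
        estimates = {d: self.estimate(d, input_size) for d in self.history}
        for d, e in estimates.items():
            if e is None:
                return d  # gather training data for untested devices first
        return min(estimates, key=estimates.get)

placer = OperatorPlacer(["cpu", "gpu"])
for size in (100, 1000, 5000):
    placer.record("cpu", size, 1.0 + 0.01 * size)   # synthetic timings
    placer.record("gpu", size, 5.0 + 0.001 * size)  # fixed transfer cost, cheaper per row
print(placer.choose(100), placer.choose(100000))  # cpu gpu
```

With these synthetic numbers the accelerator only pays off once its fixed overhead is amortized over a large input, which is the kind of trade-off a placement strategy has to capture.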
Aug, 18
Quantum Boolean Image Denoising
A quantum Boolean image processing methodology is presented in this work, with special emphasis on image denoising. A new approach for internal image representation is outlined, together with two new interfaces: classical-to-quantum and quantum-to-classical. The new quantum-Boolean image denoising method, called the quantum Boolean mean filter (QBMF), works exclusively with computational basis states (CBS). To achieve this, […]
Aug, 18
High Level High Performance Computing for Multitask Learning of Time-varying Models
We propose an approach suitable for learning multiple time-varying models jointly and discuss an application in data-driven weather forecasting. The methodology relies on spectral regularization and encodes the typical multi-task learning assumption that models lie near a common low-dimensional subspace. The resulting optimization problem amounts to estimating a matrix from noisy linear measurements within […]
Aug, 15
Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters
Intel Xeon Phi coprocessor-based clusters offer high compute and memory performance for parallel workloads and also support direct network access. Many real-world applications are significantly affected by network characteristics; to maximize the performance of such applications on these clusters, it is particularly important to effectively saturate network bandwidth and/or hide communication latency. We […]
Aug, 15
Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures
As core counts increase and as heterogeneity becomes more common in parallel computing, we face the prospect of programming hundreds or even thousands of concurrent threads in a single shared-memory system. At these scales, even highly efficient concurrent algorithms and data structures can become bottlenecks, unless they are designed from the ground up with throughput as […]
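As a point of reference for what a concurrent queue looks like, the sketch below implements the classic two-lock queue of Michael and Scott: a dummy head node lets enqueuers and dequeuers take separate locks, so producers and consumers do not serialize on one lock. It is a well-known baseline, not one of the designs evaluated in the paper, and in CPython the global interpreter lock means it illustrates the locking structure rather than many-core scalability. Returning `None` for an empty queue is a simplification that assumes `None` is never enqueued.

```python
import threading

class _Node:
    __slots__ = ("value", "next")

    def __init__(self, value=None):
        self.value = value
        self.next = None

class TwoLockQueue:
    """Two-lock FIFO queue: enqueue locks only the tail, dequeue only the head."""

    def __init__(self):
        dummy = _Node()  # head always points at a dummy node
        self._head = dummy
        self._tail = dummy
        self._head_lock = threading.Lock()
        self._tail_lock = threading.Lock()

    def enqueue(self, value):
        node = _Node(value)
        with self._tail_lock:
            self._tail.next = node
            self._tail = node

    def dequeue(self):
        """Return the oldest value, or None if the queue is empty."""
        with self._head_lock:
            first = self._head.next
            if first is None:
                return None
            value = first.value
            first.value = None   # first becomes the new dummy node
            self._head = first
            return value

q = TwoLockQueue()
for i in (1, 2, 3):
    q.enqueue(i)
print([q.dequeue() for _ in range(4)])  # [1, 2, 3, None]
```

Even with the head and tail split, every producer still contends on one tail lock, which is the kind of bottleneck that throughput-oriented designs try to remove.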
Aug, 15
Energy Evaluation for Applications with Different Thread Affinities on the Intel Xeon Phi
The Intel Xeon Phi coprocessor offers high parallelism on energy-efficient hardware to minimize energy consumption while maintaining performance. Dynamic frequency and voltage scaling is not accessible on the Intel Xeon Phi. Hence, saving energy relies mainly on tuning application performance. One general optimization technique is thread affinity, which is an important factor in multi-core architectures. […]
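Thread affinity determines which hardware context each software thread runs on. The sketch below models two placement policies that the Intel OpenMP runtime exposes through `KMP_AFFINITY` (`compact` and `scatter`) on a coprocessor with four hardware threads per core. It is a simplified model for illustration only: it ignores the core typically reserved for the operating system and the Xeon Phi-specific `balanced` type, and the function names are hypothetical.

```python
def compact(n_threads, cores, threads_per_core=4):
    """Fill every hardware thread of a core before moving to the next core.
    Returns a (core, hw_thread) pair for each OpenMP thread id."""
    if n_threads > cores * threads_per_core:
        raise ValueError("more threads than hardware contexts")
    return [(t // threads_per_core, t % threads_per_core) for t in range(n_threads)]

def scatter(n_threads, cores, threads_per_core=4):
    """Distribute threads round-robin over cores."""
    if n_threads > cores * threads_per_core:
        raise ValueError("more threads than hardware contexts")
    return [(t % cores, t // cores) for t in range(n_threads)]

def cores_used(mapping):
    """Number of distinct physical cores occupied by a mapping."""
    return len({core for core, _ in mapping})

# 60 threads on a 61-core coprocessor with 4 hardware threads per core
print(cores_used(compact(60, 61)), cores_used(scatter(60, 61)))  # 15 60
```

The same thread count can therefore occupy very different numbers of cores, which is why affinity can change both performance and energy use.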

