Posts
Aug, 28
Massively parallel simulations of relativistic fluid dynamics on graphics processing units with CUDA
Relativistic fluid dynamics is a major component in dynamical simulations of the quark-gluon plasma created in relativistic heavy-ion collisions. Simulations of the full three-dimensional dissipative dynamics of the quark-gluon plasma with fluctuating initial conditions are computationally expensive and typically require some degree of parallelization. In this paper, we present a GPU implementation of the Kurganov-Tadmor […]
Aug, 23
Fast Multidimensional Image Processing with OpenCL
Multidimensional image data, i.e., images with three or more dimensions, are used in many areas of science. Multidimensional image processing is supported in Python and MATLAB. VisionGL is an open source library that provides a set of image processing functions and can help the programmer by automatically generating code. The objective of this work is […]
Aug, 23
Accelerating Exact and Approximate Inference for (Distributed) Discrete Optimization with GPUs
Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic variants such as Bayesian networks. Inference-based algorithms are powerful techniques for solving discrete optimization problems, which can be used […]
Aug, 23
MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-core Clusters
To attain scalable performance efficiently, the HPC community expects future exascale systems to consist of multiple nodes, each with different types of hardware accelerators. In addition to GPUs and Intel MICs, additional candidate accelerators include embedded multiprocessors and FPGAs. End users need appropriate tools to efficiently use the available compute resources in such systems, both […]
Aug, 23
MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs
A particularly challenging class of problems arising in many applications, called batched problems, involves linear algebra operations on many small-sized matrices. We proposed and designed batched BLAS (Basic Linear Algebra Subroutines), Level-2 GEMV and Level-3 GEMM, to solve them. We illustrate how to optimize batched GEMV and GEMM to assist batched advance factorization (e.g. bi-diagonalization) […]
Aug, 23
Hybrid CPU-GPU Framework for Network Motifs
Massively parallel architectures such as the GPU are becoming increasingly important due to the recent proliferation of data. In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverages multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets). In addition to the hybrid multi-core CPU-GPU framework, we […]
Aug, 18
Streaming Applications on Heterogeneous Platforms
Using multiple streams can improve the overall system performance by mitigating the data transfer overhead on heterogeneous systems. Currently, very few cases have been streamed to demonstrate the streaming performance impact and a systematic investigation of streaming necessity and how-to over a large number of test cases remains a gap. In this paper, we use […]
Aug, 18
GPU-Acceleration of In-Memory Data Analytics
Hardware advances strongly influence the database system design. The flattening speed of CPU cores makes many-core accelerators, such as GPUs, a vital alternative to explore for processing the ever-increasing amounts of data. GPUs have a significantly higher degree of parallelism than multi-core CPUs but their cores are simpler. As a result, they do not face […]
Aug, 18
GPU-accelerated Gibbs Sampling
Gibbs sampling is a widely used Markov Chain Monte Carlo (MCMC) method for numerically approximating integrals of interest in Bayesian statistics and other mathematical sciences. Many implementations of MCMC methods do not extend easily to parallel computing environments, as their inherently sequential nature incurs a large synchronization cost. In this paper, we show how to […]
Aug, 18
SkePU 2: Language Embedding and Compiler Support for Flexible and Type-Safe Skeleton Programming
This thesis presents SkePU 2, the next generation of the SkePU C++ framework for programming of heterogeneous parallel systems using the skeleton programming concept. SkePU 2 is presented after a thorough study of the state of parallel programming models, frameworks and tools, including other skeleton programming systems. The advancements in SkePU 2 include a modern […]
Aug, 18
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising
Discriminative model learning for image denoising has been recently attracting considerable attentions due to its favorable denoising performance. In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs) to embrace the progress in very deep architecture, learning algorithm, and regularization method into image denoising. Specifically, residual […]
Aug, 16
Automatic Generation of OpenCL Code for ARM Architectures
The efficient exploitation of the increasing computational capabilities of mobile devices is still a challenge. The heterogeneity of Systems on Chip (SoC) makes necessary a very specific knowledge of their hardware in order to harness their full potential. OpenCL is a well known standard for cross-platform usage of accelerator devices. We follow an annotation-based approach […]