Posts
Feb, 24
Accelerating Lagrangian Particle Dispersion in the Atmosphere with OpenCL across Multiple Platforms
FLEXPART is a popular simulator that models the transport and diffusion of air pollutants, based on the Lagrangian approach. It is capable of regional and global simulation and supports both forward and backward runs. A complex model like this contains many calculations suitable for parallelisation. Recently, a GPU-accelerated version of the simulator (FLEXCPP) has been […]
Feb, 23
High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster
High performance computing of the Meshless Time Domain Method (MTDM) on multiple GPUs using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at the University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of electromagnetic wave propagation phenomena. However, the numerical domain must be […]
Feb, 23
Document Image Binarization Using Image Segmentation Algorithm in Parallel Environment
The segmentation of text from badly degraded document images is a very hard task due to the high intra-variation between the document background and the foreground text of different document images. The algorithms used for image processing take more time for execution on a single-core processor. Graphics Processing Unit (GPU) is becoming most popular due […]
Feb, 23
Characterising Bipartite Graph Matching Algorithms on GPUs
Two well-known bipartite graph matching algorithms, the Gale-Shapley algorithm and the Hungarian (Kuhn-Munkres) algorithm, have been ported to run on General-Purpose Graphics Processing Units (GPGPU) using kernels written with the CUDA programming model. This was done with the goal of characterising and assessing the performance and behaviour of these matching algorithms on the GPU, and […]
Feb, 23
Investigation of the OpenCL SYCL Programming Model
OpenCL SYCL is a new heterogeneous and parallel programming framework created by the Khronos Group that tries to bring OpenCL programming into C++. In particular, it enables C++ developers to create OpenCL kernels, using all the popular C++ features, such as classes, inheritance and templates. What is more, it dramatically reduces programming effort and complexity, […]
Feb, 23
Asynchronous OpenCL/MPI numerical simulations of conservation laws
Hyperbolic conservation laws are important mathematical models for describing many phenomena in physics or engineering. The Finite Volume (FV) method and the Discontinuous Galerkin (DG) method are two popular methods for solving conservation laws on computers. Those two methods are good candidates for parallel computing: a) they require a large amount of uniform and simple […]
Feb, 22
Stochastic Gradient Descent on GPUs
Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchronization patterns in SGD are quite complex. Furthermore, scheduling for scale-free graphs is challenging. This work examines several synchronization strategies for SGD, ranging from simple locking to conflict-free scheduling. We observe that static […]
Feb, 22
High performance methods for frequent pattern mining
The current Big Data era is generating tremendous amounts of data in most fields such as business, social media, engineering, and medicine. The demand to process and handle the resulting "big data" has led to the need for fast data mining methods to develop powerful and versatile analysis tools that can turn data into useful knowledge. […]
Feb, 22
Comparison of SPMV performance on matrices with different matrix format using CUSP, cuSPARSE and ViennaCL
ViennaCL is a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP. In addition to core functionality and many other features including BLAS level 1-3 support and iterative solvers, the latest release family ViennaCL 1.6.x provides fast […]
Feb, 22
QPACE 2 and Domain Decomposition on the Intel Xeon Phi
We give an overview of QPACE 2, which is a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration of Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number […]
Feb, 22
RSVDPACK: Subroutines for computing partial singular value decompositions via randomized sampling on single core, multi core, and GPU architectures
This document describes an implementation in C of a set of randomized algorithms for computing partial Singular Value Decompositions (SVDs). The techniques largely follow the prescriptions in the article "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," N. Halko, P.G. Martinsson, J. Tropp, SIAM Review, 53(2), 2011, pp. 217-288, but with some […]
Feb, 22
Exploring Design Space of 3D NVM and eDRAM Caches Using DESTINY Tool (open-source code)
To enable the design of large-sized caches, novel memory technologies (such as non-volatile memory) and novel fabrication approaches (e.g. 3D stacking) have been explored. The existing modeling tools, however, cover only a few memory technologies, CMOS technology nodes and fabrication approaches. We present DESTINY, a tool for modeling 3D (and 2D) cache designs using SRAM, […]