Posts
Oct, 31
Investigation of General-Purpose Computing on Graphics Processing Units and its Application to the Finite Element Analysis of Electromagnetic Problems
In this dissertation, the hardware and API architectures of GPUs are investigated, and the corresponding acceleration techniques are applied to the traditional frequency-domain finite element method (FEM), the element-level time-domain methods, and the nonlinear discontinuous Galerkin method. First, the assembly and the solution phases of the FEM are parallelized and mapped onto the granular […]
Oct, 31
A general tridiagonal solver for coprocessors: Adapting g-Spike for the Intel Xeon Phi
Manycores like the Intel Xeon Phi and graphics processing units like the NVIDIA Tesla series are prime examples of systems for accelerating applications that run on current CPU multicores. It is therefore of interest to build fast, reliable linear system solvers targeting these architectures. Moreover, it is of interest to conduct cross comparisons between algorithmic […]
Oct, 31
Asynchronous Parallel Computing Algorithm implemented in 1D Heat Equation with CUDA
In this note, we present a stability and performance analysis of an asynchronous parallel computing algorithm applied to the 1D heat equation with CUDA. The primary objective of this note is to disseminate the asynchronous parallel computing algorithm by providing CUDA code for fast and easy implementation. We show that the simulations carried out on […]
Oct, 29
Padding Free Bank Conflict Resolution for CUDA-Based Matrix Transpose Algorithm
Advances in Graphics Processing Unit (GPU) technology and the introduction of the CUDA programming model facilitate the development of new solutions for sparse and dense linear algebra solvers. Matrix transpose is an important linear algebra procedure that has a deep impact on various computational science and engineering applications. Several factors hinder the expected performance of large matrix transpose […]
Oct, 29
CLOP: A Multi-stage Compiler to Seamlessly Embed Heterogeneous Code
Heterogeneous programming complicates software development. We present CLOP, a platform that embeds code targeting heterogeneous compute devices in a convenient and clean way, allowing unobstructed data flow between the host code and the devices, reducing the amount of source code by an order of magnitude. The CLOP compiler uses the standard facilities of the D […]
Oct, 29
Approximation of BEM matrices using GPGPUs
The efficiency of boundary element methods depends crucially on the time required for setting up the stiffness matrix. The far-field part of the matrix can be approximated by compression schemes like the fast multipole method or $\mathcal{H}$-matrix techniques. The near-field part is typically approximated by special quadrature rules like the Sauter-Schwab technique that can handle […]
Oct, 29
GPU Ray-Traced Collision Detection for Cloth Simulation
We propose a method that uses ray tracing to perform collision detection for cloth. Our method is able to perform collision detection between cloths and volumetric objects (rigid or deformable) as well as collision detection between cloths (including auto-collision). Our method casts rays between objects to perform collision detection, and an inversion-handling algorithm is introduced to correct […]
Oct, 29
Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network
The Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for tagging sequential data, e.g. speech utterances or handwritten documents. Meanwhile, word embedding has been demonstrated to be a powerful representation for characterizing the statistical properties of natural language. In this study, we propose to use BLSTM-RNN with word embedding for […]
Oct, 27
CFP: Fourth International Workshop on OpenCL (IWOCL 2016)
* Call for Papers * Now in its fourth year, the International Workshop on OpenCL (IWOCL) will be hosted by TU Wien in Vienna, Austria, at the C3 Convention Center on April 19th – 21st, 2016. April 19th is reserved for an Advanced Hands On OpenCL tutorial, with April 20th – 21st consisting of a […]
Oct, 27
The 1st International SYCL Workshop (SYCL), 2016
1st SYCL workshop (SYCL’16), co-located with PPoPP’16, Barcelona, Spain. Sunday, 13th March, 2016. http://conf.researchr.org/track/PPoPP-2016/SYCL-2016-papers SYCL (sɪkəl, as in sickle) is a royalty-free, cross-platform C++ abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL, while adding the ease-of-use and flexibility of C++. For example, SYCL enables single source development where […]
Oct, 27
Evaluation of the Stability and Performance of a Multi-Stage Riemann Solver in Relativistic Hydrodynamic Simulations
This work assesses the quality of a multi-stage Riemann solver for relativistic hydrodynamic simulations of heavy-ion collisions. The physical system is described using hydrodynamic conservation laws and then solved numerically. Because of the nature of such hydrodynamic simulations, the numerical method has to cope with problems containing both strong discontinuities and smooth solutions, […]
Oct, 27
Pairwise Sequence Alignment with Gaps with GPU
In this paper we consider the pairwise sequence alignment problem with gaps, which is motivated by the resequencing problem, which requires assembling short read sequences into a genome sequence by referring to a reference sequence. The problem has been studied before for a single gap and for a bounded number of gaps. For a single gap, there was […]